Du Full Ba Economics

Introduction to Microeconomics
Lesson: Introduction to Microeconomics
Lesson Developer: Dipavali Debroy
College/Department: SGGSCC, University of Delhi
Institute of Lifelong learning, University of Delhi

Table of Contents
1. Learning Outcomes
2. Introduction
3. Evolution of the Subject
4. Methodology of Economics: Positive Economics and Normative Economics
5. Art or Science
6. Scope of Economics - Related Subjects
7. Models and Hypotheses
8. Market and Equilibrium
8.1 Demand and Supply
9. Concept of ceteris paribus – General Equilibrium Partial Equilibrium
10. Static and Dynamic Equilibrium
11. Short-Run and Long Run Equilibrium
12. Nobel Prize in Economics
13. Summary
14. Exercises
15. Glossary
16. References
17. Activity
18. Quiz
1. Learning Outcomes
After you have read this chapter you should be able to:
define Micro-Economics, Macro-Economics, Market, Demand, Supply, Equilibrium, Partial
and General Equilibrium, and Static and Dynamic Equilibrium;
understand the central problems of an economy;
identify variables, constants and parameters;
differentiate Micro-Economics from Macro-Economics;
appreciate the scope of the subject of Economics;
apply the knowledge of basic Economics.
2. Introduction
Micro-Economics is the branch of Economics that studies economic issues minutely,
in individual detail, as if under a microscope. In contrast, Macro-Economics is the
branch of Economics that studies economic issues in aggregative and overall forms,
looking at the broad picture. The word Micro comes from the Greek word mikros
(small). Macro comes from the Greek makros (long or large). Micro-Economics and
Macro-Economics are thus two complementary parts of the subject of Economics.
But what is Economics?
Value Addition 1: Focus of the Section

Topic Economics
This section is to make you aware of what Economics is.
The purpose of this section is to make you familiar with the various
Definitions of Economics, the Evolution of the subject, its Scope,
Methodology, Tools and Basic Concepts.

According to the renowned economist Alfred Marshall, it is the study of human
beings as they go about their everyday life.
To quote from Marshall’s Principles of Economics (1890), it is "a study of mankind in the
ordinary business of life; it (Economics) examines that part of individual and social
action which is most closely connected with the attainment and with the use of the
material requisites of wellbeing. Thus it is on one side a study of wealth; and on the
other, and more important side, a part of the study of man." For decades this was
the most accepted definition until, in 1932, Lionel Robbins focused on another aspect
of the subject and defined it as the study of choice under conditions of scarcity:
"Economics is a science which studies human behavior as a relationship between
ends and scarce means which have alternative uses."
Productive Resources (land, labour, capital goods such as machinery, technical
knowledge) are scarce or limited, and a resource applied to the production of a
certain commodity or service is unavailable for the production of an alternative
one. But human wants for the Consumption of goods and services (cereals and
pulses, meat and fish and poultry, vegetables, clothes, woollens, houses, roads, cars,
railways, airplanes, books, theatre, film, television and countless others) are
unlimited, and come from numerous members of the society. Economics is the study
of how people choose to use scarce or limited resources to produce various
goods and services and distribute them to various members of society for their
consumption.
The Central Problems of an Economy
Any society faces three fundamental and interdependent economic problems:

1. What to Produce, and how much of each good
2. How to Produce, that is, by whom and with what resources and technology
3. For Whom to Produce, that is, how the total amount of production in the
society is to be distributed among its members.
Different kinds of society have different ways of solving them. A tribal society hunts
and forages together and shares the fruits of its labour more or less equally. In a
feudal society, the serfs produce and pay certain shares of the production to the
feudal lords as per the traditions established by the lords. An advanced capitalist
society produces through a complicated division of labour and distributes the total
product in a complex and unequal way.
In a socialist state, central planning takes care of production and its equitable
distribution.
But all societies must face the three problems: What to Produce, How, and For Whom;
in more technical terms, the problems of Allocation, Choice of Techniques and
Distribution.
The Production Possibility Curve (PPC), or frontier, is a geometrical way of depicting
this choice problem. It depicts the production possibilities, or “menu”, as Paul
Samuelson put it.
Suppose a society or economy, using all its resources fully, can produce any
combination of two goods, say food grains (represented by the symbol X) and
aeroplanes (Y), up to a certain maximum of each.
Represent food grains (X) on the horizontal axis and aeroplanes (Y) on the vertical.
Each point on the X-Y plane then represents a numerical combination of food
grains and aeroplanes.
Suppose we have a chart of alternative combinations showing the maximum
number of aeroplanes that can be produced along with a certain amount of food
grains, or vice versa. That is, we have a chart of alternative combinations of
maximum Y’s going with different X’s (or, combinations of maximum X’s going with
different Y’s). Each such combination has its position on the X-Y plane. Join them to
get the Production Possibility Curve (PPC).
Each point on the PPC represents a maximum of X (at a certain Y) or a maximum of
Y (at a certain X). All points on or below the PPC represent combinations of
X and Y that are Attainable by the society concerned, but only points on the PPC
represent maximum X (given Y) or maximum Y (given X). Points below the PPC
(including the origin and points on the two axes short of the curve) represent what
the society concerned can produce without using its (scarce) resources fully.
Points above the PPC cannot be attained with the existing resources.
When, for some reason or the other, the society becomes capable of producing more
of X (at every given Y) or more of Y (at every given X), the PPC shifts
outward. This indicates Economic Growth. When the reverse happens, the PPC
shifts inward.
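The classification of points relative to the PPC can be sketched in a few lines of Python. This is only an illustration: the schedule of maximum Y for each X is a hypothetical set of numbers (the lesson itself gives none), and the name `classify` is an assumption made for this sketch.

```python
# Hypothetical PPC schedule: for each quantity of food grains (X),
# the maximum number of aeroplanes (Y) the economy can produce.
# These numbers are illustrative assumptions, not data from the lesson.
ppc = {0: 15, 1: 14, 2: 12, 3: 9, 4: 5, 5: 0}

def classify(x, y, frontier=ppc):
    """Classify a combination (x, y) relative to the PPC."""
    max_y = frontier[x]          # maximum Y attainable at this X
    if y > max_y:
        return "unattainable"    # lies above the PPC
    if y == max_y:
        return "on the PPC"      # resources fully used
    return "attainable, resources not fully used"  # below the PPC

for point in [(2, 12), (2, 8), (2, 14)]:
    print(point, "->", classify(*point))
```

An outward shift of the PPC (Economic Growth) would correspond, in this sketch, to replacing `ppc` with a schedule whose maximum Y is higher at every X.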

Economics studies these Central Problems of human society, and has by now come
to study much more.
How did economics evolve as a subject?
3. Evolution of the subject
Etymologically, the word Economics derives from the Greek words oikos (house) and
nomos (management).
A wife doing a good job of running the household was often called thrifty or
economical. However, from the second half of the seventeenth century, the word
Economics came to be applied to a much wider context, viz., the management of
the various resources of a whole country or nation.
Adam Smith is known as the ‘father’ of the subject of Economics. His book An
Inquiry into the Nature and Causes of the Wealth of Nations, first published in 1776,
is widely regarded as the first systematic treatise on Economics. Smith’s concern was
with nations or countries, that is, it was a macro-type concern, although the term
macro was not in use then.
Later T.R. Malthus, David Ricardo and even J.S. Mill wrote important treatises on the
subject, taking the same overall perspective. They made sweeping generalizations
taking long-run perspectives. They are known as Classical economists. Thus, by and
large, Classical economists did not take a micro-approach but rather a macro one.
Towards the end of the 19th century, economists began to study economic issues on
a more specific and individual level. A new or neo approach evolved that has come to
be called the Neo-Classical approach. It concentrated on how the price and quantity
of specific goods (and services) were determined in the market through a rational
balancing of their ‘marginal’ costs and benefits (utilities and
productivities). Foremost among these Neo-Classical economists (also described as
‘marginalists’) were Menger, Jevons and Alfred Marshall. It is their work, especially
Marshall’s Principles of Economics (first published in 1890), that constitutes the
foundation of Micro-Economics. There evolved the concept of the ‘economic
man’, that is, the individual producer or consumer, who made his choices or
decisions perfectly rationally. On the basis of his preferences or cost patterns, the
consumer’s objective was always to maximize his utility or satisfaction and the
producer’s, to maximize his profits. The individual consumer or producer was the unit
concerned, not the entire national entity.
Although the Classical economists had been concerned with the nation or the country
as a whole, and were therefore more macro than micro, Macro-Economics as an area
of study really began after the Great Depression of the 1930s. Europe and America
both suffered from it and so did their colonies in other parts of the globe. Widespread
unemployment followed the closing down of production units. Both employers and
employees suffered. It was then that John Maynard Keynes came up with his
analysis of the phenomenon in terms of Aggregate Demand falling short of
Aggregate Supply and emphasized the role of the Government of a country in
stepping up its own expenditure in order to give Aggregate Demand a boost.
His analysis laid the foundation of Macro-Economics. Later John Hicks, Milton
Friedman, Robert Lucas and others contributed to the subject of Macro-Economics.
Paul Samuelson has emphasized that there is no essential opposition between
Macro-Economics and Micro-Economics. Both are “vital” to the understanding of the
subject (Economics, 7th edn, p 362). It is usual in most universities to offer a course
in Micro-Economics prior to that in Macro-Economics. But it is not a necessary
practice and is changing.
In the words of Paul A. Samuelson, “Macroeconomics deals with the big picture –
with the macro aggregates of income, employment, and price levels. But do not
think that microeconomics deals with unimportant details. After all, the big picture is
made up of its parts.” (Economics, 7th edn, p 362)
4. The Methodology of Economics – Positive Economics and Normative Economics
Economics can be subjected to another distinction, that between Positive Economics
and Normative Economics. The distinction is that between ‘what is’ and ‘what ought
to be’ so far as economic issues are concerned. To ‘posit’ means to state or even
explain something as an objective principle or fact. A ‘norm’, on the other hand,
means a standard or ideal set up subjectively.
According to economists like Milton Friedman (who wrote Essays in Positive
Economics, 1953), economists should not pass moral strictures or make ‘value
judgements’. In other words, Economics should just ‘posit’ or be Positive. It can
say: ‘If the price of a commodity goes up, its quantity consumed falls, other things
remaining the same’.
But it should not go on to pass judgements or give advice such as: ‘If the price of a
commodity goes up, its consumer should reduce the quantity to be consumed’.
That does not mean that Economics cannot contain policy prescriptions. It simply
means that such prescriptions or recommendations should be expressed in an
objective way.
E.g., ‘If there is economic depression and the government increases its consumption
expenditure, the depression is likely to get corrected’.
To state it as follows is to be normative:
‘If there is economic depression, the government of the country should increase its
consumption expenditure’.

5. Art or Science
Several decades ago, textbooks in Economics often discussed this issue: Is
Economics a science or an art? Etymologically, a science (derived from Latin scire, to
know) provides theoretical knowledge while an art (derived from Latin ars, skill or
craft) teaches us how to practise or do it. Now Economics teaches us all about, say,
how a producer maximizes his profits. But it does not teach him how to make profits.
From this point of view, it is a science rather than an art. Again, the recent theoretical
developments in Economics have made so much use of Mathematics that a sound
knowledge of Mathematics is essential even for its undergraduate Honours course,
e.g., in Delhi University itself. This takes Economics closer to being a Science
subject.
However, the hallmark of science is experiment. A science must provide room for
controlled experiment so as to verify its hypotheses. But human beings cannot be
subjected to experiments just to find out the effects of policies. In this sense,
Economics belongs to the Humanities stream. Most universities regard
Economics as an art and award BA and MA degrees in it. However, the London
School of Economics does, in fact, award BSc and MSc degrees to its students of
Economics.
Indeed the scope of Economics is so wide that it is difficult to categorise it as either
science or art. It is perhaps a mixture of both.
As Paul Samuelson put it, “Not only is Economics at once art and a science,
economics as a subject can combine the attractive features of both the humanities
and the sciences” (Economics, 7th edn, Chapter 1, p 4).
A Social Science
Even if we use the term science to describe Economics, we must remember that it is
a Social Science. It does not study individuals in isolation, doing everything by
themselves. It studies individuals as members of a society or nation or Economy.
An economy is the same as a country or society but considered only in its economic
aspects. Every society or country has numerous people engaged in activities of all
sorts. Some work in the fields, some work in factories, and yet others in offices.
Some perform agricultural activities, some industrial, and some provide services. Those
who are in agriculture need to get industrial products and, say, banking services.
Those who are factory-workers, say, need to get hold of foodstuff, and use some
kind of transport services. The people engaged in the services sector need both food
and clothing. Thus all the three sectors with their separate kinds of activities need to
have relations. All the people of an economy need to act as well as inter-act. This
they do by exchanging the products of their various activities in various markets.
The epithet ‘Social’ covers this aspect of the subject of Economics.
However, for analytical purposes, Economics sometimes uses the concept of a
Robinson Crusoe Economy, or an economy consisting of a single person performing
all the economic activities by himself. Robinson Crusoe is the title of a novel written
in 1719 by Daniel Defoe, thought to be inspired by the life of Alexander Selkirk, who
was marooned on an island. In the novel, Crusoe survives on an island for 28 years.
A Robinson Crusoe Economy is thus a theoretical concept where the economy has a
single member.
6. Scope of Economics - Related Subjects
Economics has a wide scope and has connections with various subjects.
Mathematics and Statistics are necessary for the study of Economics. Mathematics
helps economists to analyze economic realities and to derive conclusions from
them. Statistics aids this process by systematizing the economic realities as data and
inferring from them by accepted statistical tools. In fact, the application of Statistics
to Economics has led to the development of a relatively new subject: Econometrics.
It helps in empirical study and in making estimates about both the past and the future.
Without a sound mathematical base, it is next to impossible to cope with academic
Economics. However, to have a general awareness of the economic occurrences of
the world, basic intelligence will do. To quote Samuelson, “Although every
introductory textbook must contain geometrical diagrams, knowledge of
mathematics itself is needed only for the higher reaches of economic theory. Logical
reasoning is the key to success in the mastery of basic economic principles, and
shrewd weighing of empirical evidence is the key to success in mastery of economic
applications.” (Economics, Ch 1, p 5)
Actually, the earlier term for Economics was Political Economy. Several universities
still have a common department for Politics and Economics. Political Science is a
useful subject to supplement a course in Economics. History is also a subject that
has a close connection with Economics. Economic History is a compulsory paper in
every course in Economics, undergraduate as well as post-graduate. Several
universities offer a post-graduate course in Economic Geography.
In recent times several subjects or courses have emerged from Economics, e.g.,
Commerce, Business Economics, Business Administration, Business Management.
While based on the fundamentals of Economics, they have their own distinctive
course contents. But papers on both Micro-Economics and Macro-Economics figure in
all of them.
7. Models and Hypotheses
Economics has to deal with a complex mass of realities. So it sometimes puts them
into a simplified framework or Model. A Model is a theoretical construct that
represents economic realities by a set of inter-related variables. These relationships
can be logical or quantitative. Putting them in a Model helps economists to
analyze realities better and even make predictions about the future.
Economists often posit or propose explanations for economic phenomena. These are
known as Hypotheses. A hypothesis is not a theory. Only if a Hypothesis is verified
or found to be true can we call it a Theory. To be verified or falsified, that is, tested,
a hypothesis has to be framed in a certain way. Such a hypothesis is called a
Scientific hypothesis. Sometimes economists have no alternative but to take a
certain hypothesis to be true, and proceed on the basis of it. Such a hypothesis is
called a Working hypothesis. Statistics and Econometrics are the tools used in
verifying a hypothesis.
Laws of Economics
The Classical and Neoclassical economists often used the term ‘law’ to describe the
tendencies that they observed in the functioning of the economy or society. The Law of
Demand, the Law of Diminishing Returns, Say’s Law and Okun’s Law are just a few
examples. In no sense are these binding or enforceable or universal laws.
However, law in the usual sense of the term does have a close connection with
Economics. For the market to function well, there must be law and order in the
country. This is a basic idea of Neo-Classical Economics. Laws influence economic
occurrences. For example, the Permanent Settlement of 1793 had a far-reaching
influence on India’s agriculture. After Independence, the government had to pass
several Abolition of Intermediaries Acts in order to correct the agricultural situation.
The Monopolies and Restrictive Trade Practices Act, the Consumer Protection Act and
the various economic reforms all testify to the close connection between Law and
Economics.
8. Market and Equilibrium
The word Market comes from Latin mercatus, which meant trading, buying or selling
at an appointed time or place. A market is not necessarily a marketplace. It is a
context or background where buying and selling are taking place. The haat, bazaar
and mandi, the shop and the mall are markets. But online or telephonic sale and
purchase, which are quite common these days, are also market transactions.
The distinguishing feature of the market is that market transactions are exchanges,
usually performed through the medium of money. The seller (who is sometimes
though not always the producer) of certain commodities/services brings them to the
market and offers certain quantities of them at a certain price. He
thus supplies them in the market. The (prospective) buyer comes to the market
wanting to get certain commodities/services at a certain price. He thus demands
them in the market. If the demand of the buyer and the supply of the seller match at
a certain configuration of price and quantity, the transaction takes place. If not, it
does not.
The transaction is thus both a sale and a purchase. It is a sale from the point of view
of the Seller (producer), that is, from the Supply side. It is a purchase from the point
of view of the Buyer, that is, the Demand side.
The transaction has two aspects or dimensions to it, viz., a quantity and a price. For
example, the seller is agreeable to selling 2 kg of rice at the rate of Rs 50 per kg, and
the buyer finds this offer reasonable. “Two kg of rice at Rs 50 per kg” is then the
description of the transaction. The total amount spent by the buyer/consumer and
received by the seller/supplier is thus Rs 100 (50 x 2), and this is called the
Expenditure from the buyer’s point of view and the Revenue from the seller’s. The
transaction configuration and the total expenditure/revenue are thus distinct
concepts.
The transaction configuration is known as the Equilibrium configuration, or simply,
Equilibrium. It is called so because it represents a matching or balancing of two
aspects: the Buyer’s and the Seller’s, that is, the Demand side and the Supply side.
In Latin, aequus means equal and libra means scales or balances. (That is why in the
Zodiac, the sign Libra is shown by a pair of scales.) When the two pans on the two
sides of a pair of scales hang at the same level, there is aequilibrium, or, in
English, Equilibrium. Neither pan goes up or down any more, and unless there
is some external disturbance, the balance, or equilibrium, holds.
8.1 Demand and Supply
The word Demand is from Latin demandare which means to claim or commission.
Supply is from Latin supplere which means to fill up or complete.
In the context of Economics it was Adam Smith in 1776 who first used them as
corresponding concepts. Marshall has compared them to the two blades of a pair of
scissors. Just as the scissors cannot work without either of the two blades, Market
Equilibrium cannot be determined without reference to both Demand and Supply.
Demand is desire backed by purchasing power. A buyer or consumer does not
merely desire a commodity or good (or service) but has some power or wherewithal
to purchase it at a price. Similarly, a seller or producer does not merely offer his
commodity or good (or service) but offers it at a price.

There exists at any one time a definite relationship between the market
price of a good and the quantity demanded of that good. This relationship
between price and quantity demanded/bought is called the Demand
schedule or Demand function or Demand curve.
One usual form that the Demand curve can take is downward-sloping from left to
right. Based on the Demand schedule below, this is depicted as follows:
Demand Schedule
Point   Price (P), Rs per kg   Quantity Demanded (Qd), kg
A       5                      9
B       4                      10
C       3                      12
D       2                      15
E       1                      20
Demand Curve
Prices are measured on the vertical axis and the quantities demanded on the
horizontal. Each pair of Q, P numbers from the Demand Schedule is plotted as a
point on the Q-P plane, and a smooth curve is passed through the points to yield the
Demand ‘curve’. It slopes downwards from Left to Right, showing an Inverse or
Negative relation between price and quantity.
There exists at any one time a definite relationship between the market
price of a good and the quantity the producers of that good are willing to
offer or supply. This relationship between price and quantity supplied is
called the Supply schedule or Supply function or Supply curve.
Based on the Supply schedule below, a supply curve can be depicted. Usually it
slopes upwards from left to right.
Supply Schedule
Point   Price (P), Rs per kg   Quantity Supplied (Qs), kg
A       5                      18
B       4                      16
C       3                      12
D       2                      7
E       1                      0
Prices are measured on the vertical axis and the quantities supplied on the
horizontal. Each pair of Q, P numbers from the Supply Schedule is plotted as a
point on the Q-P plane, and a smooth curve is passed through the points to yield the
Supply ‘curve’. It slopes upwards from Left to Right, showing a Direct or Positive
relation between price and quantity.

To find the Equilibrium, the two schedules must be matched or the two curves
superimposed on each other. At the price where the quantity demanded is the same
as the quantity offered, that is, at the point where the Demand curve and the Supply
curve intersect, there is a perfect matching or balancing, i.e., equilibrium.
Putting the two schedules together, we find that only at P = 3 will both Qd and Qs be
the same, viz., 12. Putting the two curves together, we find that they intersect at
(only) the point (12, 3). At the point (12, 3), thus, there is equilibrium. This
equilibrium holds until and unless there is some external reason tipping the scales
either way. At any price lower than Rs 3 per kg, suppliers would not come forth with
the quantity that the buyers are demanding at that price, so there will be Excess
Demand in the market. At any price that is higher, buyers will not be demanding the
quantity that suppliers are willing to supply at that (higher) price, so there will be
Excess Supply in the market.
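The matching of the two schedules can be checked mechanically. The following is a minimal Python sketch using the numbers from the Demand and Supply Schedules of this lesson; the names `demand`, `supply` and `excess_demand` are illustrative choices for this sketch.

```python
# Demand and Supply schedules from the lesson: price (Rs per kg) -> quantity (kg)
demand = {5: 9, 4: 10, 3: 12, 2: 15, 1: 20}
supply = {5: 18, 4: 16, 3: 12, 2: 7, 1: 0}

def excess_demand(price):
    """Quantity demanded minus quantity supplied at a given price."""
    return demand[price] - supply[price]

# Equilibrium is where the market clears, i.e. Qd equals Qs.
equilibrium = [(demand[p], p) for p in demand if excess_demand(p) == 0]
print("Equilibrium (Q, P):", equilibrium)

# Report the state of the market at every price in the schedule.
for p in sorted(demand, reverse=True):
    gap = excess_demand(p)
    if gap == 0:
        status = "equilibrium"
    elif gap > 0:
        status = f"excess demand of {gap} kg"
    else:
        status = f"excess supply of {-gap} kg"
    print(f"P = {p}: Qd = {demand[p]}, Qs = {supply[p]} -> {status}")
```

Prices above Rs 3 show excess supply and prices below Rs 3 show excess demand, which is the sense in which only (12, 3) balances the two sides.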
The above Demand and Supply are individual in nature, belonging to an individual
person, household or firm. In Macro-Economics the corresponding concepts are
Aggregate Demand and Aggregate Supply. They represent the total demand and
supply of the economy as a whole.
Types of Markets
Markets can be of different types depending on the type of goods and services being
bought and sold in them.
The most common is the market for specific goods or commodities, i.e., for concrete,
physical things like items of food and clothing. Services can also be bought and sold,
e.g., travel and entertainment, and the services of physicians and lawyers. The Share
Market is where shares of various companies are bought and sold. Domestic market
refers to markets within the boundaries of a country, whereas the Foreign or
International market refers to transactions taking place across two different
countries using two different currencies. All these markets come under the purview
of Economics. But there is a difference in the way Micro-Economics and
Macro-Economics look at markets.
Micro-Economics looks at markets in the sense of individual buyers (consumers or
households) and individual sellers (producers or firms) coming together to perform
their respective roles in the market transactions. It is concerned with whether there
are numerous buyers and sellers or just a few (or even one), whether the product
(good, commodity, or service) is homogeneous or differentiated, whether there is
perfect information about the products (output) and factors of production (input),
whether the factors (inputs) can freely move between alternative uses, and other such
conditions. Depending upon the configuration of such conditions, the market takes
different forms such as Perfect Competition, Monopolistic Competition, Monopoly, and
so on. A large part of Micro-Economics is devoted to the study of these market
forms.
In Macro-Economics, the markets concerned are overall or aggregate in nature.
Both Micro-Economics and Macro-Economics look beyond national boundaries.
International Trade Theory, first formulated by none other than Adam Smith, is an
essential part of Micro theory. Open Economy models, for example by Mundell and
Fleming, are also integral parts of Macro theory.
In addition to studying the exchange of actual goods and services, Micro Economics
also studies the attainment of Satisfaction or Welfare that comes from such
exchange, both at individual and social levels. In fact, this is one of the basic
questions that Adam Smith was preoccupied with. How is Social Welfare, as distinct
from the welfare of individuals, to be reached? Welfare Economics is the part of
Micro-economics which studies this, and there is no counterpart in Macro-economics.
9. Concept of ceteris paribus – General Equilibrium Partial Equilibrium
Economics is a complex subject, rooted in reality but often analyzed through
abstract thinking and mathematical methods.
As symbols of that reality, Economics makes use of the Mathematical concepts of
Variables, Constants and Parameters.
Variables are entities that take different values. They are usually symbolized by x, y,
z, and can take positive and negative values ranging from minus infinity to plus infinity.
Constants are entities that, for one particular analytical exercise, take one
particular value. They are usually symbolized by a, b, c, ... or alpha, beta, gamma.
They too can take any value between minus infinity and plus infinity, but only one such
value during a particular analysis.
Parameters are entities that can be assigned different values for different variants of
an exercise but in any one particular variant, can take only one such value.
Variables can be dependent or independent. An Independent variable takes on
values by itself. A Dependent variable takes on values according to or as per the
Independent variable. This relation of dependence between the Independent and the
Dependent variable(s) is known as a functional relationship, or simply, a Function. It
means that the Dependent variable functions according to the Independent variable.
It is a most powerful tool in the study of Economics, both Micro and Macro.
In Economics, a Function may involve more than one variable. Usually, several
variables are interlinked. To examine whether any two have a causal ( cause-effect)
relationship, it may be necessary to rule out others that complicate the issue or get
in the way of analyzing it. Then what is done is to make an assumption known as the
ceteris paribus assumption.

In Latin, ceteris means ‘other things’ or ‘the rest’ and paribus means ‘at par’ or
‘equal’. The phrase ceteris paribus thus means ‘other things being the same’. It
qualifies or conditions a causal relationship between an independent variable and the
dependent variable that depends on it or functions according to it.
Suppose we take up the following functional relationship:
The Quantity (Qx) demanded of a Commodity (symbolized by x)
depends on the Price (Px) of the Commodity, the Prices of other commodities (say, y
and z) that can complement or substitute it, and the Income (Y) and Tastes (T) of the
person making the demand.
Symbolically this can be written as
Qx = f(Px, Py, Pz, Y, T)
where Qx is the dependent variable, Px, Py, Pz, Y and T are the independent
variables, and f is the functional form.
Now if we want to focus on the causal relationship between the Price of the
commodity (Px) and the Quantity of it that is demanded (Qx), and for the time being
hold constant the prices of other commodities and the income and tastes of the
consumer, this can be written as
Qx = f(Px), ceteris paribus.
This simple yet powerful technique, used extensively by Alfred Marshall, is known as
Partial Equilibrium Analysis. However, it lets only one market (at a time) be in
equilibrium and may not capture the complexities of the real world.
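The ceteris paribus idea can be illustrated with a short sketch in Python. The linear form of the demand function and every coefficient and fixed value below are made-up assumptions chosen only for illustration; the text specifies nothing beyond Qx = f(Px, Py, Pz, Y, T).

```python
from functools import partial


def quantity_demanded(px, py, pz, income, tastes):
    """Hypothetical linear demand for good x; all coefficients are invented."""
    return 100 - 2.0 * px + 0.5 * py - 0.25 * pz + 0.01 * income + tastes


# Ceteris paribus: hold Py, Pz, Y and T fixed so that only Px can vary.
# The fixed values are assumptions, not data from the text.
demand_given_others = partial(
    quantity_demanded, py=10, pz=20, income=1000, tastes=5
)

for px in (5, 10, 15):
    # Each step of 5 in the own price lowers the quantity demanded by 10 units.
    print(px, demand_given_others(px))
```

Changing any of the frozen values would give a different Qx = f(Px) relationship, which is why the qualifier ceteris paribus has to be stated alongside it.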
General Equilibrium Analysis is a contrasting technique, first formalized by Léon
Walras. It does not use the ceteris paribus assumption. It lets the
interdependence of the various variables play itself out. Prices of Commodities are
determined simultaneously and mutually. All markets are simultaneously in
equilibrium.
In a static equilibrium all quantities have unchanging values, but in a dynamic
equilibrium various quantities may be growing, with only their ratios unchanged.
Comparative Statics compares two static cases of equilibrium. Comparative
Dynamics compares two dynamic equilibria.
11. Short-Run and Long-Run Equilibrium
A run is a length of time, not exactly specified. If all factors of production can be
varied during a length of time, it is called the Long Run. If some factors can be
varied but others cannot, i.e., are fixed, it is the Short Run. A Short Run Equilibrium
is one that holds
The highest recognition for economists is the “Sveriges Riksbank Prize in Economic
Sciences in Memory of Alfred Nobel”. Though created by Sweden’s Central Bank in
1968, nearly 75 years after Nobel prizes in physics, chemistry, literature, peace and
medicine/physiology were set up in 1895, this is regarded as the Nobel Prize in
Economics. The first two to receive this were Ragnar Frisch and Jan Tinbergen in
1969. Paul A. Samuelson received it in 1970. In 1998, it went to Amartya Sen from
India.
Value Addition 2: Test Yourself

You should now be able to answer the following questions:

1. What is Micro-Economics?
2. Give three keywords that you think must be included to define it.
(Hint: individual, branch)
13. Summary
- Economics studies human choice among alternative uses of scarce
resources.
- It is a Social Science with a wide scope. It aids the understanding of
the central problems of an economy.
- Demand and Supply of goods and services determine their
Equilibrium Price and Quantity in the Market.
- Markets can be of various forms.
- Equilibrium can be Partial and General, Long-Run and Short-Run,
Dynamic and Static.
14. Exercises
Short Questions
1. How would you define Micro-Economics?

2. What are the three central problems of an economy?
3. What does ceteris paribus mean?
Long Questions
1. Describe the evolution of the subject Economics.

2. Explain the concept of Market Equilibrium.
3. Is Economics a Science or an Art?
4. What subjects have a relation with Economics?
15. Glossary
Variables
Constants
Hypothesis
Model
Demand supply
Market
Equilibrium
Static Equilibrium
Dynamic Equilibrium
Long Run
Short run
General equilibrium
Partial Equilibrium
16. References

1. Economics, Paul A Samuelson

2. Microeconomics, Robert S. Pindyck, Daniel L. Rubinfeld, Prem L. Mehta
17. Activity
Go to the nearby market for fruits and vegetables and observe the people going
about the daily business of buying and selling.
Go to a mall or supermarket and do the same.
Jot down any differences you may find.
18. Quiz
Was Adam Smith English, American, Scottish or French?

When was his book The Wealth of Nations published?
Did Alfred Marshall teach at Oxford University, Cambridge University or the London
School of Economics?
Who won the Nobel Prize in Economics this year?

Nature and Subject Matter of Economics
Discipline Courses-I
Semester-I
Paper I: Principles of Economics (POE)
Unit-I
Lesson: Nature and Subject Matter of Economics
Lesson Developer: Neha Goel
College/Department: Shyamlal College, University of
Delhi
Institute of Lifelong Learning, University of Delhi

 1: Learning Outcomes
 2: Introduction
 3: The Basic Competitive Model
 4: Incentives and Information
 4.1: Property Rights
 4.2: Prices, property rights and profits
 5: Rationing
 6: Opportunity Sets
 7: Economic Systems and Gains from Trade
 8: Comparative advantage and Trade
 9: Summary
 10: Exercises
 11: References
 12: MCQs
1. Learning Outcomes
After you have read this chapter, you should be able to:
- Understand the subject matter of economics
- Explain the basic competitive model
- Understand how prices, property rights and profits provide information and
incentives
- Define rationing and understand its types
- Explain opportunity sets and economic systems
- Have a better understanding of the concept of comparative advantage, trade and
gains from trade
2. Introduction
Economics, as a subject, is a combination of art and science. It contains the
principles and laws that explain how an economy and its
variants function. The following basic facts define the existence of an economy:
1. Consumers have unlimited wants for goods and services.
2. There is a scarcity of the productive resources which help to produce goods and
services to satisfy human wants.
Thus, Economics is the study of how to produce goods and services from scarce resources
so as to satisfy human wants and needs, and how sustainably we consume these
goods and services. We can use the production possibility curve (PPC) to depict the
problems of scarcity and choice-making. A PPC schedule gives the various combinations of
two goods which can be produced using a fixed amount of productive resources,
assuming that the resources are fully utilized.

Have you ever wondered, when you go to a market and the mango seller sells
mangoes for Rs. 50/kg, who decides the price? And if you want to buy 5 kg, why does
the seller sell the mangoes at Rs. 40/kg? Have you ever thought about why swimming is
free in the river Ganges but has to be paid for in a hotel swimming pool? Have you
ever wondered why Chinese items are cheap or why we export rice? Have you ever
thought about how goods and services get produced, sold or priced differently in
different markets, or how they are traded among different countries? There may be
a one-word answer: competition. We now discuss various terms and definitions
that will help us answer the questions above.
3. The Basic Competitive Model
There are two kinds of participants in the market, i.e. producers and consumers. There are a
large number of buyers and sellers in a competitive market and thus they compete
among themselves. Producers compete with each other by providing the desired
products to the consumers at the lowest possible price, and consumers compete with
one another by paying the price for the products they are willing to buy, while others
may not be able to afford the product. This is known as the basic competitive model.
The basic competitive model is the model which assumes that firms are interested
in profit maximization, consumers are rational or self-interested and
markets are perfectly competitive.
The consumers are assumed to be rational as they make choices in their own self-
interest i.e. they make a choice such that their satisfaction is maximized. For example,
Ram may prefer leisure over work and accept a lower income in exchange for longer holidays,
while Rahul may be ambitious and hardworking and willing to work longer hours to
fulfill his dream of buying a bungalow.
The firms are also assumed to be rational as they operate with the motive of profit
maximization.
Perfectly competitive markets are those where a single producer has no power to set
the price of a product as there are many sellers selling homogeneous products and the
market mechanism (interaction between demand and supply) determines the price and
quantity of the product.

Prices: generally, the market mechanism, i.e. the interaction between demand and supply,
determines prices. For example, the price of an air cooler may be low in winter due to low
demand but rises in summer as demand increases. Sometimes, the government
intervenes and determines the price either fully or partially. For example, the Government of
India (GOI) fixes the price of critical goods like petrol and necessities like wheat to protect
the interests of both producers and consumers.
** Test Yourself: What are the assumptions of the basic competitive model? How
are prices determined in a competitive market?
4. Incentives and Information
Incentives are the core of economics. Have you ever thought about why, without
incentives, anyone would take the risk of inventing a new product, save for future
contingencies or work hard? In an economic system where the government takes the
decisions, it makes a central plan to decide what to produce, how to allocate the resources
to produce those goods and to whom to sell the goods. This economic system has a
drawback: since nobody owns the resources that are used in the production of goods and
services, the resources may not be fully utilized. So property rights should be enforced on
the resources so that private owners have an incentive to utilize them fully in producing
goods and services.
For example, the agreements of the WTO (World Trade Organization) include the protection
of Intellectual Property Rights (IPR). Intellectual property is an original creative work of
the mind, e.g. literary works, industrial designs etc. Thus, it should be legally protected. IPR
is needed to control or manage intellectual property so that the creator
benefits and has an incentive to innovate. IPR also encourages and promotes
creativity as no one can copy another person’s work without permission. The internet
has helped tremendously in spreading information among people, saving time and
money. In market economies, information and incentives are provided through
prices, profits and property rights.
4.1 Property Rights
Property refers to ownership and control over a good or resource. It is a characteristic of
an economic good. The attributes of an economic good involve the rights to use it, to
earn income from it, to trade/exchange it with others and to enforce property rights.
Property rights are laws created by governments which keep a check on how a resource
is used and who owns that resource: government, individuals or collective
bodies. For example, if you lend your car to your sister, she does not have a legal
property right to the car; or suppose your car gets stolen, the thief does not have a
legal property right to the car but only an economic property right to it.
A good or a resource should have properly defined property rights, and possession of the
rights must be enforced so that the use of the good or resource can be controlled.
Property can be classified into four groups:
1. Open access property:– This kind of property is neither owned nor controlled or
managed by any individual. It is non-excludable, i.e. no individual can be excluded from
using this property. However, this kind of property can be rival, i.e. a unit consumed by one
consumer cannot be consumed by another. Thus, if someone uses it, the quantity available
for other individuals is reduced. For example, if there is a fishery in a village, Fisherman A
can catch any amount of fish from the fishery; the more fish he catches, the fewer fish will
be available for the other fishermen. Thus, the government should define proper property
rights so that the good or resource is used ethically and remains available for all. The
government can divide the fishery into portions among the fishermen, within which each
of them can fish. Thus, the government can convert an open access property into
common, state or private property by enforcing property rights on it.
2. Common property:– This kind of property is jointly owned by a group of individuals.
The joint owners decide together how to use, control or manage the
property and who should be excluded from using it. The enforcement of
the property rights and the benefits from the property are shared by the joint owners, so
it is easier to resolve conflicts, if any, than with open access property. For example,
an amusement park in a colony is the common property of the residents of that colony. The
residents manage and control the park and can decide who can use it.
3. Private property:– This kind of property is owned by an individual or a group of
individuals. For example, a building can be owned by an individual or a group of people.
Private properties are both excludable and rival, i.e. the owner or owners decide how to
use, control or manage the property and who should be excluded from using it. The
owner may decide whether to rent out the building or reside in it himself.
4. Public property:– This kind of property is controlled and managed by the
government, although owned or used by all individuals. Examples include street lights and
monuments.
4.2 Prices, Property Rights and Profits
Limited supply of a good or resource implies a higher price for that good, for example gold
and petrol, whereas goods available in bulk are cheaper, for example paper and local market
clothes. Thus, prices provide information about the availability or scarcity of a good
or a resource. Consumers respond to this information by buying the goods if they
are willing to buy and able to pay for them, and producers respond to this information
with the motive of profit maximization. Producers can maximize their profits by
using smaller amounts of scarce resources and producing what consumers are willing
to buy. For example, if heavy rains reduce the tomato yield, the price of tomatoes rises and
rational consumers reduce their consumption, whereas the rise in price
signals rational producers (farmers) to grow more tomatoes. Thus, prices are a
signal for firms and individuals to take rational decisions.
This profit motive can only be effective if there are clearly defined property rights for a
good or resource. There must be properly defined private property for the firms and
individuals so that they have an incentive to invest in a new plant or technology or hire
trained candidates or produce goods and services utilizing the available resources
efficiently. For example, Mr. A has property rights over his building as he bought it, and he
may decide to rent out a part of it to get some return on his investment. If the tenant
doesn’t pay the rent on time, Mr. A bears the consequence, i.e. a loss of income. Thus, if
he made the right decision, the rent is the reward he gets for renting out his building; if he
made the wrong decision, the loss gives him an incentive to check the financial stability of
the tenant next time.
** Test Yourself: Define property rights. How do prices, profits and property
rights provide information and incentives?
5. Rationing
Rationing is just another way to deal with the problem of scarcity in economics. It is a
way to control or manage the distribution of scarce goods or resources. Ration may be
defined as the allotment of resources to an individual. Rationing keeps in check the size
of the ration being distributed on a particular day/time. Now we may discuss the various
ways in which rationing is used:
1. Rationing by Lotteries:– It is a system in which goods are allocated by a random
process, like picking a chit from a bowl. The allotment of DDA flats in Delhi is a good
recent example of rationing by lotteries. It is a fair process where
everyone is given an equal opportunity to own the good. However, this system is not
efficient, as the individual who is willing to buy and values the good more may not get
the scarce good or resource. In the case of the DDA flats in Delhi, the person who gets
the flat may not value it because of its location or condition, or may not be able to
buy it.
2. Rationing by Queues:– It is a system in which goods are allocated to those who are
willing to wait in a queue. The price of the good does not change, because the
goods are not allocated according to who is willing to buy or able to pay for them. Like
lottery rationing, it is a fair system. For example, interviewers conduct the interviews of
candidates in a queue, or a doctor consults patients in the order of the queue.
However, this system is also inefficient, as waiting in a queue wastes time,
which is an important resource. For example, in the case of medical care, if some people are
willing and able to pay a higher amount to get treated, this would increase the
monetary resources of the hospital, which could be used to employ more doctors, thus
reducing the queue and improving the medical facilities too.
3. Rationing by coupons:– It is a system in which goods are allocated to those who
buy or are given a coupon. For example, Ram gets a food coupon of Rs. 50 from his
company daily. He can get the food only if he presents that coupon at the counter.
Coupon systems can be classified as tradable or non-tradable. Continuing with our
example, if Ram brings food from home, the Rs. 50 coupon is wasted daily, as the good
does not go to the individual who is willing to buy and able to pay the most. Now, Shyam
has a large appetite and values the food coupon (wasted by Ram) more. If the coupon is
tradable, Ram can trade it with Shyam, who is willing to buy and able to pay for it. Thus, if
coupons are non-tradable, this system becomes inefficient like the other two systems.
However, if coupons are tradable, the system may give rise to a black market.
** Test Yourself: What do you mean by rationing? Why do we need it? Which
is the best and fairest way of rationing?
6. Opportunity Sets
An opportunity set is the group of available options, an idea which emerges from the core
ideas of trade-offs and scarcity in economics. We already know that budget and time
constraints define the availability of choices. Since resources, including time, are scarce,
people must make choices such that in order to get some good, they have to sacrifice the
consumption of another good. For example, you go to a movie hall and find two movies
showing, Superman and Batman. You now have the following choices: watch
Superman, watch Batman, watch both movies back to back, or watch neither. This is your
opportunity set. Watching Iron Man is irrelevant as it is outside your
opportunity set. You may spend time yearning for Iron Man or any other movie,
but it makes no sense. Thus, an opportunity set defines the limits on the choices
available to an individual. We need to discuss the following concepts to understand it clearly:
1. Budget constraint – opportunity sets in which money imposes the constraint are known
as budget constraints. For example, Seema has Rs. 100 and she consumes two goods,
burgers and pizzas, priced at Rs. 10 and Rs. 20 respectively. Her opportunity set consists
of all bundles costing at most Rs. 100. It is bounded by the budget line
10 x (burgers) + 20 x (pizzas) = 100, which runs from 10 burgers and no pizza to
5 pizzas and no burger.
2. Time constraint – opportunity sets in which time imposes the constraint are known as
time constraints. For example, a farmer works for 8 hrs and he produces two goods,
butter and rice. He can produce 16 kg of rice in 8 hrs or 4 kg of butter in 8 hrs. If he
devotes half of his time to each, then he can produce 8 kg of rice and 2 kg of butter in
4 hrs each (if he chooses not to trade, i.e. point A). His PPC, which is also his time
constraint, is the straight line joining 16 kg of rice (and no butter) to 4 kg of butter
(and no rice); point A lies on this line.

3. Cost and Opportunity Cost – Making choices among scarce resources (a trade-off)
involves some cost (the cost of sacrificing a good to choose another good) and some benefit
(an incentive, in terms of satisfaction, to consume the good that we choose). For
example, you may choose to attend a birthday party (benefit) at the cost of bunking
your tuition (cost). Making trade-offs may involve diminishing marginal utility, i.e. the
additional utility from a good diminishes as more and more units are available for
consumption. For example, suppose electricity is a scarce resource. If one unit of electricity
is available, we may use it for lighting; if two units are available, we may use the second for
cooking; if three units are available, we may use the third for washing clothes; and if more
units are available we may use them for less important purposes like playing video games.
Thus, as more and more units of electricity are available to us, the utility of each additional
unit decreases.
If we look at an opportunity set, relative prices (the price of one good in terms of another)
explain the trade-off. Let us get back to the example of Seema consuming pizzas and
burgers. In our example, a burger costs Rs. 10 and a pizza costs Rs. 20. Now,
Relative price of pizza in terms of burger = Rs. 20/Rs. 10 = 2. Thus, for every pizza she
sacrifices, she can get two burgers.
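Seema's budget constraint can also be worked out mechanically. The sketch below is only an illustration: it uses the figures from the text (Rs. 100, burgers at Rs. 10, pizzas at Rs. 20), while the function names and the restriction to whole units are assumptions made here.

```python
BUDGET = 100        # Rs., from the text
PRICE_BURGER = 10   # Rs. per burger, from the text
PRICE_PIZZA = 20    # Rs. per pizza, from the text


def affordable_bundles(budget, p_burger, p_pizza):
    """All whole-number (burgers, pizzas) bundles costing no more than the budget."""
    bundles = []
    for pizzas in range(budget // p_pizza + 1):
        max_burgers = (budget - pizzas * p_pizza) // p_burger
        for burgers in range(max_burgers + 1):
            bundles.append((burgers, pizzas))
    return bundles


def budget_line(budget, p_burger, p_pizza):
    """Bundles that exhaust the budget, i.e. the boundary of the opportunity set."""
    return [
        (b, p)
        for b, p in affordable_bundles(budget, p_burger, p_pizza)
        if b * p_burger + p * p_pizza == budget
    ]


# Burgers that must be given up for one more pizza.
relative_price_of_pizza = PRICE_PIZZA / PRICE_BURGER

print(budget_line(BUDGET, PRICE_BURGER, PRICE_PIZZA))
print(relative_price_of_pizza)
```

Moving along the budget line, each extra pizza goes with two fewer burgers, which is the relative price of 2 computed in the text.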
Now, from the concept of relative prices and trade-offs, we derive the concept of
opportunity cost. When resources are scarce, we make choices. When we make choices,
we sacrifice consumption of one good to consume another good. The cost of sacrificing the
good which is the best alternative to the good that we choose to consume is known as
opportunity cost. For example, if Mr. X built a house which is bigger than his
requirement, he could keep a paying guest in one of the rooms and earn Rs. 2000 per
month. If he does not want to keep the paying guest, the Rs. 2000 rent foregone is the
opportunity cost of not keeping a paying guest. Or suppose that after graduation, you get a
job paying Rs. 20,000 per month, but you choose to continue your studies and go for post
graduation. The income of Rs. 20,000 is part of the opportunity cost of the time
that you choose to spend studying and not working. This foregone income must be added to
your college fees to get the opportunity cost of attending college.
** Test Yourself: Apply the concept of opportunity cost to explain why some
students from lower income group cannot complete schooling.
7. Economic Systems and Gains From Trade:
An economic system is the organized way in which a state or country allocates its
resources and produces goods and services, utilizing the resources in the best
possible way.
We assume that the domestic market is perfectly competitive and thus producers
and consumers rationally decide to utilize the resources efficiently. Thus, the problem of
scarcity and choice is solved by the market forces, which determine what to produce, for
whom to produce and how much to produce. Suppose we take an economic system
where countries are engaged in free trade, i.e. trading without policy restrictions. We
assume that a country produces and exports the good in which it has a relative
advantage (the cost of producing that good in comparison to the other good is less) and
imports the good in which it has a greater opportunity cost. For example, an Indian farmer
works for 8 hrs and he produces two goods, butter and rice. He can produce 16 kg of rice in
8 hrs or 4 kg of butter in 8 hrs. If he devotes half of his time to each, then he can produce
8 kg of rice and 2 kg of butter in 4 hrs each (if there is no trade). His labor cost of
producing one unit of rice is 0.5 hrs (8 hrs/16) and his labor cost of producing one unit of
butter is 2 hrs (8 hrs/4). Thus, the opportunity cost of producing rice = 0.5/2 = 0.25 (units of
butter foregone per unit of rice) and the opportunity cost of producing butter = 2/0.5 = 4
(units of rice foregone per unit of butter). Thus, he will produce and export rice,
as he has a lower opportunity cost in rice, and will import butter.
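The farmer's arithmetic can be checked with a few lines of Python. This is a minimal sketch using only the figures given in the text (an 8-hour day, 16 kg of rice or 4 kg of butter per day); the function and variable names are illustrative assumptions, and output is assumed to be proportional to hours worked.

```python
HOURS = 8            # working hours per day, from the text
RICE_PER_DAY = 16    # kg of rice if all 8 hrs go to rice, from the text
BUTTER_PER_DAY = 4   # kg of butter if all 8 hrs go to butter, from the text


def output(hours_on_rice):
    """(rice_kg, butter_kg) produced for a given split of the 8-hour day."""
    rice = RICE_PER_DAY * hours_on_rice / HOURS
    butter = BUTTER_PER_DAY * (HOURS - hours_on_rice) / HOURS
    return rice, butter


labor_cost_rice = HOURS / RICE_PER_DAY       # hours per kg of rice
labor_cost_butter = HOURS / BUTTER_PER_DAY   # hours per kg of butter

# Opportunity cost = labor cost of the good / labor cost of the other good.
opp_cost_rice = labor_cost_rice / labor_cost_butter     # kg of butter per kg of rice
opp_cost_butter = labor_cost_butter / labor_cost_rice   # kg of rice per kg of butter

for hours_on_rice in (0, 4, 8):
    # Three points on the farmer's PPC: all butter, an even split, all rice.
    print(hours_on_rice, output(hours_on_rice))
print(opp_cost_rice, opp_cost_butter)
```

The two opportunity costs are reciprocals of each other, so a low opportunity cost in rice necessarily means a high one in butter.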
We take the basic competitive model for the domestic market.

When there is no trade, the equilibrium price and quantity in the domestic economy are P*
and Q* respectively. The area of the triangle AP*E is the consumer surplus (as the market
price is lower than the price the consumers are willing to pay) and the area of the triangle
BP*E is the producer surplus (as the market price is higher than the price at which producers
are willing to sell). Total social welfare is equal to the area of triangle AEB (i.e. CS + PS).
When free trade is allowed, the world price is Pw, which is lower than the domestic price.
This means the domestic producers will supply less of the good and the consumers will be
better off if they import it. Thus quantity supplied falls to Q1 and quantity
demanded rises to Q2. The gap between demand and supply is filled by imports.
Thus Q1Q2 is the amount of imports by the domestic consumers. It is obvious that the
domestic producers cannot charge a higher price than Pw, as no one will be willing to buy
the good at a higher price if cheaper imports are freely accessible. Now, the consumer
surplus increases to the area APwD and the producer surplus falls to the area BPwC.
However, the total social welfare has increased from the area of the triangle ABE to the area
ABCDE. Thus, the net gain in welfare due to trade is represented by the area of the
triangle CDE. The lower the world price, the higher the gain
from trade.
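The welfare comparison above can be made concrete with numbers. The sketch below assumes straight-line demand and supply curves with made-up intercepts and slopes, and a made-up world price; none of these figures come from the text, and they serve only to show how the areas are computed.

```python
# Hypothetical linear curves (assumptions, not data from the text):
#   demand: P = A - B * Q      supply: P = C + D * Q
A, B = 100.0, 1.0
C, D = 20.0, 1.0


def autarky():
    """Equilibrium price P* and quantity Q* when there is no trade."""
    quantity = (A - C) / (B + D)
    price = A - B * quantity
    return price, quantity


def surpluses(price):
    """Consumer surplus, producer surplus, quantity demanded and supplied at a price."""
    q_demanded = (A - price) / B
    q_supplied = (price - C) / D
    consumer_surplus = 0.5 * (A - price) * q_demanded   # triangle under demand
    producer_surplus = 0.5 * (price - C) * q_supplied   # triangle above supply
    return consumer_surplus, producer_surplus, q_demanded, q_supplied


p_star, q_star = autarky()
cs_before, ps_before, _, _ = surpluses(p_star)

P_WORLD = 40.0  # hypothetical world price Pw, below P*
cs_after, ps_after, q2, q1 = surpluses(P_WORLD)

imports = q2 - q1
gain_from_trade = (cs_after + ps_after) - (cs_before + ps_before)

print(p_star, q_star, imports, gain_from_trade)
```

With these assumed curves the gain equals half of imports times the fall in price, which is the area of the triangle CDE described above; a lower world price widens both the base and the height of that triangle.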
**Test Yourself: How does an economy in a free trade regime benefit
from international trade?
8. Comparative Advantage and Trade
Comparative advantage is defined as the ability of a producer to produce and specialize
in the production of a good in which it has a lower opportunity cost than other
producers. Let us continue with the example of the Indian farmer whose opportunity cost
of producing rice is 0.25 and that of butter is 4. Now suppose there is an American farmer
whose labor cost of producing rice is 0.67 hrs (he produces 12 kg of rice in 8 hrs) and whose
labor cost of producing butter is 1 hr (he produces 8 kg of butter in 8 hrs). His opportunity
cost of producing rice is 0.67 and of producing butter is 1.5. Thus, the Indian farmer should
specialize in rice production and the American farmer should specialize in butter
production. We may note here that absolute advantage compares labor costs directly: with
these figures the American farmer has an absolute advantage in butter (1 hr against 2 hrs
per kg) and the Indian farmer in rice (0.5 hrs against 0.67 hrs per kg). Even if one farmer
had an absolute advantage in both goods, which might suggest that the farmers would not
enter into international trade and would each produce both goods, specialization would
still pay, because what matters for trade is opportunity cost. Here the
concept of comparative advantage comes in.
Country    Labor cost of production (hrs)        Opportunity cost of production
           1 unit of rice    1 unit of butter    1 unit of rice     1 unit of butter
India      0.5               2                   0.5/2 = 0.25       2/0.5 = 4
America    0.67              1                   0.67/1 = 0.67      1/0.67 = 1.5

We can see from the table above that India has a lower opportunity cost in producing
rice (0.25 as compared to America’s 0.67) and America has a lower opportunity cost in
producing butter (1.5 as compared to India’s 4). Therefore, India has a comparative
advantage in producing rice and America has a comparative advantage in producing
butter, and each country should specialize in, and export to the other country, the good
in which it has a comparative advantage. Thus, a country has a comparative
advantage in producing a good if the opportunity cost of producing it in the home
country is less than that in the foreign country.
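The comparison of the two farmers can be sketched in Python. The daily outputs are those stated in the text (India: 16 kg of rice or 4 kg of butter in 8 hrs; America: 12 kg of rice or 8 kg of butter in 8 hrs); the data layout and function names are assumptions made for this illustration.

```python
# Output per 8-hour day if the whole day goes to one good (figures from the text).
OUTPUT_PER_DAY = {
    "India": {"rice": 16, "butter": 4},
    "America": {"rice": 12, "butter": 8},
}


def opportunity_cost(country, good, other):
    """Units of `other` given up to produce one more unit of `good`."""
    daily = OUTPUT_PER_DAY[country]
    return daily[other] / daily[good]


def comparative_advantage(good, other):
    """Country with the lowest opportunity cost of producing `good`."""
    return min(OUTPUT_PER_DAY, key=lambda c: opportunity_cost(c, good, other))


for country in OUTPUT_PER_DAY:
    print(
        country,
        round(opportunity_cost(country, "rice", "butter"), 2),
        round(opportunity_cost(country, "butter", "rice"), 2),
    )

print("rice:", comparative_advantage("rice", "butter"))
print("butter:", comparative_advantage("butter", "rice"))
```

Because each country's two opportunity costs are reciprocals, one country cannot have the lower opportunity cost in both goods, so there is always a basis for specialization unless the costs are identical.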
9. SUMMARY
- The basic competitive model assumes consumers to be self-interested, firms to be
profit maximizing and markets to be competitive.
- When resources are scarce, choices have to be made and every choice involves a
trade-off. Choices have to be made such that the resources are efficiently
used.
- The problem of scarcity is dealt with through the price system, rationing etc.
- An opportunity set shows the different bundles that an individual can choose and the
trade-off faced when sacrificing one good to choose another.
- A PPC is an opportunity set which shows the various bundles of goods that are available
given the time and budget constraints. An individual is most efficient if he chooses to be
on the PPC.
- An opportunity cost is the cost of sacrificing the best alternative good to consume
another good.
- A country has a comparative advantage in producing and exporting a good if the
opportunity cost of producing in the home country is less than that in the foreign
country.
10. Exercises
1. What do you mean by rationing? What are the methods of rationing?
2. What are property rights and how do they provide incentives?
3. What is an opportunity set? Explain the concept of opportunity cost with an example.
4. Define comparative advantage. How does it lead to gains from trade?
11. References
1. Joseph E. Stiglitz and Carl E. Walsh, Economics, W.W. Norton & Company, Inc., New
York, 4th edition, 2007.
2. N. Gregory Mankiw, Economics: Principles and Applications, South Western, Cengage
Learning Pvt. Ltd., 4th edition, 2007.
12. Multiple Choice Questions (MCQs)
1. Which of these is not an assumption of the basic competitive model?

A. Profit maximizing firms
B. Self-interested consumers
C. Competitive markets
D. Sticky prices
2. What are the various types of rationing?

A. Rationing by queues
B. Rationing by lotteries
C. Health care rationing
D. All of the above
3. Which idea does David Ricardo use in favor of free trade?

A. Mutual advantage
B. Absolute advantage
C. Comparative advantage
D. Bilateral advantage
4. A country has a comparative advantage in producing a good if the ______
cost of producing in the home country is less than the ______ cost in the
foreign country.
A. Opportunity, sunk
B. Opportunity, variable
C. Opportunity, Labor
D. Opportunity, Opportunity
5. Which of the following is an example of intellectual property?

A. Building
B. Park
C. Poetry
D. None of the above
6. Which of these gives rise to trade and gains from trade?

A. Absolute advantage
B. Comparative Advantage
C. None
D. Both A and B
7. A cricket player cannot complete his four year graduation because?

A. Opportunity cost of attending college is higher than playing cricket
B. Opportunity cost of attending college is lower than playing cricket
C. He valued cricket more
8. Which kind of property is owned by all but managed and controlled by
government?
A. Open access property
B. Private property
C. State property
D. Common property
9. What is social welfare?

A. Consumer surplus
B. Producer surplus
C. Consumer + Producer surplus
D. Opportunity Cost
10. What are the laws created by governments which keep a check on how a resource is
used and who is the owner of that resource called?
A. Human Rights
B. Right to information
C. Property Rights

ANSWERS:
1. D
2. D
3. C
4. D
5. C
6. B
7. A
8. C
9. C
10. C

Demand, Supply and Market Equilibrium
Semester-I
Unit-II
Lesson: Demand, Supply and Market Equilibrium
Lesson Developer: Ankur Bhatnagar
College/Department: Satyawati College, University of Delhi
Institute of Lifelong Learning, University of Delhi

CONTENTS:
1. Learning Outcome
2. Concept of demand by a consumer
2.1 Demand schedule and demand curve
3. Derivation of market demand schedule and market demand curve.
4. Determinants of demand
5. Concept of supply by a firm
5.1 Supply schedule and supply curve
6. Derivation of market supply schedule and market supply curve
7. Determinants of supply
8. Factors that determine shifts in demand curve
9. Factors that determine shifts in supply curve
10. Concept of equilibrium and effect of changes in demand and supply on
equilibrium
1. LEARNING OUTCOME
After reading this chapter you will know:
I. The concept of demand, determinants of demand, the demand schedule
and how to draw a demand curve, the law of demand, change in demand and
change in quantity demanded, and individual and market demand.
II. The concept of supply, determinants of supply, the supply schedule and
how to draw a supply curve, the law of supply, change in supply and change in
quantity supplied, and individual and market supply.
III. The concept of equilibrium, the concepts of shortage and surplus, and the impact of
changes in demand and supply on the equilibrium.
2. CONCEPT OF DEMAND
When we say that a consumer demands a good like a car, it implies that she is
willing to pay a certain price in return for a given amount of the good.
This ‘willingness’ lies at the heart of demand theory. In economics, demand
requires three things together: desire, ability and willingness.
Consider a BMW sports car with a price tag of Rs. 25 lac. An 18-year-old student
would like to own this car. However, she does not constitute demand for this car
because she lacks the ability to pay the stated price. She has the desire to
drive it and the willingness to pay for it (she does not want it for free), but,
being a student with no income, she lacks the ability to pay the stated price. Thus, demand
is not just willingness to pay for a good at a stated price but also the desire for it and the
ability to pay for it. However, she may be willing and able to pay a lower price of Rs. 5 lac.
If this price is acceptable to the makers of BMW, then she constitutes demand for
the car.
Assuming that desire and ability exist, we can say that demand for a good is
equivalent to willingness to pay for it. This explains why the terms ‘demand
curve’ and ‘willingness to pay curve’ are used interchangeably.
2.1. DEMAND SCHEDULE AND DEMAND CURVE
A consumer’s demand schedule lists, in table form, the quantity of a good that the
consumer demands at each price. For example, it tells us how many oranges a consumer
is willing to buy at certain prices. The relationship between price and
quantity is shown using specific values in the table below. At a price of
Rs. 10/dozen, the consumer is willing to purchase 4 dozen. At a price of
Rs. 30/dozen the quantity demanded falls to 2 dozen.
A demand curve is a graphical representation of the demand schedule. The
demand curve slopes downwards to show that as price rises, the quantity demanded of a good
falls, assuming all other factors remain constant. A demand curve can be drawn
using a demand schedule or a demand function (see section 4). A demand function
is a mathematical relation between price and quantity demanded.

DEMAND SCHEDULE FOR ORANGES
PRICE (Rs per dozen)    QUANTITY DEMANDED (dozens)
10                      4
30                      2
If this relationship can be written as a mathematical expression, then this
expression is called a demand function. For example, let the demand for oranges be denoted
by Qd, where
Qd = 5 – 0.1P
Notice that the sign of the coefficient on P is negative, which indicates that the demand curve is
downward sloping. Another way of saying this is that the slope of the demand curve is
negative.
When P = 10, Qd = 5 – 0.1 × 10 = 4
When P = 30, Qd = 5 – 0.1 × 30 = 2
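The step from a demand function to a demand schedule can be sketched in a few lines of Python. This is only an illustration: the function name `quantity_demanded` is an assumption made here, while the coefficients come from the orange example above.

```python
def quantity_demanded(price):
    """Demand function from the text: Qd = 5 - 0.1 * P (dozens of oranges)."""
    return 5 - 0.1 * price

# Build the demand schedule for the two prices used in the table.
schedule = [(price, quantity_demanded(price)) for price in (10, 30)]

for price, quantity in schedule:
    print(f"P = {price} Rs/dozen -> Qd = {quantity:g} dozen")
```

Evaluating the function at more prices extends the schedule, and plotting the resulting pairs gives the demand curve.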

3. DERIVATION OF MARKET DEMAND SCHEDULE AND MARKET DEMAND CURVE
The market demand schedule provides the total demand for a good in the market.
It represents the sum of demand by all consumers. It is the horizontal summation
of all individual demand curves.
EXAMPLE:
Assume 3 consumers in the market, whose demand functions are given below. Let
us graphically and numerically show the market demand; we assume the following
demand functions:
Ravi: Q1 = 10 – P
Chavi: Q2 = 12 – 2P
Pami: Q3 = 8 – 4P
Market demand is the horizontal summation of individual demand curves. It is
derived by adding the quantities demanded by all consumers at each given price P.
Market demand = Q* = Q1 + Q2 + Q3 = (10 – P) + (12 – 2P) + (8 – 4P) = 30 – 7P
Q* = 30 – 7P
This expression holds for prices up to P = 2; at higher prices Pami’s function
would give a negative quantity, so she buys nothing.
P    Ravi    Chavi    Pami    Total demand
0    10      12       8       10+12+8=30
1    9       10       4       9+10+4=23
2    8       8        0       8+8+0=16
Market demand schedule
P    Total demand = market demand
0    30
1    23
2    16
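The horizontal summation can also be checked numerically. The sketch below uses the three demand functions stated in the example; clipping each consumer's quantity at zero is an assumption added here, on the grounds that a consumer cannot demand a negative quantity.

```python
# Individual demand functions from the example, clipped at zero (assumption).
def ravi(price):
    return max(0, 10 - price)

def chavi(price):
    return max(0, 12 - 2 * price)

def pami(price):
    return max(0, 8 - 4 * price)

def market_demand(price):
    """Horizontal summation: add every consumer's quantity at the same price."""
    return ravi(price) + chavi(price) + pami(price)

for price in (0, 1, 2, 3):
    print(price, ravi(price), chavi(price), pami(price), market_demand(price))
```

While all three consumers buy positive amounts, the total matches Q* = 30 – 7P. Once a consumer drops out of the market, the sum is taken over the remaining consumers only.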
4. DETERMINANTS OF DEMAND
Demand for a good is determined by monetary and non-monetary factors. These can
be expressed using the demand function Qd, where
Qd = f(Px, Py, M, F)
Qd, the demand for good X, is a function (f) of:
• Px: price of the good,
• Py: price of a good Y that is related in some way to good X,
• M: income of the consumer, and
• F: non-monetary factors like season, fashion, etc.
The last factor is subjective and cannot be captured in a mathematical expression.
We now examine the relation between demand for a good and each determinant
separately.
Demand and Px: The relation between demand and the price of a good is based on the
law of demand. As price rises, the quantity demanded of a good falls, ceteris paribus
(assuming all other factors, Py, M and F, are unchanged). This explains the negative
slope of a demand curve. In some cases this law may not hold and there can
be a positive relation between price and demand. Such goods are exceptions to the
law of demand and are called Giffen goods.
Demand and Py: There can be two types of relation between X and Y. The first is
that they are complements. This means they are consumed
together and are of little use alone. A rise in the price of Y will cause a
fall in the quantity demanded of Y and, with it, a fall in the demand for X. Common
examples include a mobile phone and a SIM card (a mobile phone is useless without
a SIM card), and shoes and socks (it is
not comfortable to wear shoes without socks). The other relation is that of
substitutes. As the price of Y rises, the demand for X will increase as the quantity
demanded of Y declines; X substitutes for Y. Common examples include a laptop and a personal
computer, or a Wi-Fi connection and a data card for use on a mobile phone (a phone
that needs Internet connectivity needs only one of these).
Demand and M: Most goods are ‘normal’, as their demand rises with a rise in income
levels. Therefore, the relation is positive. For some ‘inferior’ goods the relation is
negative. Take the case of a non-branded shoe bought from the local market. As
income rises, a consumer may no longer opt for such a shoe, and may want to buy a
branded shoe like Nike or Adidas. Therefore, the non-branded shoe sees a decline in
demand even though the income of the consumer rises. This non-branded local shoe is
an inferior good.
Note that when we examine the relation between demand and each determinant,
we assume that all other determinants are unchanged. So when income changes, Px
and Py are unchanged. This is referred to as the ‘ceteris paribus’ condition. It can
be translated to mean that all other things remain constant.
5. CONCEPT OF SUPPLY OF A GOOD
5.1. SUPPLY SCHEDULE AND SUPPLY CURVE
A firm’s supply schedule lists, in table form, the output of a good that the firm
supplies at each price. For example, it tells us the ability and willingness of
a firm to produce a certain amount of output of a good at a certain price. The
relationship between price and quantity is shown using specific values in the table
below. At a price of Rs. 10/dozen, the orange seller (firm) is willing to sell 4 dozen.
At a price of Rs. 30/dozen the quantity supplied rises to 8 dozen.
A supply curve is a graphical representation of the supply schedule. The supply
curve slopes upwards to show that as price rises the quantity supplied of a good rises,
assuming all other factors remain constant. A supply curve can be drawn using a
supply schedule or a supply function (see section 7). A supply function is a
mathematical relation between price and quantity supplied.
SUPPLY SCHEDULE FOR ORANGES
PRICE (Rs per dozen)    QUANTITY SUPPLIED (dozens)
10                      4
30                      8
If this relationship can be written as a mathematical expression, then this
expression is called a supply function. For example, let the supply of oranges be denoted by
Qs, where
Qs = 2 + 0.2P
Notice that the sign of the coefficient on P is positive, which indicates that the supply curve is upward
sloping. Another way of saying this is that the slope of the supply curve is positive.
When P = 10, Qs = 2 + 0.2 × 10 = 4
When P = 30, Qs = 2 + 0.2 × 30 = 8
6. DERIVATION OF MARKET SUPPLY SCHEDULE AND MARKET SUPPLY CURVE
The market supply schedule provides the total supply for a good in the market. It
represents the sum of supply by all firms for a good. It is the horizontal summation
of all individual supply curves.
EXAMPLE:
Assume 2 firms in the market, whose supply functions are given below. Let us
graphically and numerically show the market supply; we assume the following
functions:
Firm ABC: Qs1 = 2 + 3P
Firm XYZ: Qs2 = 1 + 2P
Market supply Qs* is the horizontal summation of individual supply curves. It is
derived by adding the quantities supplied by all firms at each given price P.
Market supply = Qs* = Qs1 + Qs2 = (2 + 3P) + (1 + 2P) = 3 + 5P
Qs* = 3 + 5P
P    ABC    XYZ    Total supply
1    5      3      5+3=8
2    8      5      8+5=13
3    11     7      11+7=18
Market supply schedule
P    Total supply = market supply
1    8
2    13
3    18
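As a cross-check of the summation, the two firms' supply functions can be added price by price in Python. The dictionary name `FIRM_SUPPLY` and the function name `market_supply` are illustrative assumptions; the coefficients are those of firms ABC and XYZ in the example.

```python
# Supply functions of the two firms in the example.
FIRM_SUPPLY = {
    "ABC": lambda price: 2 + 3 * price,
    "XYZ": lambda price: 1 + 2 * price,
}

def market_supply(price):
    """Horizontal summation: add every firm's quantity at the same price."""
    return sum(supply(price) for supply in FIRM_SUPPLY.values())

for price in (1, 2, 3):
    abc = FIRM_SUPPLY["ABC"](price)
    xyz = FIRM_SUPPLY["XYZ"](price)
    print(price, abc, xyz, market_supply(price))
```

Adding the two functions term by term gives 3 + 5P, which the computed totals reproduce at each price.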
7. DETERMINANTS OF SUPPLY
Supply of a good is determined by the costs involved in producing the good and by
non-cost factors as well. These can be expressed using the supply function Qs, where
Qs = f(Px, Pinputs, T, F)
Qs, the supply of good X, is a function (f) of:
• Px: price of the good,
• Pinputs: price of inputs that are used to produce the good,
• T: technology involved in production, and
• F: non-monetary factors like expectations among firms about future demand,
season, fashion, cyclical factors, the stage of the business cycle, etc. This factor
is subjective and cannot be captured in a mathematical expression.
We now examine the relation between supply of a good and each determinant
separately.
Supply and Px: The relation between supply and the price of a good is based on the law
of supply. As price rises, the quantity supplied of a good rises, ceteris paribus (assuming all
other determinants are unchanged). This explains the positive slope of a supply curve.
Supply and Pinputs: The price at which a firm is willing to sell a
good depends on the cost of producing it. This cost depends on the cost and
availability of inputs. The higher the price of inputs, the higher the price the firm
needs in order to supply a given quantity. A
common example is the local fruit seller who raises the prices of his fruits
whenever the price of petrol rises. Petrol/diesel is used to transport fruits
from the grower to the final consumer through the fruit seller. Transport
costs are therefore part of the cost of supplying the fruits until they reach the consumer,
which is you. Thus, a higher price of inputs will decrease supply.
T and F: These are non-mathematical determinants of supply. In general, a change
towards more efficient technology will lead to higher supply, as the firm is able to
produce more with the same inputs. In the same way, positive consumer and business
expectations about the economy and/or a general boom period are associated with
higher supply.
Note that when we examine the relation between supply and each determinant, we
assume that all other determinants are unchanged. So when input prices change,
Px, F and T are unchanged. This is referred to as the ‘ceteris paribus’ condition. It
can be translated to mean that all other things remain constant.
8. FACTORS THAT DETERMINE SHIFTS IN DEMAND CURVE
The shifts in the demand curve are based on the determinants of demand. We can
distinguish between two types of change, depending on the cause: movements along
the demand curve and shifts of the demand curve.
MOVEMENTS ALONG THE DEMAND CURVE: As the word ‘along’ suggests, we
move on a given demand curve in response to a change in price. When price falls
we move from A to B, showing that the quantity of X demanded has risen. A fall in
quantity demanded is shown as a movement from C to D.
Movements Along Demand Curve

SHIFTS OF DEMAND CURVE: These are shown as a rightward (upward) or leftward (downward)
shift of the demand curve. Assume that the income of a consumer rises. Initially the
consumer was at point A on demand curve D1, demanding Q1 at price P1. Now, with
price unchanged at P1, his demand rises to Q2, shown on D2 at point B. The
movement from A to B in response to an income increase is shown as a shift of the
demand curve to the right.
A similar shift occurs when the price of good Y, which is a complement to X, falls.
This fall causes an increase in the demand for X, shown as a movement from D1 to
D2. Some other examples are listed in the table below:
CAUSE                                  EFFECT ON DEMAND    EFFECT ON DEMAND CURVE (right/left shift)
Rise in income                         Increase            Right
Fall in income                         Decrease            Left
Rise in price of complementary good    Decrease            Left
Fall in price of complementary good    Increase            Right
Rise in price of substitute good       Increase            Right
Fall in price of substitute good       Decrease            Left
Positive change in fashion             Increase            Right
• Note that a shift of the demand curve is caused by changes in non-price factors
(Py, M, F) alone.
• Note that a movement along the demand curve is caused by changes in the price of the
good alone.
• A movement along the demand curve is expressed as an increase/decrease in quantity
demanded, whereas a shift of the demand curve is expressed as an increase/decrease
in demand.
• A right shift of the demand curve shows an INCREASE IN DEMAND.
• A left shift of the demand curve shows a DECREASE IN DEMAND.
• A movement upwards along the demand curve is a DECREASE IN QUANTITY
DEMANDED.
• A movement downwards along the demand curve is an INCREASE IN QUANTITY
DEMANDED.
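The distinction between a change in quantity demanded and a change in demand can be illustrated with a small numerical sketch. The demand function below, Qd = 50 – 2P + M/10, is purely hypothetical and is not taken from the text; it is chosen only so that income M acts as a shift factor for a normal good.

```python
def quantity_demanded(price, income):
    """Hypothetical demand function (assumption): Qd = 50 - 2P + M/10."""
    return 50 - 2 * price + income / 10

base = quantity_demanded(price=10, income=100)

# Movement along the curve: only the good's own price changes.
along = quantity_demanded(price=5, income=100)

# Shift of the curve: price is unchanged, income (a non-price factor) rises.
shifted = quantity_demanded(price=10, income=200)

print("Initial point:           ", base)
print("Price falls, same curve: ", along)
print("Income rises, new curve: ", shifted)
```

In the second case the consumer stays on the same curve (same income) and buys more because the price is lower. In the third case the price is unchanged, so the larger quantity can only be read off a new demand curve lying to the right of the old one.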
9. FACTORS THAT DETERMINE SHIFTS IN SUPPLY CURVE
The shifts in the supply curve are based on the determinants of supply, as was the
case for demand. We can again distinguish between two types of change, depending
on the cause.
MOVEMENTS ALONG THE SUPPLY CURVE: As was the case for demand, a
change in quantity supplied is caused by a change in the price of the good. It is
shown as a move along a given supply curve. When price falls we move from A to
B, showing that the quantity of X supplied has decreased. An increase in quantity
supplied is shown as a movement from C to D, when Px increases.
Movements Along Supply Curve
SHIFTS OF SUPPLY CURVE: These are shown as a rightward or leftward shift of
the supply curve due to non-price factors: technology, price of inputs and
non-monetary factors. Initially the firm was at point A on supply curve S1, supplying Q1
at price P1. Assume that a new technology improves the speed of workers. This
allows greater supply, and is shown as a shift of S1 to S2. The same price P1 now
brings forth a greater supply of Q2, shown on S2 at point B. The movement from A to B in
response to this positive technological change is shown as a shift of the supply
curve downward and to the right.

A similar shift occurs when the price of an input declines. This fall causes a decline in
the cost of production of the good. The savings are used to produce more of X, so
that we move to point B without any change in Px. Some other examples are listed
in the table below:
CAUSE                                            EFFECT ON SUPPLY    EFFECT ON SUPPLY CURVE (right/left shift)
Fall in input prices                             Increase            Right
Rise in input prices                             Decrease            Left
A negative technical change                      Decrease            Left
A positive new technology                        Increase            Right
Rise in price of substitute good (in production) Decrease            Left
Fall in price of substitute good (in production) Increase            Right
Positive change in fashion                       Increase            Right

• Note that a shift of the supply curve is caused by changes in non-price factors
(Pinputs, T, F) alone.
• Note that a movement along the supply curve is caused by changes in the price of the
good (Px) alone.
• A movement along the supply curve is expressed as an increase/decrease in quantity
supplied, whereas a shift of the supply curve is expressed as an increase/decrease
in supply.
• A right shift of the supply curve shows an INCREASE IN SUPPLY.
• A left shift of the supply curve shows a DECREASE IN SUPPLY.
• A movement upwards along the supply curve is an INCREASE IN QUANTITY
SUPPLIED.
• A movement downwards along the supply curve is a DECREASE IN QUANTITY
SUPPLIED.
10. CONCEPT OF EQUILIBRIUM
Equilibrium is a position of ‘rest’ for all economic agents. At this point no agent
wishes to change its position in terms of demand, supply or price. To determine
equilibrium we need the demand and supply curves. Equilibrium is determined
where demand equals supply. In a diagram it is easy to show that P* and Q* are the
equilibrium values of price and quantity. We can easily show how P* is derived.
Consider price P1, where demand = Q2 and supply = Q1. Demand > supply, so that
we have a position of excess demand, which is called a SHORTAGE. Consumers are
willing to buy Q2 at price P1 while suppliers want to sell only Q1 at this price.
When suppliers realize that consumers want more than Q1 (which they had
produced), they increase production in the next period, for which they ask a higher
price. The red arrow shows this. As long as a shortage remains, producers will continue to
increase production and the price will continue to rise, until demand equals supply. At that
point there is no reason to change the production levels or the demand levels.

Consider price P2, where demand = Q4 and supply = Q3. Demand < supply, so that
we have a position of excess supply, which is called a SURPLUS. Consumers are
willing to buy only Q4 at price P2 while suppliers want to sell Q3 at this price.
When suppliers realize that consumers want less than Q3 (which they had
produced), they cut production in the next period, and are willing to offer this
lower output at a lower price. The blue arrow shows this. As long as a surplus remains,
producers continue to decrease production and the price continues to fall, until demand
equals supply. At that point there is no reason to change the production levels or the demand levels.

Thus we conclude that a surplus causes prices to fall while a shortage causes prices to rise. At
equilibrium there is no shortage and no surplus, since demand = supply. We now investigate the
effects of changes in demand and supply on equilibrium price and quantity.
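A minimal numerical sketch of equilibrium, shortage and surplus follows. It assumes, for illustration only, that the orange demand function Qd = 5 – 0.1P and the orange supply function Qs = 2 + 0.2P from the earlier sections describe the same market; the text itself does not combine them. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def demand(price):
    """Qd = 5 - 0.1P, the orange demand function from the text."""
    return 5 - Fraction(1, 10) * price

def supply(price):
    """Qs = 2 + 0.2P, the orange supply function from the text."""
    return 2 + Fraction(1, 5) * price

# Setting Qd = Qs: 5 - 0.1P = 2 + 0.2P, so P* = (5 - 2) / (0.1 + 0.2).
p_star = Fraction(5 - 2) / (Fraction(1, 10) + Fraction(1, 5))
q_star = demand(p_star)

def market_state(price):
    """Classify a price as shortage, surplus or equilibrium."""
    gap = demand(price) - supply(price)  # excess demand if positive
    if gap > 0:
        return "shortage", gap
    if gap < 0:
        return "surplus", -gap
    return "equilibrium", gap

print(f"P* = {float(p_star):g}, Q* = {float(q_star):g}")
for price in (5, 10, 20):
    label, size = market_state(price)
    print(f"P = {price}: {label} ({float(size):g})")
```

Under this assumption, a price below P* produces a shortage and a price above P* produces a surplus, which is the pattern described for P1 and P2 above.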
Case 1: Increase in demand. The demand curve shifts to the right (D1 to D2), leading to higher price
and quantity.
Case 2: Decrease in demand. The demand curve shifts to the left (D1 to D3), leading to lower price
and quantity.

Case 3: Increase in supply. The supply curve shifts to the right, leading to higher quantity and lower
price.
Case 4: Decrease in supply. The supply curve shifts to the left, leading to higher price and lower
quantity.
We now determine the effect of simultaneous changes in demand and supply.
Case 5: Increase in demand and supply. We have three possible cases, shown in the diagram below. Note
that quantity will always rise (as shown by the arrow), while the effect on price depends on the
relative size of the increases in demand and supply.
Case 6: Decrease in demand and supply. We have three possible cases, shown in the diagram below. Note
that quantity will always fall, while the effect on price depends on the relative size of the
decreases in demand and supply.

Case 7: Increase in demand and decrease in supply. We have three possible cases, shown in the diagram
below. Note that price will always rise, while the effect on quantity depends on the relative size of
the increase in demand and the decrease in supply.
Case 8: Increase in supply and decrease in demand. We have three possible cases, shown in the diagram
below. Note that price will always fall, while the effect on quantity depends on the relative size of
the increase in supply and the decrease in demand.
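The three sub-cases of Case 5 can be checked with a short sketch. It assumes linear curves Qd = a – bP and Qs = c + dP, with the baseline values a = 5, b = 0.1, c = 2, d = 0.2 borrowed from the orange examples earlier in the lesson; the sizes of the shifts are hypothetical numbers chosen for illustration. A parallel rightward shift of demand raises a, and a parallel rightward shift of supply raises c.

```python
from fractions import Fraction

def equilibrium(a, b, c, d):
    """Solve a - b*P = c + d*P for linear demand and supply curves."""
    price = Fraction(a - c) / (b + d)
    quantity = a - b * price
    return price, quantity

B, D = Fraction(1, 10), Fraction(1, 5)   # slopes of demand and supply
base_price, base_quantity = equilibrium(5, B, 2, D)

# (increase in demand intercept, increase in supply intercept): hypothetical.
shifts = {
    "demand rises more": (1, Fraction(1, 2)),
    "equal increases": (1, 1),
    "supply rises more": (Fraction(1, 2), 1),
}

results = {}
for label, (da, dc) in shifts.items():
    results[label] = equilibrium(5 + da, B, 2 + dc, D)
    price, quantity = results[label]
    print(f"{label}: P = {float(price):.2f}, Q = {float(quantity):.2f}")
```

In all three runs the quantity ends above the baseline, while the price rises, stays the same, or falls depending on which curve shifts more. The same function can be reused for Cases 6 to 8 by making the shifts negative.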

The Concept of Demand, Supply & Elasticity
Subject: Economics
Lesson: The Concept of Demand, Supply & Elasticity
Author: Nalini Panda, Associate Professor
Department/College: Indraprastha College for Women, University of Delhi
Institute of Lifelong Learning, University of Delhi
Table of Contents
• Introduction
o 1.1 Demand and Supply
o 1.1.2 Supply
o 1.1.3 Equilibrium
o 1.2.1 Elasticity of Demand
o Summary
o Exercise
o Glossary
1.1 Demand and Supply

One of the main tasks of economic theory is to explain why goods have prices and why
some goods are expensive and others cheap. The answer is that they have prices
because, on the one hand, they are useful and, on the other hand, they are scarce in
relation to their various alternative uses. For example, people will have no use for
woolen clothing in a place where temperature is always above 30°C and hence woolens
will never command a price in that locality. In addition to being useful, goods must be
scarce in relation to the uses to which people want to put them, if they are to be priced.
For instance, while air is clearly useful to every human being, it does not command a
price because it is freely available in unlimited amounts. Goods like air, which are useful
but not scarce, are known as ‘free’ goods and do not bear a price. By contrast, economic
goods are scarce and do bear a price. It is only because economic goods are useful that
they are demanded by buyers, and only because they are scarce that sellers cannot
supply them in unlimited quantities. Thus price of any economic good or service is
determined by the interaction of demand and supply. It is now necessary to see more
precisely what demand and supply are.
1.1.1 Demand
1.1.1 (a) An Individual’s Demand for a Product

The demand for a good by an individual consumer (or household) means this individual’s
desire for the good backed by a capacity to pay.
Source: www.drawingcoach.com, www.photos.merinews.com
The quantity of a good an individual is willing to buy over a specific time period is a
function of the price of the good, the individual’s money income, and the prices of other
goods. In simple mathematical language it can be expressed as:
Qdx = f (Px, I, Po) (1.1)
where Qdx = the quantity of good X demanded by the individual over the specific time
period,
f = a function of, or depends on,
Px = the price of good X,
I = the money income of the individual,
Po = the prices of other goods.
In any particular situation, if we keep factors other than own price constant, we can
derive the individual’s demand function for the good as
follows:
$Q_{dx} = f(P_x, \bar{I}, \bar{P}_o)$ (1.2)
where the ‘bar’ on top of I and Po means that they are kept constant. Equation (1.2) can
also be written as
Qdx = f (Px) cet. par. (1.3)
where cet. par. = ‘ceteris paribus’ means everything else held constant.
Eqn (1.3) implies that the quantity of good X demanded by an individual over a specific
time period is a function of the price of that good, while holding constant everything else
that affects the individual’s demand for the good.
Eqn (1.3) is a ‘general’ functional relationship between the quantity demanded of good X
and the various alternative prices of X, ceteris paribus. We can also take a ‘specific’ demand
function. For example,
Qdx = 32 – 4Px cet. par. is a specific functional relationship indicating precisely how Qdx
depends on Px. That is, by substituting various prices of good X into this specific demand
function, we get the particular quantity of good X demanded by the individual per unit of
time at these various prices. Thus, we get the individual’s demand schedule.
In general, the individual’s demand schedule for a good is a table giving us the quantity
demanded of the good at various alternative prices of the good, keeping constant the
prices of other goods and money income and tastes of the consumer. The graphic
representation of the individual’s demand schedule gives us that person’s demand curve.
In the previous example where the demand function for an individual for good X is given
as Qdx = 32 – 4Px, if we substitute various prices of X into the demand function we will
get the individual’s demand schedule as given in Table 1.1.
Table 1.1
Px (in Rs.) 8 7 6 5 4 3 2 1 0
Qdx 0 4 8 12 16 20 24 28 32
Plotting each pair of values as a point on a graph and joining the resulting points, we get
the individual’s demand curve for good X. In Fig. 1.1 it is shown as dx
Figure 1.1: Linear Demand Curve
The individual buys the good X only when price falls below Rs. 8. At a price of Rs. 7 she
buys 4 units of X. As the price falls further, she purchases more of X because it is
becoming less expensive. At a price of Re. 1, she buys 28 units. However, even at a price
of Rs. 0 she would not take more than 32 units because additional units of X may result
in a storage and disposal problem for the consumer. This is called the ‘saturation point’
for the individual. So the maximum quantity that the individual will ever demand of good
X per time period is 32 units.
In drawing the demand curve dx in fig. 1.1, we assume complete divisibility, so that
price and quantity demanded can both change by infinitely small steps. This enables us
to draw a demand curve by joining the points A, B, C, D, ..., I by a continuous, smooth
line. Another point to be noted about the construction of the demand curve is that the
independent variable, price, is measured on the vertical axis, and the dependent
variable, quantity, on the horizontal axis, which is contrary to the usual mathematical
convention for drawing a curve. But this is a convention which economists follow so that they can draw
the demand curve of the consumers and the cost curves of the firms on the same set of
axes. The demand curve drawn this way is also called the inverse demand curve.
In the given example, the demand curve for the good X is a straight line of the
form
Qdx = a – b Px, (1.4)
where ‘a’ (32) is the quantity intercept and ‘–b’ (–4) is the slope, i.e.,
$\Delta Q_{dx} / \Delta P_x = -b$.
When we plot the demand curve, we actually plot the inverse demand curve, which is
given as:
Px = α – β Qdx, (1.5)
where α is the price intercept and –β is the slope of the inverse demand curve;
$\alpha = a/b$ and $\beta = 1/b$.
In our example, α = (32/4) = 8 is the price intercept, and –β = –(1/4) is the slope of
the inverse demand curve.
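The conversion from the demand function (1.4) to the inverse demand function (1.5) can be sketched as follows. The function names are illustrative assumptions; the numbers a = 32 and b = 4 are those of the example.

```python
def inverse_demand_params(a, b):
    """Convert Qd = a - b*P into P = alpha - beta*Qd (alpha = a/b, beta = 1/b)."""
    return a / b, 1 / b

alpha, beta = inverse_demand_params(32, 4)

def quantity_at(price):
    """Demand function (1.4) with a = 32, b = 4."""
    return 32 - 4 * price

def price_at(quantity):
    """Inverse demand function (1.5) with the derived alpha and beta."""
    return alpha - beta * quantity

print("alpha =", alpha, "beta =", beta)
for price in (7, 4, 1):
    quantity = quantity_at(price)
    print(f"P = {price} -> Q = {quantity} -> P = {price_at(quantity):g}")
```

Feeding a quantity from Table 1.1 into the inverse function returns the price it came from, which confirms that the two forms describe the same curve.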
Though in the previous example the demand curve derived is linear, it is not always so.
Suppose Table 1.2 gives us the demand schedule for good Y for an individual.
Table 1.2
Py (in Rs.)    90   80   70   60   50   40   30   20   10   0
Qdy            0    1    2    3    5    8    12   16   20   30
The demand curve derived from this schedule is non-linear, as shown in fig. 1.2.
Figure 1.2: Non Linear Demand Curve
Value Addition: Know more about non-linear demand functions
A non-linear demand function may take the following form:
$Q_{dx} = a \cdot (1 / P_x^{b})$, (i)
where a and b are positive constants.

The slope of the demand curve can be derived by taking the first derivative of the
demand function:
$dQ_{dx}/dP_x = -a \cdot b \cdot (1 / P_x^{b+1})$. (ii)
It is clear from equation (ii) that the slope of the demand curve is negative and that
it varies with the price. Hence, the demand curve derived from equation (i)
is a downward sloping, non-linear curve.
Let us suppose that a = 100 and b = 1. Then the specific demand function will be
Qdx = 100/Px (iii)

The demand schedule that can be derived from (iii) is given as follows:
Px (in Rs.)       0    1     2    4    5
Qdx (in units)    ∞    100   50   25   20
The demand curve derived will be non-linear. In fact, it will be a rectangular hyperbola,
i.e., it will be asymptotic to both the axes and the areas of the rectangles formed under
the curve will be equal to each other.
In the given figure, dx is a demand curve which is a rectangular hyperbola. Area of the
rectangle OP1AQ1 = area of OP2BQ2 = area of OP3CQ3 = area of OP4DQ4 = 100.
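The equal-area property of the rectangular hyperbola, and the way the slope changes with price, can be verified numerically. This sketch uses a = 100 and b = 1 as in equation (iii); the function names are illustrative assumptions.

```python
def quantity_demanded(price, a=100, b=1):
    """Non-linear demand function (i): Qd = a / P**b."""
    return a / price ** b

def slope(price, a=100, b=1):
    """First derivative (ii): dQd/dP = -a*b / P**(b+1)."""
    return -a * b / price ** (b + 1)

for price in (1, 2, 4, 5):
    quantity = quantity_demanded(price)
    area = price * quantity   # area of the rectangle under the curve
    print(f"P = {price}: Q = {quantity:g}, P*Q = {area:g}, slope = {slope(price):g}")
```

The product P × Q is the same at every price, which is what the equal rectangles in the figure express, while the slope stays negative and becomes flatter as the price rises.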
The individual’s demand curve for a good represents a maximum boundary of the
individual’s intentions. For the various alternative prices of a good, the demand curve
shows the maximum quantity of the good the individual intends to purchase per unit of
time. For various alternative quantities of a good, the demand curve shows the
maximum prices the individual is willing to pay. For example, in fig. 1.2, point E on the
demand curve indicates two things. First, if the price is given as Rs. 50, the individual will
buy a maximum of 5 units of good Y. Second, the maximum price that the individual will be
willing to pay to buy 5 units of Y is Rs. 50.
1.1.1 (b) Movements Along vs. Shifts in Demand
When there is a change in the price of one good, other things remaining constant, the
quantity demanded of that good changes and the consumer moves along the same
demand curve. The movement along the same demand curve for a good is known as a
change in the quantity demanded of the good, which occurs due to a change in its own
price, ceteris paribus.
For example, in Table 1.2, when price of Y falls from say Rs.50 to Rs.40, the quantity
demanded of Y rises or expands from 5 units to 8 units and the consumer moves from
point E to point F on the same demand curve ‘dy’.
However, when any of the ‘ceteris paribus’ conditions changes holding own price of the
good constant, the entire demand curve ‘shifts’ either to the right or to the left. A
rightward shift is called an increase in demand (rather than an increase in the quantity
demanded), and this shows that at any given price of the good, the consumer buys more
of the good. Similarly, with a leftward shift the consumer buys less of the good at any
given price. This is known as a decrease in demand.
Reinforce your learning
Increase in Demand
Case Study
Reducing the Quantity of Tobacco Demanded
The Government, in an effort to control the spread of oral cancer, is contemplating two
policy options to bring about a reduction in tobacco (Gutka) consumption. One option is to
tax the tobacco manufacturers, thereby increasing the price and thus reducing
(contracting) the quantity of tobacco demanded. Alternatively, the Government can make use of
public service announcements, health warnings on tobacco products, restrictions on
advertisements of tobacco products, etc. These measures would shift the demand curve
of tobacco products to the left, implying a decrease in the demand for tobacco products.
Shifts in the demand curve occur due to changes in the income of the consumer, in the
prices of other goods or in the tastes of the consumer. When a consumer’s money income
increases, while everything else remains constant, the consumer’s demand for a good
usually increases, so that the consumer demands more of the good at the same price of
the good. These goods are referred to as normal goods. For example, with an increase in
the consumer’s income, the consumer’s demand for ‘mango’ may increase even though the
price of ‘mango’ has not changed. This will lead to a rightward shift of the consumer’s
demand curve for mango. Similarly, a decrease in income will lead to a leftward shift of
the consumer’s demand curve.
Sometimes, with a rise in an individual’s income, the demand for certain goods may fall.
These goods are known as inferior goods. For example, with a rise in income a consumer
may demand fewer potatoes and switch over to better quality vegetables or fruits.
The individual’s demand curve for a good shifts when the prices of other goods in the
economy change, the own price of the good remaining constant. A change in the prices of
other goods will affect the demand for the good in question significantly when these
other goods are either close substitutes or complements of the given good.
A close substitute is a good that performs essentially the same function as the original,
so that a small increase in the price of the substitute will induce the consumer to buy
more of the original good even though its price has not changed. Thus, the demand
curve for the original good will shift to the right. For example, let us suppose that for a
consumer ‘Tropicana’ fruit juice is a close substitute of ‘Real’ fruit juice. If the price of ‘Real’
increases from Rs. 70 to Rs. 75, then the demand for ‘Tropicana’ will increase, say from 2 litres
to 3 litres a month, even though its price has remained unchanged at Rs. 65 per litre.
A complement is a good that is used in conjunction with the particular good in question.
For example, pizzas and coke are complements of each other. When the price of pizza rises,
the price of coke remaining the same, the quantity of pizza demanded falls, the demand
for coke falls with it, and the demand curve for coke shifts to the left.
Figure 1.3: Shift In Demand Curve
In fig. 1.4, d1 represents the demand curve for coke when the price of one pizza is
Rs.100. At that price the consumer consumes 5 bottles of coke at a price of
Rs.10/bottle. When the price of pizza rises to Rs.150/unit, the demand curve for coke shifts
leftward to the position d2 and, at the same price of coke (Rs.10/bottle), the
consumer reduces the quantity demanded to 3 bottles. This happens because, with an increased
price of pizza, consumption of both pizza and coke falls. The opposite will happen
if the price of pizza falls.
Figure 1.4: Shift In Demand Curve
1.1.1 (c) Substitutability and Narrowness of Definition
When a consumer buys a number of goods, it is possible for her to substitute other
goods for a particular good if its price rises. But the ability to substitute away from a
good increases with the narrowness of its definition. That is, the more narrowly a good is
defined, the more substitutes are available for it, whereas the more broadly a good is
defined, the fewer substitutes are available.
For example, food is a broader category than fruits and fruit is a broader category than
mango. As other goods in the individual’s consumption basket are very poor substitutes
of food, so with a rise in the price of food, the consumer will find it difficult to substitute
it with anything else. If, however, the good in question is fruit, then meat, milk,
vegetables etc. are substitutes for fruits. So a rise in the price of fruits may induce the
consumer to substitute meat, milk or vegetables for fruits. Mango is even more
narrowly defined than fruits. Because other fruits like orange, banana and apple are
closer substitutes of mango than milk is of fruits, with a rise in the price of
mango the consumer will readily switch over to other fruits.
1.1.1 (d) The Market Demand for a Product
The market demand for a good gives the alternative quantities of the good demanded
per time period, at various alternative prices, by all the individuals in the market. The
market demand for a good, therefore, depends on all the factors that determine the
individual’s demand and also on the number of buyers of the good in the market.
In particular, if there are 100 identical buyers in the market for good X, each having the
demand function Qdx = 32 – 4 Px, the market demand function will be simply given by
100 Qdx, i.e.,
QDx = 100 Qdx = 3200 – 400 Px, (1.6)
where QDx is the market demand function. The market demand schedule can be derived by
substituting various prices of X into this demand function. The market demand curve is a
graphical presentation of the market demand schedule. Table 1.3 gives us the market
demand schedule and fig. 1.5 gives the market demand curve.
Table 1.3
Px (in Rs.) 8 7 6 5 4 3 2 1 0
QDx 0 400 800 1,200 1,600 2,000 2,400 2,800 3,200
Plotting each pair of values as a point on a graph and joining the resulting points, we get
the market demand curve. In fig. 1.5 Dx gives us the market demand curve for good X.
Figure 1.5: Market Demand Curve
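The derivation of the market demand schedule from identical buyers can be sketched in a few lines of Python. The demand function Qdx = 32 – 4 Px and the figure of 100 buyers come from the text; the function names are illustrative only.

```python
# Sketch: market demand schedule for identical buyers.
# Demand function and number of buyers are taken from the text;
# the function names are hypothetical.

def individual_demand(price):
    """Quantity demanded by one buyer, Qdx = 32 - 4*Px (never negative)."""
    return max(0, 32 - 4 * price)

def market_demand(price, buyers=100):
    """Market demand QDx = buyers * Qdx when all buyers are identical."""
    return buyers * individual_demand(price)

# Build the schedule for prices Rs.8 down to Rs.0.
schedule = {price: market_demand(price) for price in range(8, -1, -1)}

for price, quantity in schedule.items():
    print(f"Px = Rs.{price}: QDx = {quantity}")
```

Multiplying by the number of buyers is only valid because every buyer is assumed to have the same demand function.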
In practice, individuals have different preferences and so they have different demand
functions for the same good X. In this case of people having different demand curves for
the same good, we can derive the market demand curve by horizontally adding up the
individual demand curves.
For example, suppose there are just two individual buyers in the market for good Y
whose individual demand schedules are given as follows in Table 1.4
Table 1.4
Price Qd1y Qd2y
90 0 1
80 1 2
70 2 5
60 3 8
50 5 12
40 8 15
30 12 17
20 16 20
10 20 25
0 30 35
In fig. 1.6 we draw the individual demand curves d1y and d2y; their horizontal
summation gives us the market demand curve Dy. At each price the quantities demanded
by the two buyers are summed up to give the market demand curve.
Figure 1.6: Derivation Of Market Demand Curve
When the price is Rs.100 neither individual demands any y. At Rs.90,
individual 2 demands 1 unit of y but individual 1 still has zero demand; thus the market
demand is 1 unit. At Rs.80, individual 1 demands 1 unit and individual 2 demands 2 units, so market
demand is 3 units. At Rs.40, 1’s demand is 8 units and 2’s demand is 15 units, so
market demand is 23 units, and so on. In fig. 1.6, the market demand curve coincides with
individual 2’s demand curve up to point A2; beyond that, to derive the market demand we
horizontally add up the points on the individual demand curves. For example, to derive
point B on the market demand curve we add up P8B1 and P8B2. So P8B = P8B1 + P8B2, so
that B2B is equal to P8B1. Similarly, P4F = P4F1 + P4F2, such that F2F = P4F1, and so on.
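Horizontal summation simply means adding quantities at each price. A minimal sketch using the figures of Table 1.4 (the variable names are illustrative):

```python
# Sketch: horizontal summation of two individual demand schedules.
# Data are copied from Table 1.4: price -> (Qd1y, Qd2y).
table_1_4 = {
    90: (0, 1), 80: (1, 2), 70: (2, 5), 60: (3, 8), 50: (5, 12),
    40: (8, 15), 30: (12, 17), 20: (16, 20), 10: (20, 25), 0: (30, 35),
}

# At each price, market demand is the sum of the individual quantities.
market_schedule = {
    price: sum(quantities) for price, quantities in table_1_4.items()
}

for price, quantity in market_schedule.items():
    print(f"Price Rs.{price}: market demand = {quantity} units")
```

The sum is taken across buyers at a given price, never across prices, which is why the addition is called horizontal.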
Solved Problem
Question:
Suppose that a good is demanded by just two consumers A and B. Their demand curves
are
qa = 80-8P
qb = 40-10P
i) Derive the individual demand schedules.
ii) Plot the individual demand curves and the market demand curve on the same set of
axes.
Solution:
i) Individual Demand Schedule for A
Price (in Rs.) 0 1 2 3 4 5 6 7 8 9 10
Quantity (in units) 80 72 64 56 48 40 32 24 16 8 0
Individual Demand Schedule for B
Price (in Rs.) 0 1 2 3 4
Quantity (in units) 40 30 20 10 0
(ii)
Plotting the individual demand schedules of A and B we get the demand curves da da’ and
db db’ respectively. The market demand curve, daCD, is a horizontal summation of the
two individual demand curves. Its price-intercept is at Rs.10 because if the price is Rs.10
or more neither consumer demands the good and hence the market
demand is zero. From the demand schedule of B it is clear that for any price greater than
or equal to Rs.4, B’s demand for the good is zero. Thus the market demand curve will
coincide with A’s demand curve between the prices Rs.10 and Rs.4. For any price below
Rs.4 we can obtain the market demand by adding the demands of A and B. For
example, at price Rs.2, A’s demand is 64 units and B’s demand is 20 units, so the
market demand is 64+20=84 units. In the given figure Pea + Peb = PE. At zero price the
market demand is at its maximum of 120 units.
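The kink in the market demand curve at Rs.4 arises because B's quantity cannot be negative. A short sketch of this solved problem, using the two demand functions given in the question (the function names are illustrative):

```python
# Sketch: market demand for the solved problem, qa = 80 - 8P and qb = 40 - 10P.
# Quantities are floored at zero, which produces the kink at P = 4.

def demand_a(price):
    return max(0, 80 - 8 * price)

def demand_b(price):
    return max(0, 40 - 10 * price)

def market_demand(price):
    """Horizontal sum of the two individual demands at a given price."""
    return demand_a(price) + demand_b(price)

for price in range(0, 11):
    print(f"P = Rs.{price}: qa = {demand_a(price)}, "
          f"qb = {demand_b(price)}, market = {market_demand(price)}")
```

Without the floor at zero, B's negative "demand" above Rs.4 would wrongly be subtracted from A's demand.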
1.1.2 Supply
1.1.2 (a) An Individual Firm’s Supply Curve
Supply curves describe the seller’s desire to make the good available. The quantity of a
good that an individual firm is willing to supply over a specific time period is a function of
the price of the good and the cost of production. In order to derive the firm’s supply
curve of a good, we vary only the price of the good, the factors influencing the cost of
production being held constant. The factors which influence the cost of production are (i) the
prices of the factors of production used in producing the good, (ii)
technology and (iii) for agricultural goods, climate and weather conditions. A single firm’s
supply curve of a good shows the alternative quantities of the good that the firm is
willing to supply over a specific period of time at various alternative prices for the good,
while keeping the above factors constant.
In simple mathematical language this functional relationship can be expressed as follows:
$$Q_{sx} = g(P_x, \overline{Tech}, \overline{P_i}, \overline{F_n}) \qquad (1.7)$$
or, Qsx = g (Px) cet. par. (1.7´)
Where Qsx = the quantity supplied of good X by the single producer, over the specific
time period,
g = a function of,
Tech = technology,
Pi = the price of inputs,
Fn = features of nature such as climate and weather conditions.
The bar on top of the last three factors indicates that they are kept constant.
Equation (1.7) or (1.7´) is a general functional relationship. In order to derive a single
firm’s supply schedule and supply curve, we must get that firm’s specific supply function.
For example, let a single firm’s supply function for good X be
Qsx = –50 + 25 Px.
If we substitute various prices of X into the above supply function we will get the
individual supply schedule as given in Table 1.5.
Table 1.5 Individual Supply Schedule
Px (in Rs.) 10 9 8 7 6 5 4 3 2
Qsx 200 175 150 125 100 75 50 25 0
Plotting each pair of values as a point on a graph and joining the
resulting points, we get the individual firm’s supply curve.
The supply schedule and the supply curve show that the producer will supply the good
only if the price is higher than Rs.2. At a price of Rs.2 or less, the price does not
even cover the cost of production, so the firm does not intend to produce
and sell the good.
In the above example the supply curve is an upward-sloping straight line. An upward-
sloping supply curve implies that the higher the price of the good, the more willing the
producer will be to supply the good. A producer’s positively sloped supply curve for a
good represents in one sense a maximum and in another sense a minimum boundary of
the producer’s intentions. At any given price, it would indicate the maximum quantity of
a good that the producer is willing to supply. To put it in a different way, if a given
quantity of a good is to be supplied, the supply curve would indicate the minimum price
at which the producer would be willing to supply that quantity. For example, let us take
the point D on the supply curve sx in fig. 1.7. That point indicates that if the price is
Rs.7, then the producer will be willing to supply a maximum of 125 units of the good. It
also indicates that if the producer has to supply 125 units of the good, then Rs.7 is the
minimum price at which he would supply that quantity.
Even though the supply curve is usually positively sloped, it could also have a zero,
infinite, or a negative slope, and no generalisation is possible. Also when the supply
curve is positively sloped it can be linear, as in the given example, or non-linear.
1.1.2 (b) Movements along, versus, Shifts in the Supply Curve

One should distinguish between movements along a supply curve and shifts of the
supply curve. When the price of the good in question changes, ceteris paribus, the producer
moves along the same supply curve.
When factors other than own price of the good, affecting the supply of the good change,
the entire supply curve shifts. This is referred to as a change or shift in supply as
distinguished from a change in the quantity supplied.
For example, if there is an improvement in technology, so that the cost of producing
every unit of the good falls, the supply curve shifts downward (equivalently, to the right). This downward shift is
referred to as an increase in supply. It means that at the same price for the good, the
firm offers more of it for sale per time period. The same thing happens when there is a
decrease in the prices of the inputs.
Fig. 1.8 is an extension of fig.1.7. Given the supply curve sx, when the price rises from Rs.4
to Rs.7, the producer moves along the same supply curve sx from C to D and the quantity
supplied increases from 50 to 125 units. When, due to a decrease in the cost of production, the
supply curve shifts from sx to s’x, the producer moves from point C on sx to C’ on s’x and
increases the supply of the good from 50 to 80 units even at the same price of Rs.4.
Figure 1.8: Shift In Supply Curve
1.1.2 (c) The Market Supply of a Product
The market or aggregate supply of a good gives the alternative amounts of the good
supplied per time period at various alternative prices by all the producers of this good in
the market. In addition to all the factors that influence individual producer’s supply, the
market supply depends also on the number of producers of the good in the market.
If all the producers face identical cost conditions such that they have the same supply
functions then the market supply function can be derived simply by multiplying the
individual supply function by the number of producers in the market. In the previous
example, if there are 100 identical producers in the market having the supply function
Qsx = –50 + 25 Px, then the market supply function will be given by
QSx = 100 × Qsx = –5,000 + 2,500 Px
The market supply schedule will be given by Table 1.6.
Table 1.6 Market Supply Schedule
Px (in Rs.) 10 9 8 7 6 5 4 3 2
QSx 20,000 17,500 15,000 12,500 10,000 7,500 5,000 2,500 0
The market supply curve is simply a graphical presentation of the market supply
schedule which can be drawn very much in the same way as fig. 1.7, only the scale on
the horizontal axis will have to change.
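Both supply schedules can be generated directly from the supply function. The sketch below uses Qsx = –50 + 25 Px and 100 identical producers, as in the text; flooring the quantity at zero is an assumption that encodes the statement that the firm supplies nothing at Rs.2 or less.

```python
# Sketch: individual and market supply schedules.
# Supply function and number of producers come from the text;
# the function names and the floor at zero are illustrative assumptions.

def firm_supply(price):
    """Individual supply Qsx = -50 + 25*Px, floored at zero."""
    return max(0, -50 + 25 * price)

def market_supply(price, producers=100):
    """Market supply QSx = producers * Qsx for identical producers."""
    return producers * firm_supply(price)

for price in range(10, 1, -1):
    print(f"Px = Rs.{price}: Qsx = {firm_supply(price)}, "
          f"QSx = {market_supply(price)}")
```

At Rs.10, for instance, each firm supplies 200 units, so the market supplies 100 × 200 = 20,000 units.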
When individual producers face different cost conditions they will have different supply
functions and supply curves. In this case the market supply curve will be given by the
horizontal summation of the individual supply curves of all the firms in the market.
Let Table 1.7 give the supply schedules of the three producers of good X in the market.
Table 1.7
Px (in Rs.)   Quantity supplied (per time period)
              Firm 1   Firm 2   Firm 3
5             15       25       30
4             12       20       25
3             5        15       18
2             0        10       12
1             0        0        5
0             0        0        0
The individual supply curves of the three firms are drawn on the same set of axes in fig.
1.9 as sx1, sx2 and sx3. The market supply curve is given by Sx (OEDCBASx), which is the
horizontal summation of sx1, sx2 and sx3. Various points on the market supply curve are
obtained by adding up the quantities supplied by the individual producers at different
price levels. For example, at price Rs.5 (or P5) the quantity supplied by firm 1 is P5A1
(15), by firm 2 it is P5A2 (25) and by firm 3 it is P5A3 (30). So the total quantity supplied in
the market at price P5 is P5A1 + P5A2 + P5A3 = P5A (70 units). The market supply curve
coincides with Firm 3’s supply curve as the price rises from Re.0 to Re.1, because only Firm 3
supplies at these prices; above Re.1 it is the horizontal sum of sx1, sx2 and sx3.
Figure 1.9: Derivation Of The Market Supply Curve
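The market supply schedule implied by Table 1.7 can be computed by adding the three firms' quantities at each price. A minimal sketch (the variable names are illustrative):

```python
# Sketch: horizontal summation of three firms' supply schedules.
# Data are copied from Table 1.7: price -> (Firm 1, Firm 2, Firm 3).
table_1_7 = {
    5: (15, 25, 30),
    4: (12, 20, 25),
    3: (5, 15, 18),
    2: (0, 10, 12),
    1: (0, 0, 5),
    0: (0, 0, 0),
}

market_supply = {price: sum(firms) for price, firms in table_1_7.items()}

for price, quantity in market_supply.items():
    print(f"Px = Rs.{price}: market supply = {quantity} units")
```

At Re.1 the market total equals Firm 3's quantity alone, which is why the market supply curve coincides with Firm 3's curve at low prices.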
1.1.3 Equilibrium
Equilibrium is said to exist when opposing forces are in balance. In the market for a
particular good, demand and supply are like two opposing forces. The market is in
equilibrium at the price where the amount that is demanded equals the amount supplied.
This price is called the equilibrium price, and the quantity demanded and supplied at this
price is called the equilibrium quantity. Market equilibrium is shown graphically in Fig.1.10.
In fig.1.10 Dx is the market demand curve and Sx the market supply curve. They
intersect at point E. Only at price OP* is the quantity demanded equal to the quantity
supplied, both being equal to OQ*. At any price higher than OP* supply exceeds demand,
and at any price below OP* demand exceeds supply, so they are not in balance. So the
equilibrium price is OP* and the equilibrium quantity OQ*.
Figure 1.10: Equilibrium
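As a numerical illustration, equilibrium can be found by equating a demand and a supply function. The sketch below pairs the market demand function QDx = 3200 – 400 Px with the market supply function QSx = –5,000 + 2,500 Px from the earlier examples; placing these two functions in the same market is an assumption made here for illustration, not something the text does.

```python
from fractions import Fraction

# Sketch: solving for market equilibrium with linear demand and supply.
# Pairing these two functions in one market is an illustrative assumption.

def market_demand(price):
    return 3200 - 400 * price

def market_supply(price):
    return -5000 + 2500 * price

# Setting 3200 - 400*P = -5000 + 2500*P gives P* = (3200 + 5000) / (400 + 2500).
equilibrium_price = Fraction(3200 + 5000, 400 + 2500)
equilibrium_quantity = market_demand(equilibrium_price)

print(f"P* = {float(equilibrium_price):.2f}")
print(f"Q* = {float(equilibrium_quantity):.2f}")

# Above P* supply exceeds demand; below P* demand exceeds supply.
print(market_supply(4) > market_demand(4))
print(market_demand(2) > market_supply(2))
```

Exact fractions are used so that the equality of demand and supply at P* can be checked without rounding error.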
1.2.1 Elasticity of Demand

We know that the demand for a good is a function of its own price, prices of other goods,
and income of the consumer. The elasticity of demand is defined as the degree of
responsiveness of the quantity demanded of a good with respect to a change in the
variable on which the demand for the good depends. Accordingly we have own price
elasticity of demand, cross price elasticity of demand and income elasticity of demand
for a good.
1.2.1 (a) Own Price Elasticity of Demand
I. Definition and Measurement
Own price elasticity of demand or, simply, the price elasticity of demand refers to the
relative responsiveness of the quantity demanded of a good with respect to a change in
its own price. The coefficient of price elasticity of demand is given by the percentage
(proportionate) change in the quantity demanded of a good divided by the percentage
(proportionate) change in its own price.
If a given percentage change in the price of a good results in a greater percentage
change in quantity demanded, then the coefficient of elasticity will be greater than one
and the demand is said to be relatively elastic. On the other hand, if a given percentage
change in the price of a good results in a smaller percentage change in quantity
demanded then the elasticity will be less than one and the demand is said to be
relatively inelastic. When a given percentage change in the price of a good results in an
equal percentage change in the quantity demanded, then elasticity is equal to one and
the demand is said to be unitary elastic. When a given percentage change in the price
results in no change in quantity demanded then the elasticity will be equal to zero and
the demand is said to be perfectly inelastic. When a slight change in price results in an
infinite change in the quantity demanded the elasticity of demand will be equal to infinity
and the demand is said to be perfectly elastic.
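The five cases can be summarised in a small classification routine. This is only a sketch: the function names are illustrative, and the coefficient is taken in absolute value so that the comparison with one is meaningful.

```python
import math

# Sketch: classifying demand by the size of the elasticity coefficient.
# Function names are hypothetical; the coefficient is taken as a positive number.

def price_elasticity(pct_change_quantity, pct_change_price):
    """Absolute value of (% change in quantity) / (% change in price)."""
    return abs(pct_change_quantity / pct_change_price)

def classify_demand(elasticity):
    if elasticity == 0:
        return "perfectly inelastic"
    if math.isinf(elasticity):
        return "perfectly elastic"
    if elasticity < 1:
        return "relatively inelastic"
    if elasticity == 1:
        return "unitary elastic"
    return "relatively elastic"

# A 10% price rise that lowers quantity demanded by 20%:
print(classify_demand(price_elasticity(-20, 10)))
# A 10% price rise that lowers quantity demanded by 5%:
print(classify_demand(price_elasticity(-5, 10)))
```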
Since price and quantity demanded are inversely related, the coefficient of price
elasticity of demand is a negative number. In order to avoid dealing with negative
values, a minus sign is often introduced into the formula for the coefficient of price
elasticity. Thus, the formula for own price elasticity of demand for good X is given by the
following:
$$\eta_{xx} = -\frac{\Delta Q_x / Q_x}{\Delta P_x / P_x} \qquad (1.8)$$
where the numerator gives the proportionate change in the quantity demanded of X and
the denominator gives the proportionate change in the price of X.
Equation (1.8) can also be written as
$$\eta_{xx} = -\frac{\Delta Q_x}{\Delta P_x} \cdot \frac{P_x}{Q_x} \qquad (1.9)$$
For an infinitesimally small change in quantity and price the formula for price elasticity will
be
$$\eta_{xx} = -\frac{dQ_x}{dP_x} \cdot \frac{P_x}{Q_x} \qquad (1.10)$$
where $dQ_x/dP_x$ is the inverse of the slope of the demand curve at the point where price is Px and quantity
demanded of the good is Qx. Equation (1.10) can, therefore, be written as:
$$\eta_{xx} = -\frac{1}{\text{slope}} \cdot \frac{P_x}{Q_x}$$
and it gives us the formula to measure elasticity at a point on the demand curve.
To measure elasticity between two points on the demand curve we may use the formula
given by equation (1.9). But while applying this formula to measure elasticity between
two points on a demand curve we would get different results depending on whether we
move from the higher price to the lower price or from the lower price to the higher one. For
example, suppose we want to measure elasticity between points D and F on the market
demand curve Dx given in fig.1.5, which is reproduced in fig. 1.11. If we let the price fall
from Rs.5 to Rs.3 and move from D to F on the demand curve Dx, then elasticity will be
$$\eta_{xx} = -\frac{2000 - 1200}{3 - 5} \cdot \frac{5}{1200} = \frac{5}{3} \approx 1.67$$
Whereas, if we let the price rise from Rs.3 to Rs.5 and move from point F to point D on
the same demand curve Dx, then elasticity will be
$$\eta_{xx} = -\frac{1200 - 2000}{5 - 3} \cdot \frac{3}{2000} = \frac{3}{5} = 0.6$$
Thus, though we are measuring elasticity between the same pair of points on a demand
curve, we get different results depending on whether we move from the
higher to the lower point or from the lower to the higher point. This problem arises because the
elasticity of demand tends to vary from one point to another on the demand curve, and
for a large change in price and quantity we need an average value over the entire range.
Thus, when we deal with large changes in price and quantity, we should use the
following Arc Elasticity formula.
$$\eta_{xx} = -\frac{\Delta Q_x}{\Delta P_x} \cdot \frac{(P_1 + P_2)/2}{(Q_1 + Q_2)/2} = -\frac{\Delta Q_x}{\Delta P_x} \cdot \frac{P_1 + P_2}{Q_1 + Q_2}$$
where P1 and P2 are the prices between which we want to find out the elasticity and Q1 and Q2 are the corresponding quantities.
Following this formula, the elasticity between the points D and F on the demand curve
Dx in fig.1.11 will be
$$\eta_{xx} = -\frac{2000 - 1200}{3 - 5} \cdot \frac{5 + 3}{1200 + 2000} = 400 \cdot \frac{8}{3200} = 1$$
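The direction problem and its resolution can be checked numerically. The sketch below uses the points D (Rs.5, 1,200 units) and F (Rs.3, 2,000 units) from the market demand schedule; the function names are illustrative, and the arc formula is the usual mid-point form, with the sum of the two prices over the sum of the two quantities.

```python
from fractions import Fraction

# Sketch: elasticity between two points, measured three ways.
# Points D and F are taken from the market demand schedule QDx = 3200 - 400*Px.

def endpoint_elasticity(p_start, q_start, p_end, q_end):
    """-(change in Q / change in P) * (P / Q), with P and Q at the starting point."""
    return -Fraction(q_end - q_start, p_end - p_start) * Fraction(p_start, q_start)

def arc_elasticity(p1, q1, p2, q2):
    """-(change in Q / change in P) * (P1 + P2) / (Q1 + Q2)."""
    return -Fraction(q2 - q1, p2 - p1) * Fraction(p1 + p2, q1 + q2)

D = (5, 1200)
F = (3, 2000)

print("D to F:", endpoint_elasticity(*D, *F))
print("F to D:", endpoint_elasticity(*F, *D))
print("Arc   :", arc_elasticity(*D, *F))
```

The arc measure is symmetric in the two points, so it gives the same value whichever way the price moves.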
II. Graphical Presentation of Elasticity
Graphically the price elasticity at a point on a linear demand curve is shown by the ratio
of the segments of the line to the right and to the left of the particular point. It can also
be described as the ratio of the lower segment to upper segment. Let us look at the
linear demand curve given in fig.1.11.
Figure 1.11: Demand Curve
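Elasticity of demand at point E on the demand curve AJ will be given by EJ/EA. It can be
proved as follows.
Elasticity of demand at point E is given by
$$\eta_{xx} = \frac{LJ}{EL} \cdot \frac{EL}{OL} = \frac{LJ}{OL}$$
where LJ/EL is the reciprocal of the slope of AJ (in absolute value), EL is the price and OL is the quantity at E.
Triangles AKE and ELJ are similar triangles and therefore their sides are proportionate:
$$\frac{LJ}{KE} = \frac{EJ}{EA}$$
Since KE = OL, it follows that $\eta_{xx} = LJ/OL = EJ/EA$.
It is clear from the figure that E is the mid point of the demand curve AJ. Therefore, EJ =
EA and hence $\eta_{xx} = EJ/EA = 1$.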
To measure elasticity at a point on a non-linear demand curve we draw a tangent to the
demand curve at that point so that it intersects the two axes. Then elasticity at that
point is given by the ratio of the segments of the tangent to the right and to the left of
the particular point.
In fig.1.12 the elasticity of the non-linear demand curve Dx at point E is given by EJ/EA,
where AJ is a tangent drawn to the demand curve Dx at point E.
Figure 1.12: Non Linear Demand Curve
Elasticity at point E on the demand curve Dx is given by EJ/EA, which is
equal to one.
Arc elasticity between two points on the demand curve is equivalent to finding elasticity
at the point midway between the two points. In fig.1.11, where the demand curve is a straight
line, the point midway between D and F is the point E, which corresponds to the price
Rs.4 and quantity 1600 units. So arc elasticity between the two
points D and F is equal to elasticity at point E on the demand curve. In both the cases
elasticity is equal to one. Both arc elasticity and point elasticity give us the same result
here because the demand curve is a straight line.
In fig. 1.12, where the demand curve Dx is non-linear the point midway between D and
F is the point E′ which lies on the straight line joining the two points. So the arc elasticity
between the two points D and F on the non-linear demand curve Dx, is given by the
elasticity at point E′ which does not lie on the demand curve. In fig.1.12, the elasticity
corresponding to the mid-point price (P1 + P2)/2 is given by the elasticity at point E on the demand
curve and it is given by
$$\eta_{E} = -\frac{1}{\text{slope of } AJ} \cdot \frac{EL}{OL}$$
The arc elasticity between the two points, D and F, is given by elasticity at point E′, and it is given by
$$\eta_{E'} = -\frac{1}{\text{slope of } DF} \cdot \frac{E'L'}{OL'}$$
Now as DF is parallel to AJ, the slope of the demand curve Dx at E
(which is equal to the slope of AJ) is equal to the slope of DF, and the price is the same at the two points (EL = E′L′). But OL′ > OL. Therefore, it is
clear that elasticity at point E´ is not equal to the elasticity at point E. Therefore, when
the demand curve is curvi-linear, the arc elasticity gives only an estimate of point
elasticity and the estimate improves as the arc becomes smaller and approaches a point
in the limit. In the fig.1.12, as points D and F on the demand curve Dx move closer to
each other, E´ approaches E, and therefore, the coefficient of elasticity at point E´ will
tend to be equal to elasticity at point E. The same thing will happen if the curvature of
the demand curve over the arc DF becomes less. Therefore, arc elasticity will give a
better estimate of point elasticity of demand on a curvi-linear demand curve as the
length of the arc becomes smaller and the curvature of the demand curve over the arc
becomes less.
Historical And Intellectual Context
History
Source: http://en.wikipedia.org/wiki/Price_elasticity_of_demand
Together with the concept of an economic "elasticity" coefficient, Alfred Marshall is
credited with defining PED ("elasticity of demand") in his book Principles of Economics,
published in 1890. He described it thus: "And we may say generally:— the elasticity (or
responsiveness) of demand in a market is great or small according as the amount
demanded increases much or little for a given fall in price, and diminishes much or little
for a given rise in price". He reasons this since "the only universal law as to a person's
desire for a commodity is that it diminishes... but this diminution may be slow or rapid.
If it is slow... a small fall in price will cause a comparatively large increase in his
purchases. But if it is rapid, a small fall in price will cause only a very small increase in
his purchases. In the former case... the elasticity of his wants, we may say, is great. In
the latter case... the elasticity of his demand is small." Mathematically, the Marshallian
PED was based on a point-price definition, using differential calculus to calculate
elasticities.
The illustration that accompanied Marshall's original definition of PED, the ratio of PT to
Pt
Example : Given the market demand function QDx = 3200 – 400 Px,
(i) Derive the market demand schedule.
(ii) Find elasticity when price falls from Rs.5 to Rs.4.
(iii) Find elasticity at Px = Rs.3.
Ans. (i) Market Demand Schedule
Px (in Rs.) 8 7 6 5 4 3 2 1 0
Qx (in Kgs) 0 400 800 1,200 1,600 2,000 2,400 2,800 3,200
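(ii) Since the price has fallen by Re.1, it is a finite change and so we use the concept of
arc elasticity. With P1 = 5, Q1 = 1,200, P2 = 4 and Q2 = 1,600,
$$\eta_{xx} = -\frac{1600 - 1200}{4 - 5} \cdot \frac{5 + 4}{1200 + 1600} = 400 \cdot \frac{9}{2800} = \frac{9}{7} \approx 1.29$$
(iii) Here we have to find elasticity at a point on the demand curve. So we use the point
method. At Px = Rs.3, QDx = 2,000, so
$$\eta_{xx} = -\frac{dQ_{Dx}}{dP_x} \cdot \frac{P_x}{Q_{Dx}} = 400 \cdot \frac{3}{2000} = 0.6$$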
III. Elasticity of demand and slope of the demand curve

We know that elasticity at a point on the demand curve is given by
$$\eta_{xx} = -\frac{1}{\text{slope}} \cdot \frac{P}{Q}$$
So the slope of the curve is only one of the factors that determine elasticity. The second
factor is the position of the point at which elasticity is evaluated, indicated by P/Q.
Using this concept we can derive some important results on elasticity of demand.
(i) First, the elasticity of a downward-sloping straight-line demand curve varies from
infinity at the price axis to zero at the quantity axis. A straight line has a constant slope,
so its reciprocal is also constant at every point on the demand curve. So the value of
elasticity at any point depends only on the ratio P/Q. At the price axis, Q = 0, and P/Q
is infinite. Thus elasticity approaches infinity as quantity approaches zero.
In fig.1.13, the elasticity is equal to infinity at point D on the demand curve DE. As we
move down the line DE, price decreases and quantity increases steadily; thus P/Q is
falling steadily so that elasticity is also falling. At the quantity axis, that is, at point E on
the demand curve, price is zero, so the ratio P/Q equals zero and hence elasticity is
equal to zero.
This result can be interpreted in another way by using the definition of elasticity.
Elasticity refers to percentage change. Starting from point D on the demand curve, the
smallest reduction in price will increase the quantity demanded from zero to some
positive amount. Because the previously demanded quantity was zero, the increase is
infinite in percentage terms. So elasticity at point D is equal to infinity. At point E, any
increase in price from zero to a positive number is an infinite percentage increase
because the price was previously zero, while the percentage fall in quantity is finite. Therefore elasticity at point E is equal to zero. By
using the geometrical formula for point elasticity, we can derive that elasticity at the
mid-point B on the demand curve DE will be equal to BE/BD = 1; at point A, it is equal to
AE/AD > 1 and at point C, it is equal to CE/CD < 1.
An Example
Let us take the same demand function given in the previous example:
QDx = 3200-400Px.
Differentiating with respect to Px, we get, (dQDx/dPx) = -400.
By definition, ηxx = -(dQDx/dPx)*(Px/QDx).
At the price intercept of the demand curve, Px=8 and QDx=0, and ηxx = -(-400)*(8/0) = ∞.
At Px=6 and QDx=800, ηxx = 400*(6/800) = 3>1.
At Px=4, QDx=1600 and ηxx = 400*(4/1600) = 1. It can be observed that it is the mid
point of the given straight line demand curve.
At Px=2, QDx=2400 and ηxx = 400*(2/2400)= 1/3<1.
At the horizontal intercept, Px=0 and QDx=3200 and ηxx = 0.
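The same calculation can be repeated at every price on the schedule. A minimal sketch, using QDx = 3200 – 400 Px and the point formula ηxx = –(dQDx/dPx)(Px/QDx); the function names are illustrative, and the infinite value at the price intercept is handled as a special case.

```python
from fractions import Fraction

# Sketch: point elasticity along the linear demand curve QDx = 3200 - 400*Px.

def quantity(price):
    return 3200 - 400 * price

def point_elasticity(price):
    """-(dQ/dP) * (P/Q) with dQ/dP = -400; infinite where Q = 0."""
    q = quantity(price)
    if q == 0:
        return float("inf")
    return Fraction(400 * price, q)

for price in range(8, -1, -1):
    print(f"Px = {price}, QDx = {quantity(price)}, elasticity = {point_elasticity(price)}")
```

Elasticity falls steadily as price falls, passing through one at the mid-point Px = 4.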
Constant Price Elasticity Demand Curve
We just saw that elasticity varies along a linear demand curve. There is another form of
demand curve (which is frequently used in empirical work) on which elasticity remains
the same at each point. The functional form of the demand curve is already given in
Value Addition 1.1:
$Q_x = a / P_x^{b}$,
where a and b are positive constants and b is the elasticity parameter.
Elasticity at any point on the demand curve is given by
$$\eta_{xx} = -\frac{dQ_x}{dP_x} \cdot \frac{P_x}{Q_x} = \frac{ab}{P_x^{b+1}} \cdot \frac{P_x}{a P_x^{-b}}$$
After simplifying we get, $\eta_{xx} = b$.
Suppose a=100 and b=1, then $Q_x = 100/P_x$.
When Px=1, Qx=100 and $\eta_{xx} = -(dQ_x/dP_x)(P_x/Q_x) = (100/P_x^{2})(1/100) = (100/1)(1/100) = 1 = b$.
When Px=4, Qx=25 and $\eta_{xx} = (100/16)(4/25) = 1 = b$.
Thus the demand curve Qx=100/Px is a unit elastic demand curve. It is a rectangular
hyperbola. Such a demand curve is illustrated in Fig.1.v.1.
Suppose we assume a=100 and b=2, then $Q_x = 100/P_x^{2}$.
If Px=2, Qx=25 and $\eta_{xx} = -(dQ_x/dP_x)(P_x/Q_x) = (200/P_x^{3})(2/25) = (200/8)(2/25) = 2 = b$.
Instead, if Px=5, Qx=4 and $\eta_{xx} = (200/5^{3})(5/4) = (200/125)(5/4) = 2 = b$.
Thus, when b=2, the elasticity of demand is equal to two at each point on this demand
curve.
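That the elasticity equals b at every price can be verified for several prices at once. The sketch below uses the demand form Qx = a/Px^b with the derivative dQx/dPx = –ab/Px^(b+1); the function names are illustrative, and exact fractions are used to avoid rounding error.

```python
from fractions import Fraction

# Sketch: elasticity along the constant-elasticity demand curve Qx = a / Px**b.

def quantity(price, a, b):
    return Fraction(a) / Fraction(price) ** b

def point_elasticity(price, a, b):
    """-(dQ/dP) * (P/Q), using dQ/dP = -a*b / P**(b+1)."""
    slope = -Fraction(a * b) / Fraction(price) ** (b + 1)
    return -slope * Fraction(price) / quantity(price, a, b)

for b in (1, 2):
    for price in (1, 2, 4, 5, 10):
        print(f"b = {b}, Px = {price}, Qx = {quantity(price, 100, b)}, "
              f"elasticity = {point_elasticity(price, 100, b)}")
```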
(ii) Second, comparing two straight line demand curves of the same slope, the one
farther from the origin is less elastic at each price than the one closer to the origin.
In fig. 1.14, D1E1 and D2E2 are two parallel straight line demand curves. Let us take the
price P; A and B are the corresponding points on the demand curves D1E1 and D2E2
respectively. Since the two curves are parallel, ∆Q/∆P is the same at points A and B. Price
is also the same. On the curve farther from the origin (D2E2) quantity is larger (i.e., OQ2
> OQ1) and hence P/Q is smaller; thus elasticity is smaller.
Figure 1.14: Parallel Demand Curves
Reinforce Your Learning
Generally, elasticity is measured at a particular price and in that case, at each price,
elasticity on D2E2 will be less than the elasticity on D1E1. But if we measure elasticity at
a particular quantity, then we will get a different result. For example, elasticity at
quantity Q2, on the demand curve D1E1 is (∆Q/∆P).(CQ2/OQ2) and on the demand
curve D2E2 is (∆Q/∆P).(BQ2/OQ2). As BQ2 is more than CQ2, so, at the quantity Q2,
the demand curve D2E2 is more elastic than the demand curve D1E1 .
(iii) Third, of two intersecting straight line demand curves the steeper demand curve will
be less elastic than the flatter one at the point of intersection.
In fig. 1.15, D1E1 and D2E2 are two straight line demand curves intersecting at point A.
D1E1 is steeper than D2E2. At the point of intersection A, P/Q is the same on the two
demand curves. On the steeper demand curve D1E1, the slope ∆P/∆Q is larger (in absolute value) than on the flatter
demand curve D2E2; thus, the ratio ∆Q/∆P is smaller on the steeper curve than on the
flatter curve, so that elasticity is lower.
Figure 1.15: Intersecting Demand Curves
Thus, if we take two intersecting straight line demand curves, the flatter demand curve
will show greater elasticity than the steeper one at a given price. But it is not always
true that a flatter demand curve will show greater elasticity than a steeper one. In fact,
if two straight line demand curves having different slopes start from the same point on
the price axis, the elasticities on the two demand curves will be the same at a given
price.
In fig. 1.16 DE1 and DE2 are two straight line demand curves starting from the same point
D on the price axis, DE1 being the steeper one. At price P, elasticity of the demand
curve DE1 will be OP/PD and that of DE2 will also be OP/PD. Hence elasticity is the same at price P
on the two demand curves. Thus, a flatter demand curve does not necessarily signify a
greater elasticity than a steeper one.
Figure 1.16: Demand Curves Having The Same Vertical Intercept
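The three comparisons of straight line demand curves (parallel curves, intersecting curves, and curves with a common vertical intercept) can be checked with a few hypothetical lines. In the sketch below each demand curve is written as P = intercept – slope × Q; all intercepts, slopes and prices are made-up numbers chosen only for illustration.

```python
from fractions import Fraction

# Sketch: comparing elasticities on straight-line demand curves.
# Each curve is P = intercept - slope * Q; all numbers are hypothetical.

def elasticity(intercept, slope, price):
    """Point elasticity (1/slope) * (P/Q) on the line P = intercept - slope*Q."""
    quantity = Fraction(intercept - price) / slope
    return Fraction(price) / (slope * quantity)

# (ii) Parallel curves (same slope): the one farther from the origin is less elastic.
near = elasticity(intercept=10, slope=1, price=5)
far = elasticity(intercept=20, slope=1, price=5)
print("parallel:", near, far)

# (iii) Curves intersecting at Q = 5, P = 10: the steeper one is less elastic.
steep = elasticity(intercept=20, slope=2, price=10)
flat = elasticity(intercept=15, slope=1, price=10)
print("intersecting:", steep, flat)

# Same vertical intercept, different slopes: equal elasticity at a given price.
steep_same = elasticity(intercept=12, slope=3, price=4)
flat_same = elasticity(intercept=12, slope=1, price=4)
print("same intercept:", steep_same, flat_same)
```

On a line of this form the elasticity simplifies to P/(intercept – P), which depends only on the price and the vertical intercept; this is why two lines sharing an intercept have the same elasticity at any given price.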
IV. Total Expenditure and price elasticity
When the price of a good increases, the consumer
spends more on each unit of the good bought. At the same time she buys fewer units of
the good. If the price effect outweighs the quantity effect, the total expenditure on the
good rises. If the quantity effect outweighs the price effect, then total expenditure falls.
If the elasticity of demand is less than one, then a 1 per cent increase in price leads to
less than a 1 per cent decrease in quantity demanded and the price effect outweighs the
quantity effect, leading to a rise in the expenditure on the good.
If the elasticity exceeds one, a small increase in price causes a more than proportionate
fall in the quantity demanded, so the quantity effect dominates and total expenditure
falls. If the elasticity is equal to one, a given percentage increase in price leads to an
equal percentage fall in the quantity bought and the total expenditure remains the same.
In general, if η < 1, then the change in price and the change in total expenditure move
in the same direction; if η > 1, then the change in price and the change in total
expenditure move in the opposite directions and if η = 1, with a change in price, the
total expenditure remains the same. It is clear that the money spent by purchasers of a
good is received by the sellers. The total expenditure on the good by the consumers is
thus the total revenue for the sellers. Thus the previous relationship also holds good
between elasticity and total revenue. This relationship can be formally proved as follows:
Total revenue = TR = P × Q
Differentiating TR with respect to price, we get,
$$\frac{dTR}{dP} = Q + P \frac{dQ}{dP} = Q \left( 1 + \frac{P}{Q} \cdot \frac{dQ}{dP} \right) = Q (1 - \eta_{xx})$$
So when ηxx < 1, then dTR/dP > 0; that is, total revenue and price move in the same
direction. When ηxx > 1, then dTR/dP < 0; that is, total revenue and price move in opposite
directions. When ηxx = 1, then dTR/dP = 0; that is, with a change in price there is no change in
total revenue.
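The sign of the change in total revenue can be checked numerically. The sketch below borrows the linear market demand function QDx = 3200 – 400 Px from the earlier example; using it here is an illustrative assumption, and the function names are hypothetical.

```python
# Sketch: total revenue and elasticity along QDx = 3200 - 400*Px.
# Borrowing this demand function here is an illustrative assumption.

def quantity(price):
    return 3200 - 400 * price

def total_revenue(price):
    return price * quantity(price)

def elasticity(price):
    """Point elasticity -(dQ/dP) * (P/Q) with dQ/dP = -400."""
    return 400 * price / quantity(price)

for price in range(1, 8):
    print(f"P = {price}: elasticity = {elasticity(price):.2f}, "
          f"TR = {total_revenue(price)}")

# Inelastic range (P = 2): a price rise increases total revenue.
print(total_revenue(3) > total_revenue(2))
# Elastic range (P = 6): a price rise reduces total revenue.
print(total_revenue(7) < total_revenue(6))
```

Total revenue peaks at the price where elasticity equals one, which for this demand function is Px = 4.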
Example : Using only the total expenditure criterion, determine if the demand
schedules given in the following table are elastic, inelastic, or unitary elastic.
Price (in Rs.)   5     4     3     2     1
Qx               120   150   200   300   600
Qy               120   160   225   350   725
Qz               120   140   175   250   475
Ans.
Price      Qx    TEx        Qy    TEy        Qz    TEz
(in Rs.)         (in Rs.)         (in Rs.)         (in Rs.)
5          120   600        120   600        120   600
4          150   600        160   640        140   560
3          200   600        225   675        175   525
2          300   600        350   700        250   500
1          600   600        725   725        475   475
(i) For good X, the total expenditure remains the same at Rs.600 at every price, so the
demand for X is unitary elastic.
(ii) For good Y, the total expenditure rises as the price falls. That is, price and total
expenditure move in opposite directions, so the demand is relatively elastic.
(iii) For good Z, the total expenditure falls as the price falls. Price and total
expenditure move in the same direction, so the demand is relatively inelastic.
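The total expenditure criterion used in this example can be checked mechanically. The sketch below is only an illustration: the function name `classify` and the "mixed" label for schedules that do not move in one direction are assumptions, not part of the text. Prices are listed in falling order, as in the table.

```python
# Total-expenditure test for the three demand schedules of the example.
prices = [5, 4, 3, 2, 1]  # price falls from Rs.5 to Rs.1
schedules = {
    "X": [120, 150, 200, 300, 600],
    "Y": [120, 160, 225, 350, 725],
    "Z": [120, 140, 175, 250, 475],
}

def classify(prices, quantities):
    """Classify demand by how total expenditure (TE) moves as price falls."""
    te = [p * q for p, q in zip(prices, quantities)]
    changes = [later - earlier for earlier, later in zip(te, te[1:])]
    if all(c == 0 for c in changes):
        return te, "unitary elastic"   # TE unchanged
    if all(c > 0 for c in changes):
        return te, "elastic"           # TE rises as price falls
    if all(c < 0 for c in changes):
        return te, "inelastic"         # TE falls as price falls
    return te, "mixed"                 # no single direction over the range

results = {good: classify(prices, q) for good, q in schedules.items()}
for good, (te, label) in results.items():
    print(good, te, label)
```

The classification depends only on the direction in which TE moves, so no elasticity value has to be computed.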
Applying the Theory
Relationship Between Revenue and Elasticity

Source: http://www.uri.edu/INT1/Mic/Elast/index.elast.html
The link between elasticity and revenue may answer the following questions: Why would
Brazil, one of the world's largest producers of coffee, burn some of their coffee harvest
as a way of increasing the value of coffee exports? Why would the OPEC countries lower
production if their goal was greater income? Why does agricultural income fall in years of
a good harvest?
In the coffee example, what could we expect when Brazilian officials reduce the supply of
coffee? Coffee drinkers seem to need their coffee and they can be expected to pay
whatever they need to pay to get their coffee fix. In this situation the reduction in supply
will lead to a substantial increase in price as the demanders compete for the smaller
supply. The net effect on revenue will be positive, with the increase in price (P) more
than compensating for the decreased quantity (Q).
The relationship between elasticity and total revenue can be explained in the following
way. Let's assume there is an increase in supply - the supply curve shifting to the right.
Total revenue is by definition equal to the price times the quantity sold (P*Q). In the
diagrams below the initial situation is described by the black supply curve (inner curve).
The revenue earned from selling the output is the areas A + B. After the increase in
supply shifts the supply curve to the right (red line), revenue equals the area B + C.
Revenue will increase as a result of the increase in supply if (area C) > (area A). In the
diagrams below we see that this happens when the demand curve is flat - when demand
is elastic. When demand is elastic, revenue will increase if we decrease the price or
increase supply. Revenue and output move in the same direction while revenue and
price move in opposite directions when demand is elastic. When demand is inelastic,
revenue will decrease if we decrease the price or increase supply. Revenue and output
move in opposite directions while revenue and price move in the same direction when
demand is inelastic.
Guidelines
We can now come up with some guidelines that tell us what to do with price or output if
our goal is to raise revenue. The general rules appear below.
Inelastic Demand: | ep | < 1
• Output and revenue are negatively related: to raise revenue you would lower output
• Price and revenue are positively related: to raise revenue you would raise price
Elastic Demand: | ep | > 1
• Output and revenue are positively related: to raise revenue you would raise output
• Price and revenue are negatively related: to raise revenue you would lower price
To understand the relationship between elasticity and revenue, let's look at the dilemma
faced by OPEC countries. The OPEC countries once controlled the supply of oil and they
were meeting to decide what to do about their levels of oil production. Some wanted to
raise output while others wanted to lower output. The strategy to lower output would be
most effective when:
a. income elasticity of demand is high
b. cross price elasticity is low
c. price elasticity of demand is low
d. price elasticity of supply is high
Let's begin with the basics - Revenue = P*Q. The change in revenue will depend upon
the changes in price and quantity. The decision to restrict output (decrease in Q) as a
means to increase revenue works when we have reason to believe that revenue and
output tend to move in opposite directions (Revenue increases when Quantity falls). If
we cut production, the only way that this will increase revenue is if the price rises
substantially. This will happen if we are talking about a product where price does not
have much of an effect on demand - a product where demand is inelastic.
Now let's look at the previous graph. Because demand is inelastic, the curve is steep so
the appropriate diagram is the one on the right. The original equilibrium is where the
supply curve and demand curve intersect [price = P1 and the quantity = Q1]. Total
revenue is equal to the area A + B. If the supply is increased, the supply curve shifts
out, then the new equilibrium will generate revenue equal to the area B + C. If we
compare the revenues we see that the decision to expand output will lower revenue
when demand is inelastic. In this case, if OPEC thought that demand was inelastic, the
group should agree to restrict output which is exactly what they did.
With the help of the previous graph we can explain how good news for farming can be
bad news for farmers. Generally, the demand curve for agricultural products is fairly
inelastic, so the appropriate diagram is the one in the right panel. When an improvement
in farm technology or favourable weather shifts the supply curve of, say, wheat from S to
S’, the price falls steeply but the quantity demanded increases only slightly, and total
revenue falls.
Solved Problem
Question: Suppose the price of a good is Rs.10 and its demand elasticity at this price
is 0.5. Suppose that, due to a rise in its price, its demand falls by 10 per cent. What is
the new price? What happens to the total expenditure on the good after the rise in its
price? Calculate the percentage change in the total expenditure.
Answer: We know that ηxx = (percentage change in quantity / percentage change in
price).
So, $0.5 = \dfrac{10}{(\Delta P / P) \times 100} = \dfrac{10\,P}{100\,\Delta P} = \dfrac{10 \times 10}{100\,\Delta P} = \dfrac{1}{\Delta P}$,
or, ∆P = 2.
So the new price is Rs.12.
As the elasticity of demand is less than one, total expenditure on the good will rise
with the rise in price.
The percentage change in total expenditure can be written as:
$\dfrac{P_2Q_2 - P_1Q_1}{P_1Q_1} \times 100 = \left(\dfrac{P_2}{P_1}\cdot\dfrac{Q_2}{Q_1} - 1\right) \times 100$ (I)
Now, we know that $\dfrac{Q_2 - Q_1}{Q_1} \times 100 = -10$.
Therefore, $\dfrac{Q_2}{Q_1} - 1 = -\dfrac{1}{10}$, or $\dfrac{Q_2}{Q_1} = \dfrac{9}{10}$.
Putting this value in (I), we get the percentage change in total expenditure as:
$\left(\dfrac{12}{10}\cdot\dfrac{9}{10} - 1\right) \times 100 = \dfrac{108 - 100}{100} \times 100 = 8$.
So total expenditure rises by 8 per cent.
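The arithmetic of this solved problem can be reproduced in a few lines. This is a minimal sketch; the function name `price_rise_effect` and its argument names are illustrative assumptions, and elasticity is treated as a positive number, as in the text.

```python
def price_rise_effect(p1, elasticity, pct_fall_in_quantity):
    """Return the new price and the percentage change in total expenditure.

    Uses elasticity = (% fall in quantity) / (% rise in price), with the
    elasticity expressed as a positive number.
    """
    pct_rise_in_price = pct_fall_in_quantity / elasticity
    p2 = p1 * (1 + pct_rise_in_price / 100)
    q_ratio = 1 - pct_fall_in_quantity / 100          # Q2 / Q1
    pct_change_te = ((p2 / p1) * q_ratio - 1) * 100   # expression (I)
    return p2, pct_change_te

new_price, te_change = price_rise_effect(10, 0.5, 10)
print(round(new_price, 2), round(te_change, 2))
```

With the figures of the problem (price Rs.10, elasticity 0.5, a 10 per cent fall in quantity) the sketch gives a new price of Rs.12 and an 8 per cent rise in total expenditure.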
V. Factors Affecting Price Elasticity
The size of the price elasticity of demand depends on the following factors.
(i) First, the price elasticity of demand for a good is larger the closer and the more
numerous are the substitutes available. For example, the demand for oranges is more
elastic than the demand for salt because oranges have closer and more numerous
substitutes (like bananas, mangoes, etc.) than salt. Thus, if the prices of both salt and
oranges rise by the same percentage, the decrease in the demand for oranges will be
larger than that for salt.
We know that the more narrowly a good is defined, the larger is the number of
substitutes available and hence the larger is the elasticity of demand. For this reason
the demand for a particular brand of a product will be more elastic than the demand for
the product in general. For example, the detergent brand ‘Surf’ has many substitutes like
‘Ariel’, ‘Tide’, ‘Nirma’ etc., and hence an increase in the price of Surf will induce
consumers to buy other brands, so the demand for Surf will fall to a great extent. By
contrast, if the price of detergent powder in general increases, the demand for it will
not fall to a great extent because close substitutes are not available. Thus, the demand
for Surf will be much more elastic than the demand for ‘detergent powder’ in general.
In the extreme case, if a good is defined so that it has perfect substitutes, its elasticity
of demand is infinite. For example, if a particular petrol pump charges a higher price for
petrol than the market price, then it would lose all customers, as buyers will switch over
to other petrol pumps which are selling identical products at the market price.
(ii) Second, the elasticity of demand depends on the nature of the need that the good
satisfies. In general, luxury goods are price elastic, while necessities are price
inelastic. For example, goods like cereals, cooking gas, sugar, salt, potatoes,
electricity, and transport to and from the place of work are necessities, and a rise in
their price will not reduce the quantity demanded significantly. By contrast, goods like
entertainment, eating out, holidaying, etc. are luxuries and their demand will be price
elastic.
(iii) The proportion of income spent on a good is another factor determining its
elasticity. The higher the proportion, the greater is the price elasticity of demand.
Examples are durable goods like electrical appliances, cars etc. By contrast, a consumer
spends a very small proportion of her income on goods like salt, vegetables, milk etc.,
and their demand will be price inelastic.
(iv) Another factor is the time period over which consumers adjust to a price change.
The longer the adjustment period, the greater will be the elasticity of demand. For
example, immediately after a rise in the price of LPG, a household may not be able to
reduce its demand for it, but in the longer run it will be able to replace LPG by either
piped natural gas or electricity, and hence its demand for LPG will decrease. So the
demand for LPG will be more elastic in the long run than in the short run.
VI. Average Revenue, Marginal Revenue and Elasticity of Demand

The market demand curve shows for each specific price the quantity of the good that
buyers will buy.
In Fig. 1.17, point A on the market demand curve DD′ shows that at the price of OP per
unit, the quantity demanded by the buyers, or the quantity sold by the sellers, is OQ.
Total revenue is equal to the price per unit of the good times the quantity of the good
sold. From the standpoint of sellers, OP × OQ, or the area of the rectangle OPAQ, is the
total revenue obtainable when a price of OP per unit is charged.
Average revenue (AR) is the revenue per unit of output sold. That is,
$AR = \dfrac{TR}{Q} = \dfrac{P \times Q}{Q} = P$
where Q is the quantity sold at the price P. Thus, AR is identically equal to price. In
Fig. 1.17, when the quantity sold is OQ, price as well as average revenue is equal to AQ,
which is the height of the demand curve corresponding to OQ. So the market demand curve
can be considered as the AR curve from the point of view of the seller.
Marginal revenue (MR) is the change in total revenue attributable to a one-unit change
in output sold. In general, MR is calculated by dividing the change in TR by the change in
output. That is,
$MR = \dfrac{\Delta TR}{\Delta Q}$ (1.15)
where ΔTR is the change in TR and ΔQ is the change in output. In particular, when ΔQ =
1, MR = ΔTR. In equation (1.15) the changes are finite. For an infinitesimally small
change in quantity and revenue,
$MR = \dfrac{dTR}{dQ}$ (1.16)
that is, marginal revenue at any point on the TR curve is given by the slope of the total
revenue curve at that point.
Calculation of Marginal Revenue
The first two columns in Table 1.8 give the demand schedule of the good. Column 3 is
derived by multiplying columns (1) and (2), and it gives us total revenue. The change in
total revenue resulting from each additional unit of the good sold gives the marginal
revenue, which is shown in column 4. Because average revenue is identically equal to
the price of the good, column 1 also gives us the AR.
Table 1.8. Calculation of Total Revenue and Marginal Revenue

(1) Price (P)   (2) Quantity (Q)   (3) Total Revenue (TR)   (4) Marginal Revenue (MR)
(in Rs.)        (in units)         (in Rs.)                 (in Rs.)
9               0                  0                        –
8               1                  8                        8
7               2                  14                       6
6               3                  18                       4
5               4                  20                       2
4               5                  20                       0
3               6                  18                       –2
2               7                  14                       –4
1               8                  8                        –6
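Columns (3) and (4) of Table 1.8 follow mechanically from the demand schedule in columns (1) and (2). The sketch below recomputes them; the variable names are illustrative, and `None` stands in for the dash in the first row, where no marginal revenue is defined.

```python
# Demand schedule from Table 1.8: price (Rs.) and quantity (units).
prices = [9, 8, 7, 6, 5, 4, 3, 2, 1]
quantities = [0, 1, 2, 3, 4, 5, 6, 7, 8]

# Column (3): total revenue is price times quantity.
total_revenue = [p * q for p, q in zip(prices, quantities)]

# Column (4): marginal revenue is the change in TR per unit change in output.
marginal_revenue = [None]
for i in range(1, len(quantities)):
    delta_tr = total_revenue[i] - total_revenue[i - 1]
    delta_q = quantities[i] - quantities[i - 1]
    marginal_revenue.append(delta_tr / delta_q)

for row in zip(prices, quantities, total_revenue, marginal_revenue):
    print(row)
```

MR is positive while TR rises, zero where TR is flat between the 4th and 5th units, and negative once TR declines.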
The information given in Table 1.8 is plotted in Fig. 1.18. Panel (a) gives the total
revenue curve, which is drawn by plotting the points given in columns (2) and (3) and
then joining these points by straight line segments. This is done because the data given
in the table are discrete. The TR curve rises steadily until 4 units of the good are
sold, remains constant at Rs.20 between the 4th and 5th units and then declines. Panel
(b) gives the corresponding demand (AR) and marginal revenue curves. Points on the TR and
D curves are plotted at each level of output.
Figure 1.18: Derivation of AR and MR Curves from TR Curve
For example, corresponding to 1 unit of output sold, P or AR = Rs.8 and TR = Rs.8.
Similarly, corresponding to 2 units of output sold, price = AR = Rs.7 and TR = Rs.14, etc.
But points on the marginal revenue curve are plotted at the mid-point of each quantity
interval. For example, the marginal revenue of Rs.8 is plotted at 0.5 unit of output sold
and the MR of Rs.6 at 1.5 units of quantity sold, which do not tally with the quantities
given in Table 1.8. The MR curve is drawn in this way because the example upon which the
graph is based contains discrete data. As already mentioned, the TR curve is obtained
by connecting the various points by straight-line segments. For example, between 0 and
1 unit of output the TR curve is a straight line and hence the corresponding MR is
constant at Rs.8 between 0 and 1 unit of output. Similarly, MR is constant at Rs.6
between 1 unit and 2 units of output, etc. Thus, to be exactly correct, the MR curve
should be drawn as a decreasing step function rather than as a continuous function. To
compensate for this inconsistency, the values of MR are plotted at the mid-point of
each quantity interval.
In this discrete example, AR = MR at quantity 1 and price Rs.8. In the continuous case,
the two are equal only infinitesimally close to the vertical axis. It should also
be clear from the table that as long as TR is rising MR is positive, when TR is declining
MR is negative, and MR is equal to zero when TR is at its maximum. In the table TR
remains constant between 4 and 5 units of output, so in panel (b) of Fig. 1.18 MR is
shown to be equal to zero at 4.5 units of output.
When demand curve is a downward sloping straight line, we can easily derive the
corresponding MR curve.
Let the inverse demand function be given by
$P = a - bQ$, (1.17)
where a is the intercept and –b the slope of the demand curve.
Then $TR = P \times Q = aQ - bQ^2$ (1.18)
and $MR = \dfrac{dTR}{dQ} = a - 2bQ$ (1.19)
From equation (1.19) we can derive two important relationships between MR and the
demand curves when the demand curve is a downward sloping straight line. The MR
curve has the same intercept as the demand curve and a slope which is twice as large in
absolute value as the slope of the demand curve. Equation (1.19) can be rewritten as
follows :
MR = a – bQ – bQ = P – bQ (1.20)
As b is a positive constant, so for any positive value of Q, MR will be less than the price.
When Q = 0, MR will be equal to the price.
In general, the marginal revenue is given by
$MR = \dfrac{d(PQ)}{dQ} = P + Q\dfrac{dP}{dQ}$ (1.21)
where $dP/dQ$ is the slope of the demand curve at the relevant point. When the demand
curve is negatively sloped, $dP/dQ$ is negative and hence MR is less than the price. When
the demand for a good is perfectly elastic and the demand curve is horizontal, $dP/dQ = 0$
and hence MR will be equal to the price.
Thus, the MR curve lies below the demand curve when the latter has a negative slope. The
reason is that to sell more units the price must be lowered, not just on the last unit, but
on all previous (or intra-marginal) units as well. For example, in Table 1.8, to increase
the quantity sold from 2 to 3 units, the price is reduced from Rs.7 to Rs.6 per unit.
Therefore, the MR on the 3rd unit of the good is given by the current price of Rs.6 minus
the Re.1 reduction in price on each of the previous two units. So MR on the 3rd unit is
Rs.6 – Rs.2 = Rs.4, which is lower than the price of Rs.6.
For a given quantity, price measures the height of the demand curve. Since MR < P, the
MR curve lies below the demand curve.
Solved Problem
Question: Given the demand function Qx = 100-10Px, derive the equations for TR,
AR and MR functions. On the basis of your answer derive the relationship between AR
and MR.
Answer: First, we will derive the inverse demand function Px = f(Qx).
Rearranging the demand function, we get,
$P_x = 10 - \tfrac{1}{10}Q_x$. (i)
Multiplying (i) by Qx, we get,
$TR = P_x Q_x = 10Q_x - \tfrac{1}{10}Q_x^2$. (ii)
Differentiating (ii) w.r.t. Qx, we get,
$MR = 10 - \tfrac{1}{5}Q_x$. (iii)
Equations (i) and (iii) give us the AR and MR functions respectively.
The AR curve has a constant slope of –(1/10), implying that it is a downward sloping
straight line. Further, it has a vertical intercept equal to 10.
From (iii) it is clear that the MR curve has the same vertical intercept as the AR
curve (10). Its slope is –(1/5), which is twice the slope of the AR curve in absolute
terms.
For any positive quantity, AR > MR. For example, at Qx = 20, AR = Px = 8 and
MR = 10 – (1/5) × 20 = 6; at Qx = 50, AR = Px = 5 and MR = 10 – (1/5) × 50 = 0; at
Qx = 80, AR = Px = 2 and MR = 10 – (1/5) × 80 = –6.
This illustrates that the MR curve lies below the demand curve (i.e., the AR curve) when
the demand curve is a downward sloping straight line.
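The AR and MR functions derived in this solved problem can be verified numerically. The sketch below is an illustration only: the function names and the step size `h` used for the numerical slope are assumptions. It compares the analytical MR with a central-difference slope of the TR function.

```python
def average_revenue(q):
    """Inverse demand for Qx = 100 - 10*Px, i.e. Px = 10 - Qx/10."""
    return 10 - q / 10

def total_revenue(q):
    return average_revenue(q) * q

def marginal_revenue(q):
    """Analytical MR from differentiating TR: 10 - Qx/5."""
    return 10 - q / 5

def numerical_mr(q, h=1e-4):
    """Slope of the TR curve at q, estimated by a central difference."""
    return (total_revenue(q + h) - total_revenue(q - h)) / (2 * h)

for q in (20, 50, 80):
    print(q, average_revenue(q), marginal_revenue(q), round(numerical_mr(q), 6))
```

At each of the three quantities used in the text, AR exceeds MR, and MR changes sign at Qx = 50, half-way along the quantity intercept of 100.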
The Geometry of Marginal Revenue Determination
We can use the relationship given in equation (1.21) to construct the marginal revenue
curve corresponding to a given demand curve. This is shown in Fig. 1.19, where in panel
(a) the demand curve is linear and in panel (b) it is non-linear. In panel (a), we can find
the marginal revenue corresponding to point E on the demand curve Dx by dropping
perpendicular EA to the vertical axis and EC to the horizontal axis. We know from
equation (1.21) that $MR = P + Q\dfrac{dP}{dQ}$. Corresponding to point E, P = OA and Q = OC = AE,
and $\dfrac{dP}{dQ} = -\dfrac{AD}{AE}$. Therefore, $Q\dfrac{dP}{dQ} = -AD$. So MR = OA – AD = Rs.6 – Rs.3 = Rs.3. This is
shown as point E′. Similarly, it can be derived that the MR corresponding to point F on the
demand curve Dx is OG – GD = 4.5 – 4.5 = Rs.0. This is shown as point F′. By joining E′
and F′ with a straight line we derive the marginal revenue curve MRx corresponding to
the demand curve Dx. It should be noted that the marginal revenue curve MRx starts at
point D, from where the demand curve Dx also starts, and it bisects any perpendicular
drawn from the demand curve to the vertical axis. For example, AK = (1/2) AE and
OF′ = (1/2) ODx. We can prove this as follows. We know that the slope of the MR curve is
twice the slope of the demand curve when the demand curve is linear.
In Fig. 1.19(a), the slope of the MR curve = (OD/OF′) and the slope of the demand curve
= (OD/ODx), both in absolute value.
Thus, (OD/OF′) = 2(OD/ODx). So, OF′ = (ODx/2).
This gives another way to derive the MR curve geometrically corresponding to a linear
demand curve.
Figure 1.19: Relationship between AR and MR Curves
To find the marginal revenue corresponding to any point on a non-linear demand curve,
we draw a tangent to the demand curve at that point and then proceed as described
above. For example, to find the MR corresponding to point E on the non-linear demand
curve D′x given in panel (b) of Fig. 1.19, we draw the tangent AB and then drop
perpendicular EG to the vertical axis and EL to the horizontal axis. Following equation
(1.21), we can show that the MR corresponding to point E will be OG – AG = Rs.10 –
Rs.5 = Rs.5. This is shown as point E′. Similarly, corresponding to point F on the
demand curve D′x, MR will be equal to zero, which is shown as point F′. Joining points
like E′ and F′, we get the marginal revenue curve MR′x corresponding to the non-linear
demand curve D′x.
Marginal Revenue, Price and Elasticity
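The marginal revenue is related to the price and the elasticity of demand by the
following formula:
$MR = P\left(1 - \dfrac{1}{\eta}\right)$ (1.23)
where η is the absolute value of the price elasticity of demand. This follows from
$MR = P + Q\dfrac{dP}{dQ} = P\left(1 + \dfrac{Q}{P}\dfrac{dP}{dQ}\right)$, since $\dfrac{Q}{P}\dfrac{dP}{dQ} = -\dfrac{1}{\eta}$.
For a downward sloping straight line demand curve the relationship (1.23) is shown in
Fig. 1.20. On the demand curve DD’, M is the mid-point and hence η = 1 at that point.
Corresponding to point M on the demand curve, MR = 0 and the MR curve intersects
the quantity axis. For any point above M on the demand curve, η > 1 and hence MR > 0.
For example, at point K on the demand curve, P = KC and MR = BC.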
Figure 1.20: Relationship Between AR and MR Curves
For any point below M on the demand curve, η < 1, and hence MR < 0, e.g., at point L,
η < 1 and MR curve goes below the quantity axis.
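The link between marginal revenue, price and elasticity along a straight-line demand curve can be illustrated numerically. The sketch below assumes a hypothetical inverse demand P = A – B·Q with illustrative values A = 10 and B = 0.1 (neither is taken from Fig. 1.20) and uses the standard relation MR = P(1 – 1/η), with η the absolute price elasticity. It compares that expression with MR obtained directly as A – 2B·Q.

```python
# Hypothetical linear inverse demand P = A - B*Q (A and B are illustrative).
A = 10.0
B = 0.1

def price(q):
    return A - B * q

def elasticity(q):
    """Absolute price elasticity: -(dQ/dP) * (P/Q), where dQ/dP = -1/B."""
    return (1.0 / B) * price(q) / q

def mr_from_elasticity(q):
    """Marginal revenue from MR = P * (1 - 1/eta)."""
    return price(q) * (1.0 - 1.0 / elasticity(q))

def mr_direct(q):
    """Marginal revenue from differentiating TR = A*Q - B*Q**2."""
    return A - 2.0 * B * q

midpoint = A / (2.0 * B)  # quantity at the mid-point of the demand curve
for q in (20.0, midpoint, 80.0):
    print(q, price(q), elasticity(q), mr_from_elasticity(q), mr_direct(q))
```

Above the mid-point (smaller Q) elasticity exceeds one and MR is positive; at the mid-point elasticity is one and MR is zero; below it elasticity is less than one and MR is negative.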
For a unitary elastic demand curve, elasticity is equal to one at every point on the
demand curve and hence MR = 0 for every level of output. In fact, a unitary elastic
demand curve has the shape of a rectangular hyperbola and its corresponding MR curve
merges with the horizontal axis. A rectangular hyperbola is a downward sloping
curve which is asymptotic to both the axes and under which the rectangles formed at
every point have equal areas.
In Fig. 1.21 let DD’ be a unitary elastic demand curve. Then at every point on the
demand curve total revenue remains the same. Total revenue at point A on the demand
curve is given by the area of the rectangle OP1AQ1. Similarly, total revenue at points B
and C on the demand curve is given respectively by the areas of OP2BQ2 and OP3CQ3.
Thus area of OP1AQ1 = area of OP2BQ2 = area of OP3CQ3. Hence, the demand curve DD’
is a rectangular hyperbola and, since MR = 0 whenever η = 1, the MR curve merges with
the quantity axis.
Figure 1.21: Unitary Elastic Demand Curve
1.2.1(b) Cross-price Elasticity of Demand
The cross-price elasticity of demand measures the relative responsiveness of the quantity
demanded of a given good to changes in the price of another good. In other words, it is
the proportionate change in the quantity demanded of a good X divided by the
proportionate change in the price of another good Y. Thus,
$\eta_{xy} = \dfrac{\Delta Q_x / Q_x}{\Delta P_y / P_y} = \dfrac{\Delta Q_x}{\Delta P_y}\cdot\dfrac{P_y}{Q_x}$ (1.24)
where ηxy = cross price elasticity of demand between good X and good Y,
ΔQx = change in the quantity demanded of X,
ΔPy = change in the price of Y,
Qx = original quantity demanded of X, and Py = original price of Y.
When goods X and Y are substitutes of each other, a rise in the price of Y will lead to an
increase in the demand for X and hence $\Delta Q_x/\Delta P_y > 0$ and, therefore, ηxy > 0.
On the other hand, if goods X and Y are complements of each other, then a rise in the
price of Y will lead to a reduction in the quantity demanded of Y and also a reduction in
the demand for X. Thus, $\Delta Q_x/\Delta P_y < 0$ and hence ηxy will be negative.
If X and Y are not related to each other, so that a change in the price of Y does not
cause any change in the quantity demanded of X, then ηxy = 0.
It should be noted that the value of ηxy need not be equal to the value of ηyx because
the responsiveness of quantity demanded of X with respect to a change in the price of Y
need not equal the responsiveness of quantity bought of Y to a change in price of X.
If the goods X and Y are produced by two firms belonging to the same industry, then X
and Y will be substitutes of each other and their cross price elasticity will be a large
positive number. For example, ‘Tropicana’ and ‘Real’ belong to the same packaged fruit
juice industry and a rise in the price of one will lead to a rise in the quantity demanded
of the other, so they will have a high positive cross price elasticity. Thus, high
positive cross elasticities among a group of commodities are frequently used to define the
boundaries of an industry. If the cross price elasticities among a group of goods are
zero or negligible, then the goods belong to different industries rather than to the
same industry.
Example : Find the cross elasticity of demand between Coffee (X) and Tea (Y) and
between Coffee (X) and Milk (Z), for the data given in Table 1.9. Also interpret your
results.
Table 1.9
Good         Before                           After
             Price         Quantity           Price          Quantity
Tea (Y)      Rs.250/kg     0.5 kg/month       Rs.500/kg      0.25 kg/month
Coffee (X)   Rs.500/kg     0.2 kg/month       Rs.500/kg      0.3 kg/month
Milk (Z)     Rs.15/litre   60 litres/month    Rs.18/litre    45 litres/month
Coffee (X)   Rs.500/kg     0.2 kg/month       Rs.500/kg      0.1 kg/month
Ans. $\eta_{xy} = \dfrac{\Delta Q_x}{\Delta P_y}\cdot\dfrac{P_y}{Q_x} = \dfrac{0.3 - 0.2}{500 - 250}\cdot\dfrac{250}{0.2} = 0.5$.
Since ηxy is positive, tea and coffee are substitutes of each other.
Now, $\eta_{xz} = \dfrac{\Delta Q_x}{\Delta P_z}\cdot\dfrac{P_z}{Q_x} = \dfrac{0.1 - 0.2}{18 - 15}\cdot\dfrac{15}{0.2} = -2.5$.
Since ηxz is negative, coffee and milk are complements of each other.
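The two cross-price elasticities of this example can be computed directly from Table 1.9. The sketch below is an illustration under one stated assumption: percentage changes are measured from the original (before) price and quantity, which is the convention the text states for its elasticity formulas. The function name is hypothetical.

```python
def cross_price_elasticity(qx_before, qx_after, p_other_before, p_other_after):
    """Proportionate change in quantity of X divided by the proportionate
    change in the price of the other good, both measured from the
    original (before) values."""
    pct_change_q = (qx_after - qx_before) / qx_before
    pct_change_p = (p_other_after - p_other_before) / p_other_before
    return pct_change_q / pct_change_p

# Coffee (X) against tea (Y): tea price rises from Rs.250 to Rs.500 per kg.
eta_coffee_tea = cross_price_elasticity(0.2, 0.3, 250, 500)
# Coffee (X) against milk (Z): milk price rises from Rs.15 to Rs.18 per litre.
eta_coffee_milk = cross_price_elasticity(0.2, 0.1, 15, 18)

print(round(eta_coffee_tea, 2), round(eta_coffee_milk, 2))
```

The positive value for tea and the negative value for milk match the interpretation in the text: tea is a substitute for coffee and milk is a complement.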
A Brain Teaser
Suppose that the cross price elasticity of demand for two goods is minus infinity. What
would you infer about the two goods?
Applying the Theory
Source: http://tutor2u.net/economics/revision-notes/as-markets-crossprice-elasticity-of-demand.html
How can businesses make use of the concept of cross price elasticity of demand?
Pricing strategies for substitutes: If a competitor cuts the price of a rival product, firms
use estimates of cross-price elasticity to predict the effect on the quantity demanded and
total revenue of their own product. For example, two or more airlines competing with
each other on a given route will have to consider how one airline might react to its
competitor’s price change. Will many consumers switch? Will they have the capacity to
meet an expected rise in demand? Will the other firm match a price rise? Will it follow a
price fall?
Consider for example the cross-price effect that has occurred with the rapid expansion of
low-cost airlines in the European airline industry. This has been a major challenge to the
existing and well-established national air carriers, many of whom have made
adjustments to their business model and pricing strategies to cope with the increased
competition.
Pricing strategies for complementary goods: For example, popcorn, soft drinks and
cinema tickets have a high negative value for cross elasticity; they are strong
complements. Popcorn has a high mark-up, i.e. popcorn costs pennies to make but sells
for more than a pound. If firms have a reliable estimate for cross price elasticity of
demand they can estimate the effect, say, of a two-for-one cinema ticket offer on the
demand for popcorn. The additional profit from extra popcorn sales may more than
compensate for the lower cost of entry into the cinema.
Advertising and marketing: In highly competitive markets where brand names carry
substantial value, many businesses spend huge amounts of money every year on
persuasive advertising and marketing. There are many aims behind this, including
attempting to shift out the demand curve for a product (or product range) and also build
consumer loyalty to a brand. When consumers become habitual purchasers of a product,
the cross price elasticity of demand against rival products will decrease. This reduces the
size of the substitution effect following a price change and makes demand less sensitive
to price. The result is that firms may be able to charge a higher price, increase their total
revenue and turn consumer surplus into higher profit.
Relationship between Own- and Cross-Price Elasticities
Own-price and cross-price elasticities of demand are somewhat dependent on each
other. Let us suppose that a consumer spends her entire income on the purchase of just
two goods X and Y. If the consumer’s own-price elasticity of demand for good X is less
than one, then with a rise in the price of X, ceteris paribus, the total expenditure on X
will increase. Assuming no change in the consumer’s income, this will imply that
expenditure on good Y will decrease. As price of Y is assumed to remain constant, this
would indicate a reduction in the quantity bought of Y. So a rise in the price of X leads to
a fall in the quantity demanded of Y, implying that X and Y are complements. Thus if
own-price elasticity is less than one, then the cross price elasticity is negative. Similarly
it can be derived that if the own-price elasticity exceeds one, then the cross-price
elasticity will be positive and the two goods will be substitutes of each other.
These relationships can be extended to the case when the consumer consumes any
number of goods : If the own-price elasticity of demand for good X exceeds one, then in
some average sense, the other goods are substitutes for X. If the own-price elasticity is
less than one, then in that same sense, the other goods are complements. This
proposition can be formally derived as follows :
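Let the consumer spend her income, I, on the purchase of n goods. Then her budget
constraint is given as:
$I = P_1Q_1 + P_2Q_2 + \dots + P_nQ_n$, (1.25)
where Pi is the price of good i and Qi the quantity consumed of good i.
Let there be a change in P1, the prices of other goods and the income of the consumer
remaining constant. Differentiating (1.25) with respect to P1, we get,
$0 = Q_1 + P_1\dfrac{\partial Q_1}{\partial P_1} + \sum_{j=2}^{n} P_j\dfrac{\partial Q_j}{\partial P_1}$ [as I and P2, ..., Pn are constants]
Dividing the equation throughout by Q1, we get,
$0 = 1 + \dfrac{P_1}{Q_1}\dfrac{\partial Q_1}{\partial P_1} + \sum_{j=2}^{n} \dfrac{P_j}{Q_1}\dfrac{\partial Q_j}{\partial P_1}$
Multiplying and dividing the 2nd to nth terms on the R.H.S. of the previous equation by
(P1/Qj), j = 2, 3, ..., n, we get,
$0 = 1 - \eta_{11} + \sum_{j=2}^{n} \dfrac{E_j}{E_1}\,\eta_{j1}$, or $\eta_{11} - 1 = \sum_{j=2}^{n} \dfrac{E_j}{E_1}\,\eta_{j1}$ (1.26)
where $\eta_{11} = -\dfrac{P_1}{Q_1}\dfrac{\partial Q_1}{\partial P_1}$ is the own-price elasticity of demand for good 1 (in absolute
value); $\eta_{j1} = \dfrac{P_1}{Q_j}\dfrac{\partial Q_j}{\partial P_1}$ is the cross-price elasticity between the jth good and good 1;
E1 = P1Q1 is the expenditure on good 1 and Ej = PjQj is the expenditure on the jth good,
j = 2, ..., n.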
The R.H.S. of equation (1.26) gives us the weighted sum of the cross-price elasticities
between good 1 and the other goods. Equation (1.26) implies that if η11 > 1, then this
weighted sum of cross-price elasticities will be positive, indicating that on average the
other goods are substitutes for good 1. On the other hand, if η11 < 1, then the weighted
sum of cross-price elasticities will be negative, implying that in some average sense the
other goods are complements of good 1.
Applying the theory
Question: Suppose a consumer spends her entire income on the purchase of two
goods, X and Y. Suppose further that the consumer’s own price elasticity for X was more
than one. Then prove that X and Y are substitutes.
Solution: Let Px and Qx be the price and quantity bought of X respectively, and Py
and Qy be the price and quantity of good Y. Let I be the income of the consumer.
As the consumer is spending her entire income on the two given goods so,
I= Px. Qx + Py .Qy (i)
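Let there be a change in the price of X, the income of the consumer and the price of Y
remaining constant. Differentiating (i) with respect to Px, we get
$0 = Q_x + P_x\dfrac{dQ_x}{dP_x} + P_y\dfrac{dQ_y}{dP_x}$.
Dividing throughout by Qx, and multiplying and dividing the last term by (Px/Qy), we get
$0 = 1 - \eta_{xx} + \dfrac{E_y}{E_x}\,\eta_{yx}$,
or, $\eta_{xx} - 1 = \dfrac{E_y}{E_x}\,\eta_{yx}$, (ii)
where Ex = PxQx and Ey = PyQy are the expenditures on X and Y respectively.
As Ey/Ex is positive, it is clear from equation (ii) that if ηxx > 1, then ηyx > 0,
implying that goods X and Y are substitutes of each other.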
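The result can be illustrated with a numerical check. Differentiating the two-good budget constraint gives ηxx – 1 = (Ey/Ex)·ηyx, where Ex and Ey are the expenditures on X and Y. The sketch below assumes a hypothetical constant-elasticity demand for X (elasticity 1.5) and illustrative values for income and the price of Y; none of these numbers come from the text. The demand for Y is whatever income is left after buying X.

```python
# Illustrative assumptions: income, price of Y, and a demand for X with
# constant own-price elasticity of 1.5. None of these values are from the text.
INCOME = 100.0
PY = 2.0

def qx(px):
    return 50.0 * px ** -1.5

def qy(px):
    """Y absorbs the income left over after buying X (budget constraint)."""
    return (INCOME - px * qx(px)) / PY

def elasticity(demand, p, h=1e-6):
    """(dQ/dP) * (P/Q), with the slope estimated by a central difference."""
    slope = (demand(p + h) - demand(p - h)) / (2 * h)
    return slope * p / demand(p)

px = 4.0
eta_xx = -elasticity(qx, px)   # own-price elasticity, as a positive number
eta_yx = elasticity(qy, px)    # cross-price elasticity of Y w.r.t. the price of X
ex = px * qx(px)               # expenditure on X
ey = PY * qy(px)               # expenditure on Y

lhs = eta_xx - 1
rhs = (ey / ex) * eta_yx
print(round(eta_xx, 4), round(eta_yx, 4), round(lhs, 4), round(rhs, 4))
```

Because the assumed own-price elasticity exceeds one, the cross-price elasticity comes out positive, so Y behaves as a substitute for X, in line with the proof.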
1.2.1(c) Income Elasticity of Demand
The income elasticity of demand refers to the relative responsiveness of the demand for a
good to changes in the consumer’s income. In other words, it is the proportionate change
in demand divided by the proportionate change in the money income of the consumer.
Symbolically,
$\eta_I = \dfrac{\Delta Q_x / Q_x}{\Delta I / I} = \dfrac{\Delta Q_x}{\Delta I}\cdot\dfrac{I}{Q_x}$ (1.27)
where, ηI is the income elasticity,
ΔQx is the change in the quantity bought of good X,
ΔI the change in income,
Qx the original quantity, and I the original income.
If X is a normal good for the consumer, then with a change in her income the quantity
demanded of X will change in the same direction, so $\Delta Q_x/\Delta I > 0$ and hence ηI will be
positive. If X is an inferior good, then ηI will be negative. A normal good can be further
classified as a necessity if ηI is less than one and as a luxury if ηI is greater than one.
Most of the broadly defined goods such as food, fuel, housing, education, clothing etc.
are normal goods, while narrowly defined inexpensive goods such as coarse rice, jawar,
bajra, vanaspati, synthetic clothes, 555 detergent powder etc. are usually considered
inferior goods. Among normal goods, food, fuel, clothing etc. are necessities while
higher education and housing are luxuries.
Reinforce your learning
It should be kept in mind that this classification of goods into normal and inferior, and
necessity and luxury is not strictly defined. In fact, the same good can be regarded as a
luxury by some individuals and as a necessity or even an inferior good by other
individuals. Even the same individual might consider a good as a luxury at a lower level
of income, as a necessity at intermediate level of income and as an inferior good at high
level of income.
Example: From the income-quantity relationship given in Table 1.10, find the income
elasticity of demand between the various successive levels of income and determine over
what range of the consumer’s income the good is a luxury, a necessity, or an inferior
good for the consumer.
Table 1.10
Point                  A       B       C       D       E        F
Income (Rs./month)     2,000   4,000   6,000   8,000   10,000   12,000
Quantity (Kg/month)    100     300     500     650     700      600
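Ans. Using ηI = (ΔQx/ΔI)*(I/Qx), with I and Qx taken at the lower income level of each pair:
(i) Income elasticity between A and B
= (200/2,000)*(2,000/100) = 2. As ηI > 1, between A and B the good is a luxury.
(ii) ηI between B and C
= (200/2,000)*(4,000/300) ≈ 1.33. As ηI > 1, between B and C the good is a luxury.
(iii) ηI between C and D
= (150/2,000)*(6,000/500) = 0.9. As ηI < 1, the good now is a necessity for the consumer.
(iv) ηI between D and E
= (50/2,000)*(8,000/650) ≈ 0.31. So the good is a necessity.
(v) ηI between E and F
= (–100/2,000)*(10,000/700) ≈ –0.71. Since ηI is negative, the good has now become an inferior good for the consumer.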
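The arithmetic of this example can be checked with a short script. This is only a sketch: the function names are illustrative, and it assumes the elasticity is evaluated at the original (lower) income and quantity of each pair, which is the convention that gives ηI = 2 between A and B.

```python
# Income (Rs./month) and quantity (Kg/month) from Table 1.10
points = ["A", "B", "C", "D", "E", "F"]
income = [2000, 4000, 6000, 8000, 10000, 12000]
quantity = [100, 300, 500, 650, 700, 600]


def income_elasticity(q0, q1, i0, i1):
    """(change in Q / change in I) * (I / Q), evaluated at the original I and Q."""
    return ((q1 - q0) / (i1 - i0)) * (i0 / q0)


def classify(eta):
    """Label the good using the sign and size of the income elasticity."""
    if eta < 0:
        return "inferior"
    if eta > 1:
        return "luxury"
    if eta < 1:
        return "necessity"
    return "unit income elasticity"


results = []
for k in range(len(points) - 1):
    eta = income_elasticity(quantity[k], quantity[k + 1], income[k], income[k + 1])
    results.append((points[k] + "-" + points[k + 1], round(eta, 2), classify(eta)))

for row in results:
    print(row)
```

The good moves from luxury to necessity to inferior as income rises, which is the pattern described in the answer.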
It can be shown that if a consumer’s income elasticity of demand for a particular good is
greater than one, then with a rise in the consumer’s income, the proportion of income
spent on the good will increase. The opposite will take place if the ηI is less than one. If
ηI = 1, then with a rise in income, the proportion of income spent on the good will
remain the same.
Let us suppose that at income I, the individual consumes Qx units of the good at a price
of Px/unit and with a rise in income by ΔI, ceteris paribus, she consumes ΔQx more units
of X.
Before the change in income the proportion of income spent on X is
Pr1 = PxQx/I. (1.28)
After the change in income the proportion of income spent on X is
Pr2 = Px(Qx + ΔQx)/(I + ΔI). (1.29)
Dividing (1.29) by (1.28) and using ΔQx/Qx = ηI*(ΔI/I), we get
Pr2/Pr1 = [1 + (ΔQx/Qx)] / [1 + (ΔI/I)] = [1 + ηI*(ΔI/I)] / [1 + (ΔI/I)]. (1.30)
From (1.30) it is clear that if ηI > 1, then Pr2/Pr1 > 1, i.e., the proportion of income spent
on good X will increase with a rise in income. Similarly, if ηI < 1, the proportion of
income spent on X will decrease and if ηI = 1, then Pr2 = Pr1, implying that proportion of
income spent on X will remain the same with a rise in income.
Learn more about Income Elasticity of Demand
A well known result involving the income elasticity of demand is that the weighted sum
of all income elasticities is equal to unity. It can be proved as follows.
Let the consumer spend all her income, I, on n goods whose prices are given as P1, P2,
..., Pn. Let Q1, Q2,...,Qn be the quantity consumed of the n goods respectively. Then the
budget constraint can be written as:
I = P1Q1+P2Q2+...+PnQn (i)
Let there be a change in the income of the consumer, prices of all the goods remaining
constant. The effect can be shown by partially differentiating (i) with respect to I.
Thus, (δI/δI) = P1 (δQ1/δI) + P2 (δQ2/δI) + ...+Pn (δQn/δI).
Since δI/δI = 1, multiplying and dividing the jth term on the right-hand side by (I/Qj), j = 1,2,...,n, we get,
1=P1(δQ1/δI)*(I/Q1)*(Q1/I)+P2(δQ2/δI)*(I/Q2)*(Q2/I)+...+Pn (δQn/δI)*(I/Qn)*(Qn/I).
Rearranging the terms, we get,
1=[(δQ1/δI)*(I/Q1)]*(P1Q1/I)+[(δQ2/δI)*(I/Q2)]*(P2Q2/I)+...+[(δQn/δI)*(I/Qn)]*(PnQn/I)
Or, 1 = ηI1(E1/I) + ηI2(E2/I) +...+ ηIn(En/I),
where, ηIj is the income elasticity of the jth good and Ej = PjQj is the total expenditure on the jth
good, j = 1,2,...,n.
or, 1 = ηI1 λ1 + ηI2 λ2 + ...+ ηIn λn,
where, λj is the proportion of income spent on the jth good. It is clear that λ1+λ2+...+λn = 1.
Hence, it is proved that the weighted sum of income elasticities is equal to one, where
the proportions of income spent on the respective goods serve as the weights.
An interesting implication of this result is that in a world of n commodities that a
consumer consumes, there has to be at least one normal good. Since the weights are
non-negative and the weighted sum is positive, at least one income elasticity must be
positive. In other words, all goods cannot be inferior goods.
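The result can also be checked numerically. The sketch below is only an illustration: the two demand functions, the prices and the income level are hypothetical, chosen so that spending on the two goods always adds up to income, and the derivative in the elasticity is approximated by a central difference.

```python
# Hypothetical demand functions chosen so that spending always exhausts income:
#   P1*Q1 = 0.5*I + 100   and   P2*Q2 = 0.5*I - 100
P1, P2 = 10.0, 20.0  # assumed prices


def q1(I):
    return (0.5 * I + 100.0) / P1


def q2(I):
    return (0.5 * I - 100.0) / P2


def income_elasticity(q, I, h=1e-4):
    """(dQ/dI) * (I/Q), with dQ/dI approximated by a central difference."""
    dq_dI = (q(I + h) - q(I - h)) / (2 * h)
    return dq_dI * I / q(I)


I = 1000.0  # assumed income
eta1, eta2 = income_elasticity(q1, I), income_elasticity(q2, I)
lam1, lam2 = P1 * q1(I) / I, P2 * q2(I) / I  # proportions of income spent
weighted_sum = eta1 * lam1 + eta2 * lam2

print(round(eta1, 3), round(eta2, 3), round(lam1, 3), round(lam2, 3))
print(round(weighted_sum, 6))
```

One elasticity lies below one and the other above one, yet their share-weighted sum comes out equal to one, as the proof requires.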
1.2.2 ELASTICITY OF SUPPLY
Elasticity of supply measures the responsiveness of the quantity supplied of a good with
respect to a change in its own price with everything else held constant. Algebraically,
elasticity of supply (ηs) is the proportionate (percentage) change in the quantity supplied
of a good divided by the proportionate (percentage) change in the price of the good.
Thus,
ηs = (ΔQx/Qx) / (ΔPx/Px) = (ΔQx/ΔPx)*(Px/Qx) (1.31)
where, ΔQx is the change in quantity supplied of good X,
ΔPx the change in the price of X, Px the original price,
and Qx the original quantity.
For infinitesimally small change in price and quantity, the formula for elasticity is given
by
ηs = (dQx/dPx)*(Px/Qx) (1.32)
where dQx/dPx is the inverse of the slope of the supply curve at the point where price is
given by Px and quantity supplied by Qx.
Thus, equation (1.32) gives us the elasticity at a point on the supply curve. Just like the
elasticity of demand, the arc elasticity of supply gives us the elasticity between two
points on the supply curve and can be given by slightly modifying the formula (1.31).
Thus,
Arc elasticity = (ΔQx/ΔPx)*[(Px1 + Px2)/(Qx1 + Qx2)]
where Px1 & Qx1 are original price and quantity and Px2 & Qx2 are new price and
quantity. If the supply curve of a good is upward sloping the coefficient of elasticity of
supply will have a positive sign. The supply curve is said to be elastic if ηs > 1, inelastic if
ηs < 1, and unitary elastic if ηs = 1. It should be noted that for a positively sloped
supply curve, an increase in the price will always lead to an increase in the total revenue
of the seller and vice-versa.
The elasticity of supply depends on the period of time allowed for adjustment. As
adjustment in supply is easier in the long run than in the short run, so supply of a good
will be more elastic in the long run than in the short run.
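The point and arc formulas can be written as two small functions. This is a sketch only: the arc version assumes the usual form in which the sums of the two prices and of the two quantities replace the original price and quantity, and the price and quantity figures are hypothetical.

```python
def point_elasticity(dq_dp, p, q):
    """Point elasticity of supply: (dQ/dP) * (P/Q)."""
    return dq_dp * p / q


def arc_elasticity(p1, q1, p2, q2):
    """Arc elasticity of supply between the points (p1, q1) and (p2, q2)."""
    return ((q2 - q1) / (p2 - p1)) * ((p1 + p2) / (q1 + q2))


# Hypothetical data: price rises from Rs. 10 to Rs. 12 and the quantity
# supplied rises from 100 to 130 units.
eta_arc = arc_elasticity(10, 100, 12, 130)
eta_point = point_elasticity(15, 10, 100)  # same slope, measured at the original point

print(round(eta_arc, 3), round(eta_point, 3))
```

Both values exceed one for these numbers, so supply is elastic over this range.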
1.2.2 (a) When the supply curve is a positively sloped straight line crossing the price
axis, then all along the line, ηs > 1.
In fig. 1.22, SSx is a linear supply curve. Point A on the supply curve corresponds to
price P1 and quantity Q1. Let P0 > 0 denote the price at which SSx meets the price axis.
The inverse of the slope of SSx is then Q1/(P1 – P0), so the elasticity of supply at point A is
ηs = (dQx/dPx)*(P1/Q1) = [Q1/(P1 – P0)]*(P1/Q1) = P1/(P1 – P0) > 1.
Figure 1.22: Supply Curve
Similarly, we can prove that at any other point on the supply curve ηs > 1.
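1.2.2 (b) When the supply curve is a positively sloped straight line passing through the
origin, then all along the line, ηs = 1.
In fig. 1.23 OSx is the supply curve. At any point A on the curve, with price P1 and
quantity Q1, the inverse of the slope of OSx is Q1/P1 because the line passes through the
origin. Elasticity of supply at A is therefore
ηs = (dQx/dPx)*(P1/Q1) = (Q1/P1)*(P1/Q1) = 1.
Figure 1.23: Supply Curve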
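1.2.2 (c) When the supply curve is a positively sloped straight line crossing the
quantity axis then ηs < 1.
In fig. 1.24, let point A on the supply curve SSx correspond to price P1 and quantity Q1,
and let Q0 > 0 denote the quantity at which SSx meets the quantity axis. The inverse of
the slope of SSx is (Q1 – Q0)/P1, so the elasticity of supply at point A is
ηs = [(Q1 – Q0)/P1]*(P1/Q1) = (Q1 – Q0)/Q1 < 1.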
1.2.2 (d) When the supply curve is curvilinear the elasticity of supply at any point on
the curve can be determined by drawing a tangent to the curve at that point and
proceeding in the manner as we had done for a linear supply curve. If the tangent
crosses the price axis, then ηs > 1, if it crosses the origin, then ηs = 1 and if it crosses
the quantity axis then ηs < 1. ηs at point A1 on the curve given in fig. 1.25 is greater
than one, at point A2, ηs is equal to one and at point A3 it is less than one.
Figure 1.25: Non Linear Supply Curve
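The three linear cases can be compared numerically. The sketch below assumes a supply curve of the form Q = c + d*P with d > 0 (the symbols c and d and all the numbers are illustrative): a negative c means the line meets the price axis at a positive price, c = 0 means it passes through the origin, and a positive c means it meets the quantity axis.

```python
def supply_elasticity_linear(c, d, p):
    """Elasticity at price p on the linear supply curve Q = c + d*P (d > 0)."""
    q = c + d * p
    if q <= 0:
        raise ValueError("quantity supplied must be positive at this price")
    return d * p / q


prices = [5, 10, 20]
d = 10  # hypothetical value of dQ/dP
cases = {
    "crosses price axis (c < 0)": -20,
    "passes through origin (c = 0)": 0,
    "crosses quantity axis (c > 0)": 20,
}

for label, c in cases.items():
    print(label, [round(supply_elasticity_linear(c, d, p), 2) for p in prices])
```

In the first case the elasticity stays above one at every price, in the second it equals one, and in the third it stays below one. The same test applies to the tangent drawn at a point on a curvilinear supply curve.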
Summary
1. Demand-supply analysis is an economic model of price determination in a
market.
2. The demand schedule, depicted graphically as the demand curve, represents
the amount of some good that buyers are willing and able to purchase at various
prices, assuming all determinants of demand other than the price of the good in
question, such as income, personal tastes, the price of substitute goods, and the
price of complementary goods, remain the same. Following the law of demand,
the demand curve is almost always represented as downward-sloping, meaning
that as price decreases, consumers will buy more of the good.
3. With the price of the good concerned remaining the same, a change in the price of
substitutes and/or complements or a change in the consumer’s income leads to a
shift in the demand curve.
4. The supply schedule, depicted graphically as the supply curve, represents
the amount of some good that producers are willing and able to produce and sell
at various prices, ceteris paribus, that is, assuming all determinants of supply
other than the price of the good in question, such as technology and the prices of
factors of production, remain the same.
5. When factors other than own price of the good, such as prices of the inputs
and/or technology change, the supply curve shifts.
6. Equilibrium is reached in a competitive market at that price which equates the
quantity demanded of a good to quantity supplied. It occurs at the intersection of
the demand and supply curves.
7. Elasticity of demand measures the responsiveness of the quantity demanded
of a good to a change in the factors which affect demand.
8. Own price elasticity of demand is given by the percentage change in the
quantity demanded of a good divided by the percentage change in its price. Arc
elasticity measures elasticity between two points on the demand curve and Point
elasticity measures elasticity at a specific point on the demand curve. Price
elasticity of demand for a good will be higher the larger the number of substitutes
available and the longer the time allowed for demand to adjust to a change in its
price.
9. Cross price elasticity is given by the percentage change in the demand of a
good divided by the percentage change in the price of some other good. In case
the two goods are substitutes cross price elasticity is greater than zero and if they
are complements it is less than zero.
10. Income elasticity is given by the percentage change in the quantity bought of a
good divided by the percentage change in the consumer’s income. For normal
goods the coefficient of income elasticity is positive and for inferior goods it is
negative.
11. Elasticity of supply is given by the percentage change in the quantity supplied
of a good divided by the percentage change in the price of the good.
Exercise
1. Use supply and demand curves to illustrate how each of the following events would
affect the price of butter and the quantity of butter bought and sold: (a) an increase in
the price of its substitute margarine; (b) an increase in the price of milk; (c) a decrease
in average income levels.
2. Suppose the demand curve for a good is given by Qx = 10 – 2Px, where Px is the
price of the good. Determine the own price elasticity of demand for good X at Px = Re. 1 and
Px = Rs. 2.
3. Explain the following statements:
a. The individual’s demand curve for a good represents a maximum boundary of the
individual’s intentions.
b. A producer’s positively sloped supply curve for a good represents in one sense a
maximum and in another sense a minimum boundary of the producer’s
intentions.
4. Distinguish between a “substitute” and a “complement” of a good.
What will happen to the demand for a good X when prices of both a substitute and a
complement of X rise simultaneously?
5. “Availability of substitutes of a good increases with the narrowness of its
definition.” Explain. In the light of your answer explain whether demand for ‘Surf excel’,
a brand of detergent powder, is more or less elastic than the demand for all detergent
powders in general.
6. What will be the shape of a unitary elastic demand curve and its corresponding
marginal revenue curve? Explain giving reasons.
7. “Arc elasticity gives a better estimate of point elasticity of a curvilinear demand
curve as the size of the arc becomes smaller and the curvature of the demand curve
over the arc becomes less.” Explain.
8. What is cross-elasticity of demand? How can we define an industry by using cross
elasticity?
9. Neena consumes two goods X and Y with a fixed income; if her cross elasticity of
demand for X with respect to price of Y is greater than zero, then we can infer that her
demand for Y is less elastic. True or false, and why?
10. If the inverse demand function is p = a – bq, where a and b are positive
constants, what is the price elasticity at q = 0, at q = (a/b) and at q = (a/2b)?
11. The price elasticity of demand for a given commodity is alleged to be greater:
i. The more numerous and closer the substitutes.
ii. If it is a luxury rather than a necessity.
iii. If it accounts for a large proportion of the consumer’s income.
iv. At higher prices rather than lower prices.
Explain the supporting argument in each case and analyse its validity.
12. How should a linear downward sloping demand curve shift if elasticity at each
price is to remain the same? Explain using a diagram.
13. A consumer spends all her income on two goods X and Y. If a 50% increase in the
price of good X does not change the amount consumed of Y, what is the price elasticity
of demand for good X?
14. Suppose a consumer spends her entire income (I) on purchase of ‘n’ goods whose
quantities are denoted as q1, q2, ..., qn and prices as p1, p2, ..., pn. Prove that if the
own-price elasticity of demand for a particular good exceeds one, then in some average
sense, the other goods are substitutes for the given good and if the own price elasticity
is less than one, then in that same sense, the other goods are complements of the given
good.
15. Using calculus prove that the total amount spent on a good varies directly with
the price when elasticity is less than one, and inversely with the price when
elasticity is greater than one.
16. Suppose that when Sachin’s income increases (prices of all goods unchanged), he
devotes the entire increment in income to increasing his purchase of food. Is Sachin’s
income elasticity of demand for food greater than, equal to, or less than one?
17. What is the price elasticity of demand supposed to measure? State the point
elasticity and arc elasticity formulas for measuring elasticity of demand. When should
each be used?
18. Compare the elasticity of two straight line intersecting demand curves at the point
of intersection.
19. Prove that of two parallel straight line demand curves, the one farther to the right
has a smaller price elasticity at each price.
20. Does a flatter demand curve necessarily signify a greater elasticity than a steeper
one?
21. Prove that the proportion of income spent on a good rises with a rise in income if
the income elasticity of demand for the good is greater than one.
22. Prove the following geometrically:
a. When the supply curve of a good is an upward sloping straight line passing
through the origin, then all along the supply curve, elasticity of supply is equal to
one.
b. When the supply curve is an upward sloping straight line crossing the price axis, then
elasticity of supply is greater than one at every point on the curve.
c. When the supply curve is a positively sloped straight line intersecting the
horizontal axis, then elasticity of supply is less than one at every point on the
supply curve.
Glossary
Complements - Two goods which are consumed together and for which the quantity
demanded of one is negatively related to the price of the other.
Cross elasticity of demand - The responsiveness of quantity demanded of a good to a
change in the price of another good.
Demand - The quantity of a good or service which an individual or group desires at the
ruling price.
Demand curve - A graphical presentation showing the relationship between the
quantity demanded of a good and its price.
Demand function - A functional relation between quantity demanded and all of the
variables that influence it.
Demand schedule - A numerical tabulation showing the quantities that are demanded
at various alternative prices.
Equilibrium - A position of rest. When applied to markets, equilibrium denotes a
situation in which, in the aggregate, buyers and sellers are satisfied with the current
combination of prices and quantity bought or sold, and so have no incentive to change
their current actions.
Income elasticity of demand - A measure of the relative responsiveness of the
demand of any good to a change in the level of income of the person demanding the
good.
Price elasticity of demand - The relative responsiveness of the quantity demanded of
a good to a change in its own price.
Price elasticity of supply - The responsiveness of the quantity supplied of a good to a
change in its own price.
Substitutes - Two goods are substitutes of each other if they satisfy essentially the
same want of the consumer such that the quantity demanded of one is positively related
to the price of the other.
Supply - The relation between the quantity of some commodity that producers are
willing to produce and sell per period of time and the price of that commodity, ceteris
paribus.
Supply curve - A graphical presentation of the relationship between the supply of a
commodity and its price.
Supply function - A mathematical relation between the quantity supplied and all the
variables that influence it.
Supply schedule - A numerical tabulation showing the quantity supplied at a number of
alternative prices.
The Theory Of Consumer Choice
Paper : Introductory Microeconomics
Unit III - The Households
Lesson: The Theory Of Consumer Choice
Lesson Developer: Jasmin
Jawaharlal Nehru University

Table of Contents
The Theory of Consumer Choice
Learning Outcomes
Introduction
The Budget Constraint
 Slope of the budget constraint
Consumer Preferences and Indifference Curves
 Indifference Curves: Properties
 Types of Indifference Curves and their shapes
Optimization
 Changes in Income and Consumer’s Choices
 Changes in Prices and Consumer Choice
 Income and Substitution effects
 Equivalent and Compensating Variation
Demand Curve: Derivation
Application of The Theory of Consumer Choice
 Slope of the demand curve: case of a Giffen good
 Wages and Labor Supply
 Interest Rates and Household Savings
Conclusion
Summary
Exercises
Glossary
References
Web-links

Learning Outcomes
This chapter aims to give the reader a deep insight into the Theory of Consumer Choice.
The lesson deals with questions like “How does a consumer decide what to buy?”, “What are
the trade-offs faced by him while making such decisions?”, “How do the decisions change
with change in factors like price, incomes, interest rates etc.?”. After reading the chapter,
the reader should be able to understand the concepts of affordability and budget constraint,
indifference curves and how they depict consumer preferences, the impact of changes in
income and price on the consumer’s choice, and income and substitution effects. The chapter
ends with the derivation of the demand curve and a few applications of the Theory of Consumer
Choice. The practice questions at the end of the lesson will help in developing a better
understanding of the concepts discussed in the lesson.
Introduction
The theory of demand has its foundations in the theory of consumer choice. Analysis of
consumer behavior is a prerequisite to deal with the theory of demand. The Theory of
Consumer Choice relies on the assumption that the consumer is rational and is equipped
with knowledge of his income, the commodities available and their prices, so that he can
decide what to buy. Trade-offs faced by the consumer while making a choice
assume an important role in the theory of consumer choice. How much to spend on different
commodities, given the income and the prices, how much time to devote to leisure and
work, and whether to consume more in the present or to save more for the future are a few
important questions that a consumer encounters in his day to day life. In due course,
we will see how the theory of consumer choice addresses these questions.
The Budget Constraint
A consumer would prefer having a greater quantity or better quality of the goods he
consumes. However, his income acts as a limit on the amount of money he can spend on
consumption of those goods. It is important to understand this constraint. To take a simple
example, let’s study the case of a consumer who consumes only two commodities: Burger
and Milkshake. Suppose that the consumer earns a monthly income of Rs.1000, the price of
a burger is Rs.20 and that of a glass of milkshake is Rs.10. Table 1 lists several
combinations of milkshake and burger that the consumer can choose from given his income
and prices of the two goods.
Table 1: Combinations of Burger and Milkshake that the consumer can afford to consume
Glasses of   Number of   Spending on       Spending on    Total Spending
Milkshake    Burgers     Milkshake (Rs.)   Burger (Rs.)   (Rs.)
0            50          0                 1000           1000
10           45          100               900            1000
20           40          200               800            1000
30           35          300               700            1000
40           30          400               600            1000
50           25          500               500            1000
60           20          600               400            1000
70           15          700               300            1000
80           10          800               200            1000
90           5           900               100            1000
100          0           1000              0              1000
Figure 1: Consumer's Budget Constraint
The first row in table 1 shows that if all the income is spent on burgers, the consumer will
be able to consume 50 burgers but no milkshake. The last row shows that if he spends the
entire income on milkshakes, he will be able to consume 100 glasses of milkshake but no
burgers. Figure 1 depicts the consumer’s budget constraint. The vertical axis plots glasses
of milkshake while the horizontal axis plots number of burgers. Point A corresponds to the
case where the consumer spends all his income on burgers, while at point B he consumes 100
glasses of milkshake but no burgers. At point C the consumer spends equal amounts of income
on burgers and milkshake (25 burgers and 50 glasses of milkshake). The downward sloping
line BCA shows the trade-off the consumer faces in consuming burger and milkshake, given
income and prices. Consuming more burgers leaves less money with the consumer to buy
milkshakes. Hence, as the consumption of one commodity rises, the consumption of the other
commodity has to fall, if the income and
prices of the commodities are kept fixed.
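The rows of Table 1 follow mechanically from the income and the two prices. A minimal sketch that regenerates them (the variable names are illustrative):

```python
income = 1000          # Rs. per month
price_burger = 20      # Rs. per burger
price_milkshake = 10   # Rs. per glass of milkshake

rows = []
for milkshakes in range(0, 101, 10):
    spend_milkshake = milkshakes * price_milkshake
    # Whatever is not spent on milkshake is spent on burgers.
    burgers = (income - spend_milkshake) // price_burger
    spend_burger = burgers * price_burger
    rows.append((milkshakes, burgers, spend_milkshake, spend_burger,
                 spend_milkshake + spend_burger))

for row in rows:
    print(row)
```

Every row spends exactly Rs.1000, so each combination lies on the budget constraint rather than inside it.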
Slope of the Budget Constraint
The slope of the budget constraint measures the rate at which the consumer can trade one
good for the other. The slope between any two points is calculated as the ratio of the change
in the vertical distance to the change in the horizontal distance. For instance, between
points C and A in figure 1, the vertical distance is 50 glasses of milkshake and the
horizontal distance is 25 burgers, so the slope is 2 glasses of milkshake per burger in
absolute value. The absolute slope of the budget constraint is the same as the ratio of the
prices of the two commodities. Since the price of a
burger is Rs.20 and the price of a glass of milkshake is Rs.10, the opportunity cost of a
burger is 2 glasses of milkshake. This slope of 2 is the trade-off that the
market offers the consumer: he can trade 2 glasses of milkshake for a burger
in the market. Since the budget constraint is downward sloping, the slope itself is a negative
number.
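The link between the slope and the price ratio can be read off the budget equation. In the sketch below the symbols are notation introduced here for illustration: B and M are the numbers of burgers and glasses of milkshake, P_B and P_M their prices, and I the income.

```latex
P_B B + P_M M = I
\;\Longrightarrow\;
M = \frac{I}{P_M} - \frac{P_B}{P_M}\, B ,
\qquad
\frac{dM}{dB} = -\frac{P_B}{P_M} = -\frac{20}{10} = -2 .
```

With milkshake on the vertical axis, the intercept I/P_M = 100 is point B and the slope is the negative of the price ratio.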
Consumer Preferences and Indifference Curves

Just like the budget constraint, consumer preferences are also an important part of the
theory of consumer choice. To continue with the example of burgers and milkshake, it is the
consumer’s preferences that help him choose from different combinations of
these two goods. To show consumer preferences graphically, we often use indifference
curves. An indifference curve is the locus of consumption bundles that give the
consumer an equal level of satisfaction. Figure 2 shows indifference curves for the consumer
who consumes burgers and milkshake. Points A, B and C on the indifference curve I1 show
various combinations of burgers and milkshake that make the consumer equally happy.
Moving from point A to B, the consumption of milkshake increases while that of burgers
falls. The same is the case when the consumer moves from B to C. The slope of the indifference
curve is termed the marginal rate of substitution, which equals the rate at which the
consumer is ready to substitute one good for the other. In this case the marginal rate of
substitution measures the number of glasses of milkshake that need to be given to
the consumer for a unit reduction in consumption of burgers. Indifference curve I2 shows a
greater level of satisfaction relative to the indifference curve I1.
An indifference map gives a complete ranking of the consumer’s preferences: a consumption
bundle on a higher indifference curve gives a greater level of satisfaction to the
consumer than a consumption bundle on a lower indifference curve. If the points
A and B lie on the same indifference curve, the consumer is indifferent between the
combinations A and B, and if A lies on an indifference curve higher than the indifference
curve on which B lies, the consumer prefers consumption bundle A to B.

Figure 2: Consumer's Preferences Represented by Indifference Curves
Indifference Curves: Properties
Discussed below are a few important properties of Indifference curves:
1.) Higher indifference curves carry a greater level of satisfaction compared to the lower
ones: The preference of the consumers for greater quantities gets exhibited in the
indifference curve approach also. Higher indifference curves depict bundles with
larger quantities of goods relative to the lower ones and the consumer prefers higher
indifference curves to the lower ones.
2.) Indifference curves slope downwards: In the case where a consumer likes both the
goods, when the quantity of one good is raised, the quantity of the other good has to
fall for the consumer to stay at the same level of satisfaction. This is what makes the
indifference curves slope downwards.
3.) Indifference curves do not intersect: This property can be best illustrated through a
graph. Suppose two indifference curves cross, as in figure 3. Points A and B lie on one
indifference curve, and points B and C lie on the other. This implies that the consumer
is equally satisfied at points A and B and the same applies to points B and C as well.
This would imply that the consumer is indifferent between points A and C, which is
not possible because point C has a greater amount of both the goods. We reach a
contradiction, so indifference curves cannot cross.

Figure 3: Indifference Curves Cannot Intersect
4.) Indifference curves are bowed inwards: As we know, the slope of an indifference
curve is equal to the marginal rate of substitution, which depends on the amounts of
the two goods that the consumer is consuming presently. People are willing to give up
more of the commodity which they possess in greater quantity and are less willing
to give up the one which is held in meagre amounts. If the consumer has a lot of
glasses of milkshake and a small number of burgers, he will be willing to give up more
glasses of milkshake for every additional burger.
However, as he continues to have more and more burgers, the number of
glasses of milkshake that he gives up for every burger will reduce. This explains
why indifference curves are bowed inwards. As illustrated by figure 4, at point A the
consumer has a lot of milkshake but few burgers, so he would have to be given
many glasses of milkshake to make him give up
one burger. At point B, on the other hand, the consumer has a lot of burgers but little
milkshake, so he will be willing to give up a burger for only a few glasses of
milkshake. The marginal rate of substitution at point A is 5 glasses of milkshake for a
burger while the marginal rate of substitution at point B is 1 glass of milkshake for a
burger.

Figure 4: Indifference Curves are Bowed Inwards
Types of Indifference Curves and their shapes
Different kinds of preferences can be shown by different types of indifference curves:
1.) Perfect Substitutes: Perfect substitutes are shown by straight line indifference
curves. The slope along these straight lines stays constant, which means that the rate
at which one good can be exchanged for the other is constant. For instance, a pack of
20 black paper clips can be perfectly substituted for a pack of 20 green paper clips by
a person who does not have any color preference.
Figure 5: Perfect Substitutes
2.) Perfect Complements: When the two goods are perfect complements, the
indifference curves representing such preferences are L-shaped or right angled. A
good example of perfect complements is a pair of shoes. A bundle of 5 left shoes and 7
right shoes yields only 5 pairs of shoes.

Figure 6: Perfect Complements
3.) Good with zero utility: In case the consumer gets no satisfaction out of one good, he
will consume only the other good, which gives him positive utility, and would not be willing
to sacrifice any amount of it for the good that offers no satisfaction. For
example, eggs cannot offer any satisfaction to a vegetarian.
Figure 7: Good with Zero Utility
4.) A Necessity: Certain commodities are absolute necessities, and there might
be a minimum quantity of such goods which is necessary for living. The indifference
curve in such a case becomes steeper as the consumption of the absolute necessity
falls towards the minimum quantity for sustenance.

Figure 8: A Necessity
5.) Good that offers negative utility beyond a particular level of consumption: Beyond a
particular point of consumption, if a consumer consumes or is forced to consume
more of a particular good, he would start getting negative utility out of further
consumption. In such a case the indifference curve becomes positively sloped
beyond that point of consumption. If the extra units can be disposed of without
incurring any costs, the indifference curves will become horizontal.
Figure 9: Good that Offers Negative Utility Beyond a Particular Level of Consumption
6.) A good that is not consumed: When a consumer in an equilibrium condition does not
consume any amount of one good, it is called a corner solution. In this case the
indifference curve cuts the axis of the good which is not consumed. The slope of the
indifference curve is flatter than the budget line.

Figure 10: A Good that is not Consumed
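The first two cases in the list above are often written as simple utility functions. The sketch below is an illustration only: the functional forms (a linear utility for perfect substitutes and a minimum function for perfect complements) and the function names are assumptions made here, not part of the lesson.

```python
def u_perfect_substitutes(x, y, a=1.0, b=1.0):
    """Linear utility: indifference curves are straight lines with slope -a/b."""
    return a * x + b * y


def u_perfect_complements(left, right):
    """Only complete pairs matter, so utility is the smaller of the two quantities."""
    return min(left, right)


pairs = u_perfect_complements(5, 7)  # 5 left shoes and 7 right shoes
same_level = u_perfect_substitutes(3, 1) == u_perfect_substitutes(1, 3)

print(pairs, same_level)
```

Five left shoes and seven right shoes give the same satisfaction as five of each, which is why the indifference curve is L-shaped, while swapping black for green paper clips one for one leaves satisfaction unchanged.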
Optimization
Optimization involves two important components: the consumer’s budget
constraint and the consumer’s preferences. The consumer’s optimum can be
explained graphically. As shown in figure 11, the optimum is reached where the budget
constraint is tangent to an indifference curve, i.e. at point C. At point B, the consumer is on
a lower indifference curve, although given the budget constraint he can afford to
move to a higher level of satisfaction. Point A is not affordable for the consumer. At the
optimum, the slope of the indifference curve is equal to the slope of the budget constraint,
i.e. the marginal rate of substitution is the same as the relative price. At this point the
market valuation of the goods is equal to the value that the consumer places on the two goods.
Figure 11: Consumer's Optimum
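The tangency condition can be illustrated with the burger and milkshake numbers used earlier in the lesson. The sketch below assumes a particular utility function, the square root of burgers times milkshakes, purely for illustration; the lesson does not specify the consumer's preferences. It searches along the budget constraint for the bundle with the highest utility and compares the marginal rate of substitution there with the price ratio.

```python
import math

income, price_burger, price_milkshake = 1000, 20, 10


def utility(burgers, milkshakes):
    """Assumed preferences (illustrative only): U = sqrt(burgers * milkshakes)."""
    return math.sqrt(burgers * milkshakes)


best = None
for burgers in range(0, income // price_burger + 1):
    # Spend whatever is left after buying burgers on milkshake.
    milkshakes = (income - price_burger * burgers) / price_milkshake
    u = utility(burgers, milkshakes)
    if best is None or u > best[2]:
        best = (burgers, milkshakes, u)

burgers_opt, milkshakes_opt, _ = best
# For this utility function the MRS (milkshakes per burger) is milkshakes / burgers.
mrs = milkshakes_opt / burgers_opt
price_ratio = price_burger / price_milkshake

print(burgers_opt, milkshakes_opt, mrs, price_ratio)
```

Under this assumed utility function the best affordable bundle is 25 burgers and 50 glasses of milkshake, where the marginal rate of substitution equals the price ratio of 2.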
Changes in income and consumer’s choices
A change in income has important effects on the consumer’s choice. When the income of
the consumer changes, there is no effect on the prices of the two goods, so the slope of
the budget constraint doesn’t change. However, due to the change in income, the budget
constraint will shift outward or inward in parallel, depending on whether there is a rise or a
fall in the income. If income rises, the consumer can afford to reach a higher
indifference curve with a better consumption bundle on the new budget constraint. Depending
on his preferences, the consumer will consume at the point on the new budget constraint where
an indifference curve is tangent to it. If the consumption of a good rises with a rise in
income, it is called a normal good. However, if the consumer decreases his consumption of
a good as the income rises, the good is said to be an inferior good. We can illustrate this
graphically. Graph A in figure 12 shows that as the income rises the consumer raises his
consumption of both milkshakes and burgers, so both these goods are normal. However, in
graph B, the consumption of burgers rises while that of milkshakes falls, depicting that
milkshake is an inferior good.
Figure 12: Changes in Income and Consumer's Choices
Changes in price and consumer choices
Now we consider the impact of a change in price. Suppose the price of a burger falls from
Rs.20 to Rs.10. The price of a glass of milkshake and the income of the consumer stay the
same. So the absolute slope of the budget constraint goes down from 2 milkshakes for a burger to 1
milkshake for a burger, which means that the budget constraint pivots and becomes relatively
flatter. If the consumer spends all his money on burgers he will now be able to consume 100
burgers. The new budget constraint is AD now. The new point of consumption again
depends on the consumer’s preferences. As figure 13 shows, at the new optimum, the
consumer is having more burgers and less milkshake.

Figure 13: Changes in Price and Consumer Choices
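The pivot of the budget constraint can be seen by comparing its intercepts and slope before and after the price change. A minimal sketch (the function name is illustrative):

```python
def budget_line(income, price_burger, price_milkshake):
    """Return (maximum burgers, maximum milkshakes, slope in milkshakes per burger)."""
    return (income / price_burger,
            income / price_milkshake,
            -price_burger / price_milkshake)


before = budget_line(1000, 20, 10)  # burger costs Rs.20
after = budget_line(1000, 10, 10)   # burger costs Rs.10

print(before)
print(after)
```

The milkshake intercept stays at 100 glasses while the burger intercept doubles from 50 to 100, so the line rotates outward around the milkshake intercept and its slope changes from -2 to -1.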
Income and Substitution effects
The price effect can be separated into an income effect and a substitution effect. Suppose
that, after the price change, the consumer’s income is adjusted so that he is left with the
same level of satisfaction (the original indifference curve) as before the
price change, but faces the new relative prices. The consumer’s response in
terms of quantity demanded is then termed the substitution effect. If the money income
is now restored and the consumer moves on to a higher or a lower indifference curve, this
further response is called the income effect. In figure 14, graph A, we can see
that as the price of a burger falls the consumer moves from point A to point C. This change
can be broken down into two steps. First, the consumer moves from point A to point B, which
is on the same indifference curve as point A but reflects the new set of relative prices;
this part is termed the substitution effect. Second, the consumer shifts to the new
indifference curve at point C, where he
still faces the new set of relative prices (as at point B); this part is called the income effect.
The substitution effect therefore is shown by rotating the budget constraint around the
original fixed indifference curve while the income effect is shown by a parallel shift in the
budget constraint. Movement from point A to point B is only about a change in the relative
prices, there is no change in the level of satisfaction. On the other hand, the movement
from point B to point C involves a change in the level of satisfaction and no change in the
relative prices. The substitution effect always works in the same direction, which means that
if the relative price of a commodity falls more of that commodity is consumed. However,
income effect can work in any direction: more or less of a good can be consumed when it’s
relative price falls depending on whether the good is normal or inferior. In case the good is
normal, any increase in the real income due to a fall in price will lead to an increase in the
consumption of that good. The income effect and substitution effect both work in the same
direction. In such a case the demand curve is negatively sloping. If on the other hand the
good is of inferior nature: less of a good is consumed when the real income rises due to the
price rise. The substitution effect works in the same direction suggesting that the quantity
consumed of the commodity should rise when its relative price falls. However, the end
result completely depends on the intensity of these two effects. If the substitution effect
outweighs the negative income effect such that the quantity demanded increases when the

relative price of the good falls, it can be defined as a case of inferior good even though the
demand curve has a negative slope. However if the negative income effect outweighs the
substitution effect, one reaches a positively sloped demand curve. This is the case of a
giffen good. The giffen goods are inferior goods and the negative income effect in their case
is strong enough to out power the substitution effect.
Figure 14: Income and Substitution Effects
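The two-step decomposition can be illustrated numerically. The sketch below is not the
lesson’s own example: it assumes hypothetical preferences U(x, y) = sqrt(x * y), for which
the consumer spends half of his income on each good, and it reuses the burger prices with an
inferred income of Rs.1000 and milkshake price of Rs.10. The points A, B and C play the
same roles as in figure 14, but their values come from the assumed utility function.

```python
import math

# Hypothetical preferences: U(x, y) = sqrt(x * y). Income and the milkshake
# price are inferred values; only the burger prices come from the lesson.
INCOME, P_Y = 1000.0, 10.0
P_X_OLD, P_X_NEW = 20.0, 10.0


def demand_x(p_x, income):
    """Demand for burgers: half of income is spent on each good."""
    return 0.5 * income / p_x


def utility_at_optimum(p_x, p_y, income):
    """Utility reached at the optimum for given prices and income."""
    return income / (2.0 * math.sqrt(p_x * p_y))


def income_for_utility(p_x, p_y, utility):
    """Smallest income that reaches `utility` at the given prices."""
    return 2.0 * utility * math.sqrt(p_x * p_y)


x_a = demand_x(P_X_OLD, INCOME)                          # point A
u_old = utility_at_optimum(P_X_OLD, P_Y, INCOME)
compensated_income = income_for_utility(P_X_NEW, P_Y, u_old)
x_b = demand_x(P_X_NEW, compensated_income)              # point B
x_c = demand_x(P_X_NEW, INCOME)                          # point C

substitution_effect = x_b - x_a   # new prices, original satisfaction
income_effect = x_c - x_b         # money income restored
print(f"A = {x_a:.2f}, B = {x_b:.2f}, C = {x_c:.2f}")
print(f"substitution effect = {substitution_effect:.2f}")
print(f"income effect       = {income_effect:.2f}")
```

With these assumed preferences burgers are a normal good, so both effects are positive and
add up to the total change in burgers demanded.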
Equivalent and Compensating Variation
Two important concepts associated with the income effect of a price change are
equivalent variation and compensating variation.
Equivalent Variation: it is the amount of money income that, given to the consumer in place
of the price change, makes him as satisfied as he would be after the price change. This
can be shown graphically by shifting the original budget constraint in a parallel manner
until it touches the new indifference curve reached after the price change.
Compensating Variation: it is the amount of income that needs to be taken away from a
consumer when the price of a good falls to return him to the level of satisfaction he had
before the price change, i.e. the original indifference curve. In the graph this
magnitude is shown by the vertical distance OL.
Figure 15: Equivalent and Compensating Variation
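Both magnitudes can be computed once a utility function is specified. The sketch below
again assumes the hypothetical preferences U(x, y) = sqrt(x * y), an inferred income of
Rs.1000 and an inferred milkshake price of Rs.10; none of the resulting numbers appear in
the lesson, and they do not correspond to the distance OL in the figure.

```python
import math

# Hypothetical preferences U(x, y) = sqrt(x * y); inferred income and P_Y.
INCOME, P_Y = 1000.0, 10.0
P_X_OLD, P_X_NEW = 20.0, 10.0


def utility_at_optimum(p_x, p_y, income):
    """Highest utility attainable with `income` at the given prices."""
    return income / (2.0 * math.sqrt(p_x * p_y))


def income_for_utility(p_x, p_y, utility):
    """Smallest income that reaches `utility` at the given prices."""
    return 2.0 * utility * math.sqrt(p_x * p_y)


u_old = utility_at_optimum(P_X_OLD, P_Y, INCOME)
u_new = utility_at_optimum(P_X_NEW, P_Y, INCOME)

# Equivalent variation: extra income at the OLD prices that yields u_new.
equivalent_variation = income_for_utility(P_X_OLD, P_Y, u_new) - INCOME
# Compensating variation: income removed at the NEW prices to return to u_old.
compensating_variation = INCOME - income_for_utility(P_X_NEW, P_Y, u_old)

print(f"equivalent variation   = Rs.{equivalent_variation:.2f}")
print(f"compensating variation = Rs.{compensating_variation:.2f}")
```

The two measures differ because one values the change at the old prices and the other at
the new prices.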
Demand Curve: Derivation

A demand curve can be seen as a locus of the optimum consumption points that a
consumer chooses at different prices. Suppose that when the price of a burger falls from
Rs.20 to Rs.10 the consumer shifts from point A, where he consumes 10 burgers and 80
glasses of milkshake, to point B, where he consumes 50 burgers and 50 glasses of milkshake.
This gives two points on the demand curve for burgers. Figure 16 shows the consumer’s
optimum and the demand curve for burgers.

Figure 16: Derivation of Demand Curve
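The two optima can be turned into demand-curve points and checked against the budget
constraint. In this sketch the income of Rs.1000 and the milkshake price of Rs.10 are
inferred from the earlier budget-line figures rather than quoted in this section.

```python
# Optimum bundles quoted in the text, one for each burger price.
# INCOME and P_MILKSHAKE are inferred values (assumptions of this sketch).
INCOME = 1000
P_MILKSHAKE = 10

optima = [
    # (price of a burger, burgers, milkshakes)
    (20, 10, 80),  # point A
    (10, 50, 50),  # point B
]

demand_points = []
for p_burger, burgers, milkshakes in optima:
    spending = p_burger * burgers + P_MILKSHAKE * milkshakes
    # Each optimum should lie on the budget line, i.e. exhaust the income.
    assert spending == INCOME
    demand_points.append((p_burger, burgers))

print(demand_points)  # (price, quantity) pairs on the demand curve for burgers
```

Both bundles exhaust the inferred income, and the lower price goes with the larger
quantity, giving a downward sloping demand curve through the two points.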
Applications of the theory of consumer choice

The theory of consumer choice has many important applications and we will discuss a few in
this section of the lesson.
Slope of the demand curve: Case of Giffen Goods
The law of demand says that as the price of a good rises the quantity demanded of it falls;
this is shown by a regular downward sloping demand curve. However, there are cases
where the law of demand is violated. In the case of Giffen goods, the demand curve slopes
upward. Let’s take the example of a consumer in China who consumes rice and
chicken. As the graph shows, the consumer was initially consuming at point C. Now the
price of rice rises, and the relevant budget constraint becomes DA. The consumer consumes
at the new point E, where he has more rice and less chicken. This happens because rice is
a Giffen good for this consumer. As the price of rice rises, the consumer becomes poorer in
real terms. Because rice is an inferior good for him, the income effect leads him to buy
more rice and less chicken; the substitution effect, on the other hand, directs him to buy
less rice and more chicken. Since the income effect outweighs the substitution effect, the
consumer ends up buying more rice and less chicken.

Figure 17: Case of a Giffen Good
Giffen Goods: Rice and Wheat in China
The notion of Giffen goods was first introduced by Alfred Marshall in his book
Principles of Economics in the year 1895. The idea is attributed
to Robert Giffen, who pointed out how a rise in the price of bread drains the
income of poor families. The marginal utility of money rises for these families in
such a manner that, rather than buying more of other foods, they end up consuming
more bread, which is still the cheapest compared to the other foods.
Though Giffen goods are rare in reality, Jensen and Miller found evidence of
Giffen behavior. In their study they present data from a field experiment in which
they gauged the response of poor households in China to changes
in the prices of staple food items. Evidence of Giffen behavior was found in
the case of rice and wheat when their prices were subsidized.
Source: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2964162/

Wages and Labor Supply
The theory of consumer choice can also be used to analyse labor supply decisions, i.e.
how much time to allocate to work and to leisure. Let’s take the
example of Subhash, who works at an ice cream parlor. Subhash is awake for 120 hours in
a week. He can spend this time in leisure or he can work and earn a wage of Rs.100 per
hour. For every hour of work Subhash gets consumption worth Rs.100, so one
hour of leisure means he loses out on this consumption: the opportunity cost of one
hour of leisure is Rs.100 worth of consumption. If he works for 120 hours in a week, he
earns Rs.12000 but enjoys no leisure; if he doesn’t work at all, he earns nothing and
doesn’t consume anything but gets 120 hours of leisure. As shown in the graph, Subhash
can make an optimal choice consisting of work and leisure hours.
Figure 18: Subhash's Work-Leisure Decision
Now suppose Subhash’s wage rises from Rs.100 per hour to Rs.150 per hour. With the rise
in the wage rate the budget constraint rotates from BA to BC and becomes relatively
steeper: for every hour of leisure foregone, Subhash enjoys higher consumption. The
optimal choice depends on Subhash’s preferences. With a rise in the wage rate consumption
will rise, but what happens to leisure depends on Subhash’s response: he can respond to
the higher wage by enjoying either more leisure or less of it.
When the wage rate rises, the substitution effect says that since leisure is now relatively
expensive compared to consumption, Subhash should work more and hence consume
more. The income effect, on the other hand, says that as the wage rate rises Subhash
becomes better off, since he gets a higher wage for all the hours that he works. Assuming
that both leisure and consumption are normal goods, the income effect encourages Subhash
to work less and enjoy more leisure. If the substitution effect is stronger than the income
effect, the labor supply curve slopes upward, as shown in part a of figure 19; if the
income effect is stronger than the substitution effect, the labor supply curve slopes
backward, as depicted in part b of the figure. In both cases consumption rises. Hence the
labor supply curve can slope upward or backward.

Figure 19: Income and Substitution Effects : Labor Supply Curve
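The work-leisure budget constraint at the two wage rates can be traced with a short sketch.
It uses only the figures in the text (120 waking hours, wages of Rs.100 and Rs.150) and
assumes, as the example does, that all earnings are consumed; the 60-hour bundle is an
arbitrary illustration, not Subhash’s optimum.

```python
HOURS_AWAKE = 120  # hours per week available for work or leisure


def consumption(leisure_hours, wage):
    """Weekly consumption (Rs.) when all earnings are spent."""
    hours_worked = HOURS_AWAKE - leisure_hours
    return wage * hours_worked


for wage in (100, 150):
    # The two end points of the budget constraint and one interior bundle.
    no_leisure = consumption(0, wage)
    half_leisure = consumption(60, wage)
    all_leisure = consumption(HOURS_AWAKE, wage)
    print(f"wage Rs.{wage}: {no_leisure}, {half_leisure}, {all_leisure}")
```

The all-leisure end point stays at zero consumption while the no-leisure end point moves up,
which is the rotation from BA to BC. Where Subhash settles on the new line depends on the
balance of income and substitution effects.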
Interest rates and household savings
Savings of a household depend on the interest rate. A consumer’s lifetime can be divided
into two periods, the first is the young age where he works and earns and the second is the
old age when he retires. Suppose Samir, the consumer, earns Rs.100000 in his young age
which he can use for present consumption and saving. In the old age Samir will live on the
savings that he makes in the young age. If the interest rate is 10 percent, for every rupee
saved in the young age, Samir gets to enjoy consumption worth Rs.1.10 in old age. Samir
has to find an optimal combination of consumption in the young age and consumption in the
old age, which has been shown in the figure. If he consumes all of his income today, he will
be able to enjoy Rs.100000 worth of consumption but he will starve in his old age. On the
other hand, if he consumes nothing in the present he will be able to enjoy consumption
worth Rs.110000 in his old age.

Figure 20: Interest Rates and Household Saving
Now let’s see what happens when the interest rate rises to 30 percent. Again the
substitution effect and the income effect come into the picture. Consumption in old age
will certainly rise; consumption in young age, however, depends on the income and
substitution effects. When the interest rate rises to 30 percent, the budget constraint
rotates outward from BA to BC and becomes steeper: for every rupee saved in young
age, Samir gets Rs.1.30 worth of consumption in old age. The substitution effect says
that as the interest rate rises, consumption in young age becomes costly relative to
consumption in old age, so Samir should save more in young age. The
income effect says that the rise in the interest rate makes Samir better off compared to his
original position; if consumption in both periods is a normal good, Samir
would want to consume more in both periods, thereby saving less in young age.
Figure 21 shows both cases: part a, where consumption in young age falls because the
substitution effect overpowers the income effect, and part b, where consumption in
young age rises because the income effect is stronger than the substitution effect. Hence
the effect of a change in the interest rate on savings is ambiguous, which has implications
for tax policy.

Figure 21: Effect of Increase in the Interest Rate
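Samir’s two-period budget constraint can be sketched in the same way. The figures
(Rs.100000 of income when young, interest rates of 10 and 30 percent) are those in the text;
the function name and the bundle with Rs.60000 of young-age consumption are illustrative
assumptions, not his optimum.

```python
INCOME_YOUNG = 100_000  # Rs. earned when young; nothing is earned when old


def old_age_consumption(young_consumption, interest_percent):
    """Old-age consumption financed by savings plus interest."""
    savings = INCOME_YOUNG - young_consumption
    return savings * (100 + interest_percent) / 100


for rate in (10, 30):
    save_everything = old_age_consumption(0, rate)
    save_some = old_age_consumption(60_000, rate)
    save_nothing = old_age_consumption(INCOME_YOUNG, rate)
    print(f"{rate}%: {save_everything}, {save_some}, {save_nothing}")
```

The end point with no saving is unchanged while the old-age intercept rises from
Rs.110000 to Rs.130000, so the constraint rotates outward and becomes steeper.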
Conclusion
The Theory of Consumer Choice helps us understand the important factors that contribute
to decision making at the level of a consumer. Decision to consume different quantities of
different goods, allocation of time to work and leisure, inter-temporal choices, types of
preferences are explained with the help of consumer choice theory. Indifference curves,
budget constraint and tools of optimization together form the core of consumer theory. Not
that in reality every consumer goes about carrying out these optimization exercises, but
every consumer knows that his choice is limited by his budget. Given the constraint of
income the consumer has to reach the best possible combination of goods for which he has
a preference.
The Theory of Consumer Choice provides a brilliant framework for analyzing real world
choices, and it has several applications, a few of which we have already gone through.

Summary
In this lesson we have learnt that:
 • The concept of the budget constraint is very important for understanding
how the consumer makes optimal decisions.
 • The budget constraint acts as a limit on the amount the consumer can spend on
consumption of goods.
 • The second important aspect of consumer choice theory is consumer
preferences, which can be depicted graphically by indifference curves.
 • There are certain important properties of the indifference curves.
 • Different preference patterns can be represented by indifference curves of different
shapes, for instance perfect substitutes and perfect complements.
 • The budget constraint and the indifference curves together help in finding the
optimal choice.
 • Changes in income and price affect the consumer’s choice.
 • Income and substitution effects determine what the final optimal choice will be.
 • While the substitution effect always acts in one direction, the direction of the income
effect depends on whether the good is normal or inferior.
 • The optimal choices obtained through the optimization exercise can help derive the
demand curve.
 • There are certain important applications of the theory of consumer choice, for
instance the case of Giffen goods, work-leisure decisions and inter-temporal
consumption and saving decisions.
 • The theory of consumer choice is a vital concept and can be applied to understand a
few real life problems.

Exercise
Review Questions
Q.1 Draw the budget constraint for Ravi, who has a weekly income of Rs.2000. He
consumes only two goods: Sandwiches and Pepsi. The price of a sandwich is Rs.20 and that
of a glass of Pepsi is Rs.10. Determine the slope of the budget line.
Q.2 Can indifference curves intersect?
Q.3 How are the concepts of equivalent and compensating variation different?
Q.4 Draw the demand curve for a giffen good, taking hypothetical values.
Q.5 Explain the case of a backward bending labor supply curve.
Q.6 Draw the set of indifference curves for left-hand and right-hand gloves.
Q.7 Can an indifference curve be upward sloping? What are the cases when it can slope
upwards?
[Hint: Think of a good (bad) that gives negative utility like pollution or a good that starts
giving out negative utility beyond a point of consumption]
Q.8 A consumer consuming apples and oranges gets a salary raise. Illustrate how the
consumption choice changes with the rise in income when both apples and oranges are normal
goods.
Q.9 In extension to question number 8, what would happen if apples were an inferior good?
Q.10 Explain Income and Substitution effects.
Multiple Choice Questions

Q.1 Points lying on or below the budget line:
a. Are affordable, given the income and the prices of the goods.
b. Are unaffordable, given the income and the prices of the goods.
c. Give equal level of satisfaction.
d. Indicate the bundles of goods that exhaust the total given income.
Q.2 An Indifference curve shows:
a. Several combinations of goods that give the consumer an equal level of satisfaction
b. Various combinations of goods that the consumer can afford.
c. The level of income that the consumer can use to buy goods.
d. All of the above.

Q.3 If the prices of the two goods stay constant and the income increases:
a. The budget constraint becomes flatter.
b. The budget constraint becomes steeper.
c. The budget constraint shifts outward, in a parallel manner.
d. The budget constraint shifts inward, in a parallel manner.
Q.4 If the prices of goods X and Y rise by the same percentage:
a. The budget constraint becomes flatter.
b. The budget constraint becomes steeper.
c. The budget constraint shifts outward, in a parallel manner.
d. The budget constraint shifts inward, in a parallel manner.
Q.5 A normal good is the one, the demand for which rises:
a. When the income falls.
b. When the income rises.
c. They are not related to income.
d. None of the above.
Q.6 Perfect substitutes are:
a. The goods that can perfectly replace each other or can be used perfectly in place of
each other.
b. The goods that are used in conjunction with each other.
c. Both a. and b.
Q.7 At the point of optimum:
a. The slope of the budget constraint is equal to the slope of the indifference curve.
b. The slope of the budget constraint is greater than the slope of the indifference curve.
c. The slope of the budget constraint is less than the slope of the indifference curve.
Q.8 Suppose the price of burgers falls (Burger is a normal good). Resultantly, Sam’s real
income rises, which he uses to buy greater number of burgers each week. This effect is
called:
a. Substitution effect.
b. Income effect.
c. Price effect.
Q.9 Samir consumes pizza and pepsi (both the goods are assumed to be normal), the
income and the price of pepsi stay the same, while the price of pizza rises. The example of
substitution effect in this case will be:
a. Samir buys more of pepsi and less of pizza as the price of pizza rises.
b. Samir buys more of pizza when his income rises.
c. Samir buys more of pizza as its price has risen.

Q.10 Which one of these is not the property of indifference curves:
a. Indifference curves slope downwards.
b. Indifference curves intersect each other.
c. Indifference curves are bowed inwards.
d. Higher indifference curves carry a greater level of satisfaction compared to the lower
ones
Correct Answers/Options for the Multiple Choice Questions

Question Number Option
Q.1 a
Q.2 a
Q.3 c
Q.4 d
Q.5 b
Q.6 a
Q.7 a
Q.8 b
Q.9 a
Q.10 b
Justification for the Correct Answers for Multiple Choice Questions

Answer 1. All the points lying below or on the budget line represent the combinations of
goods that the consumer can afford to consume given the income and prices of the goods.
Answer 2. An indifference curve is the locus of the bundle of goods that give the same level
of satisfaction to the consumer.
Answer 3. A rise in income does not impact the slope of the budget constraint. A rise in
income, with prices fixed, lets the consumer consume more of both goods, so
the budget constraint shifts outward in a parallel manner.
Answer 4. When the prices of both the goods rise by the same percentage, the relative
prices of the two goods stay constant. With the rise in prices, less of both the goods can be
afforded given the same level of income, hence the budget constraint shows a parallel shift
inwards.
Answer 5. Normal goods are the goods, the demand for which rises as the income rises.
Answer 6. Perfect substitutes are the goods that can replace each other perfectly to satisfy
the needs of the consumer.

Answer 7. The point of optimum occurs where the indifference curve is tangential to the
budget constraint. This is the point where the marginal rate of substitution is equal to the
relative prices.
Answer 8. The fall in the price of burgers, other things constant, raises the real income of
the consumer. This rise in purchasing power can be used to buy
more burgers (burgers being normal goods).
Answer 9. When the price of pizza rises relative to that of pepsi, it becomes more expensive
to consume pizza relative to the consumption of pepsi, so the substitution effect will induce
the consumer to consume more of pepsi and less of pizza.
Answer 10. Indifference curves cannot intersect each other as it violates the principle of
transitivity.
Feedback for the Wrong Answers for Multiple Choice Questions

Answer 1. Option b is incorrect because the points lying on the budget line or below it can
be consumed given the income and the prices. It is the points above the budget line that
are unaffordable. Option c is incorrect since it is the points on the indifference curve that
show equal levels of satisfaction. Budget constraint simply shows whether a bundle is
affordable or not at the given income and prices. Option d is also incorrect since, only the
bundles lying on the budget line exhaust the total given income of the consumer, but the
points or bundles lying below the budget line require a fraction of the total income, not the
whole of it.
Answer 2. Option b is incorrect: it is the budget line that shows which bundles of
consumption are affordable. Option c is wrong because the indifference curve shows the
preferences of the consumer, whereas the budget constraint depicts the income of the
consumer. Hence option d (All of the above) is also incorrect.
Answer 3. Options a and b are incorrect because there is no change in the prices of the
goods: since the relative price of the two goods is the same, the slope will not change.
Hence the budget constraint cannot become flatter or steeper. Option d is also incorrect,
since a rise in income expands the consumption possibilities for both goods; the budget
constraint shifts inward in a parallel manner only when income decreases.
Answer 4. When the prices of both goods rise by the same percentage, there will be no
change in the slope of the budget constraint. The budget constraint cannot get flatter or
steeper, so options a and b are not possible. A rise in the prices of both goods means that,
given the income, the consumer can consume less of both commodities, hence
the budget constraint cannot shift outward in a parallel manner. So option c is also wrong.
Answer 5. A normal good is usually defined with respect to income. It is the good, the
demand for which rises when a consumer’s income rises. So option a is incorrect, since the
demand for the normal good will fall as the income falls. Option c is wrong, the demand for
normal goods is related to income. Option d is ruled out since answer is option b.

Answer 6. Option b is incorrect, perfect complements are the goods that are used in
conjunction with each other. Options c and d are also incorrect.
Answer 7. At the optimum the slope of the indifference curve and the slope of the budget
constraint are equal. If these slopes are not equal, the consumer can still
do better by moving towards the optimum. At the optimum the consumer attains the highest
satisfaction possible given his income, since the rate at which he is willing to trade one
good for the other is equal to the trade-off between the two goods set by the market, i.e.
the ratio of the prices of the two goods. So the options b, c and d are incorrect.
Answer 8. Option a is incorrect since the substitution effect makes him consume more
burgers because burgers have become relatively cheap after the fall in price, not because
his purchasing power has risen. Option c is incorrect since the price effect is the overall
effect of the price change, which includes both the income and substitution effects; its
result depends on both. Option d is also ruled out. The correct
answer is the income effect.
Answer 9. Option b is incorrect since a rise in income does not involve any substitution
effect. Option c is incorrect because pizza is a normal good: when the price of pizza rises,
both the substitution and income effects make the consumer consume less of it, not more.
Option d is ruled out since option a is correct.
Answer 10. The options a, c and d are incorrect because they are the properties of
indifference curves. Option b is correct because indifference curves do not intersect.

Glossary
Trade-off: To give up one thing for another, which might be of more or less equal value to
the decision maker.
Rational: A behavior based on logical reasoning, taking into account all the information
available, without any inconsistencies.
Constraint: A factor that serves as a limit or a restriction, thereby impacting economic
behavior.
Slope: It is a measure that gives out the rate at which one variable changes for a unit
change in the other.
Relative: When something is measured in comparison to something else.
Utility: Utility is the satisfaction one gets out of consuming a good or service.
Substitutes: Goods that can replace each other to satisfy the needs of the consumer.
Complements: Goods which are usually used in conjunction with each other to satisfy
various uses.
Necessity: A good, the consumption of which is necessary for survival. The proportion of
expenditure on such goods falls with a rise in income.
Optimization: Making a choice which is cost effective or which delivers the best result, given
the constraints.
Inter-temporal: This term refers to the decisions made in the present and the future.
Decisions regarding consumption and savings made in the present have an impact on the
alternatives available in the future.
Normal good: These are the goods, the demand for which rises as the income rises.
Inferior good: The goods for which the demand falls as the income rises.
Giffen good: A rare type of good for which the demand rises as the price rises.
Giffen goods violate the law of demand.

References
Mankiw, N.G. (2007), “Principles of Microeconomics”, Ch.21, Cengage Learning
Lipsey, Richard & Chrystal, Alec (2011), “Economics”, Ch.5, (PP.91, 97-99), Oxford
University Press
Web Link
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2964162/

Long- Run Costs and Output Decisions
Semester-I
Principle of Economics
Unit V- Cost and Production
Lesson: Long Run Costs and Output Decisions
Lesson Developer: Pankaj Khandelwal
College /Dept: Research Scholar

Table of Contents
Learning Outcomes
Short Run Conditions and Long Run Directions
 • Maximizing Profits
 • Minimizing Losses
The Short Run Industry Supply Curve
Long Run Directions: A Review
Long Run Costs: Economies and Diseconomies of Scale
Long Run Adjustments to Short Run Conditions
Output Markets: A Final Word
Conclusion
Summary
Exercises
References
Appendix

Learning Outcome
In this chapter we learn how a firm in an industry takes decisions about cost and output
as it moves from the short run to the long run. We will learn whether the
equilibrium condition in the long run is the same as in the short run for all firms, i.e.
those earning profits, those trying to reduce losses and those that decide to
shut down in the short run. If not, what is the condition of equilibrium in a competitive
market in the long run? Can new firms enter, or existing firms exit or expand, in the
long run? After reading this chapter one should be able to answer questions about
economies and diseconomies of scale and how they arise. What is the shape of the
industry supply curve in the long run? What do we understand by returns to scale?
Short–Run Conditions and Long-Run Directions

Firms operate in the economy to earn economic profits. Profit is defined as the
difference between total revenue (TR) and total cost (TC). A normal rate of return is a part
of total cost (TC). It is the rate that is just sufficient to keep current investors
satisfied with their investments, and it serves as the benchmark for economic profits. If a
firm earns more than this rate, it is making profits; if it earns less than this
rate, it is making losses; and if it earns exactly this rate, i.e.
zero economic profits, the firm is said to be breaking even. In a competitive industry, a
firm earning positive profits attracts new firms and new investors, while a firm
suffering losses attracts no new investors and its existing investors leave. In the case of
break-even firms, investors neither enter nor leave.
Maximizing Profits
Assuming the industry is competitive, profitable firms will try to
maximize their profits in the short run.
Profits = Total Revenue - Total Cost
where
Total Revenue = Price x Quantity
Total Cost = Total Fixed Cost + Total Variable Cost

Let us construct a hypothetical situation in which a firm purchases land for
production costing INR 1,000,000. Suppose investors expect to earn a minimum return of
10% p.a. That means the firm has to pay INR 100,000 as the normal rate of return, which is
a part of fixed cost. Other fixed costs equal INR 10,000, so total fixed cost is
INR 110,000. Variable cost, which includes wages and raw material, amounts to INR 160,000.
Since the firm is competitive, it can sell all it wants at the single market price of
INR 50. Assume the firm supplies 8000 units of the good.
So, its Total Revenue is INR 400,000.
Total Cost = 160,000 + 110,000 = INR 270,000.
So, Total Profit = TR - TC = INR 130,000.
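The arithmetic of this example can be reproduced step by step. The sketch below uses the
figures of the example, taking the variable cost as INR 160,000, the amount used in the
total cost calculation; the variable names are illustrative.

```python
LAND_COST = 1_000_000          # INR paid for the land
NORMAL_RETURN_PERCENT = 10     # minimum return expected by investors, p.a.
OTHER_FIXED_COST = 10_000      # INR
VARIABLE_COST = 160_000        # INR, wages and raw material
PRICE = 50                     # INR per unit
QUANTITY = 8_000               # units supplied

# The normal return on the investment is counted as part of fixed cost.
normal_return = LAND_COST * NORMAL_RETURN_PERCENT // 100
total_fixed_cost = normal_return + OTHER_FIXED_COST
total_cost = total_fixed_cost + VARIABLE_COST
total_revenue = PRICE * QUANTITY
profit = total_revenue - total_cost

print("TFC:", total_fixed_cost)
print("TR: ", total_revenue)
print("TC: ", total_cost)
print("Profit:", profit)
```

Because the normal return is already included in cost, the INR 130,000 is profit over and
above what investors require.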
In figure 1, panel a) shows the competitive industry and panel b) shows a representative
firm. The market clears at a price of INR 50, and we assume that a firm can sell any
quantity at that price but is constrained by its capacity in the short run. Assume that the
representative firm produces 3000 units of output. We know that in a competitive market, a
profit maximizing firm produces up to the point where price equals marginal cost. In the
short run, the MC curve slopes upward because the fixed factor constrains capacity. Total
revenue is equal to the area (TP0q). Total cost is equal to the area (AC0q). Profit is
simply the difference between TR and TC, which is equal to the area (ABPC). This firm is
earning positive profits.
Minimizing Losses

A firm suffers a loss if it is neither earning positive profits nor breaking even. Such
firms fall into the following two categories:
a) those that shut down their operations immediately and bear losses equal to
the fixed cost, as that minimizes their losses;
b) those that keep operating in the short run to minimize their losses.
Fixed cost (FC) has to be paid whether the firm stays in business or shuts down, which is
why firms cannot completely exit the market in the short run. Every firm has to check
whether continuing to operate is advantageous. As fixed cost must be paid either way, the
decision depends on variable cost and the revenue earned. So, in the short
run a firm will keep operating as long as the revenue earned exceeds variable cost.
Operating profit (or loss) is the difference between total revenue (TR) and total variable
cost (TVC).
If TR > TVC, the firm will keep operating, as the operating profit helps offset fixed cost
and thus reduces losses. If TR < TVC, the firm will be better off shutting down, since by
operating its total losses would exceed its fixed cost.
Suppose that the above mentioned firm now sells at INR 30 due to competitive forces.
Total revenue (TR) is now INR 240,000 (30 x 8000).
Variable cost remains at INR 160,000 and total fixed cost (TFC) at INR 110,000, so total
cost equals INR 270,000. Thus the firm bears a loss of INR 30,000. In the
short run, the firm has to decide whether to remain in business or shut down. If it
shuts down, it bears no variable cost but suffers a loss equal to its fixed cost of INR
110,000. If it remains in business, it makes an operating profit of
INR 80,000, which offsets part of its total fixed cost and reduces the loss from
INR 110,000 to INR 30,000. Thus the firm’s total loss will be only INR 30,000.
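The operate-or-shut-down comparison can be written as a small function. This is a sketch
under a simplifying assumption: output and total variable cost are held at the example’s
levels rather than re-optimized for each price, and the function name is illustrative. The
second call uses a hypothetical price of INR 15 that does not appear in the lesson.

```python
def short_run_outcomes(price, quantity, total_variable_cost, total_fixed_cost):
    """Return (profit if operating, profit if shut down, decision)."""
    total_revenue = price * quantity
    operating_profit = total_revenue - total_variable_cost
    profit_if_operating = operating_profit - total_fixed_cost
    profit_if_shut_down = -total_fixed_cost  # fixed cost is paid regardless
    # The firm is indifferent when TR equals TVC; this sketch then reports
    # "shut down".
    decision = "operate" if operating_profit > 0 else "shut down"
    return profit_if_operating, profit_if_shut_down, decision


at_30 = short_run_outcomes(30, 8_000, 160_000, 110_000)
at_15 = short_run_outcomes(15, 8_000, 160_000, 110_000)  # hypothetical price
print(at_30)
print(at_15)
```

At INR 30 the loss from operating (INR 30,000) is smaller than the loss from shutting down
(INR 110,000); at the hypothetical INR 15 revenue no longer covers variable cost, so
shutting down limits the loss to the fixed cost.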
Figure 2 shows a) the industry and b) a representative firm that suffers losses but earns
an operating profit in the short run. Assume that the market price determined by supply
and demand is INR 35. Because the firm operates in a competitive market, it produces where
price (MR) equals marginal cost (MC), at an output (q*) of 5,000 units. Total revenue is
the product of price and quantity, INR 175,000. Total cost is the product of average total
cost (ATC) and the quantity produced. ATC and average variable cost (AVC) at q* are INR 41
and INR 30 respectively. So total cost is INR 205,000 and total variable cost is INR
150,000. The firm is thus suffering a loss of INR 30,000 but earning an operating profit of
INR 25,000 (175,000 - 150,000).
We know that average fixed cost (AFC) is the difference between average total cost (ATC)
and average variable cost (AVC). So AFC is INR 11, and TFC = 11 x 5,000 = INR 55,000.
If the firm shuts down, it will bear a loss of INR 55,000. By operating, it reduces this
loss to INR 30,000, because it earns an operating profit of INR 25,000.
NOTE: As long as P > AVC, the firm is better off operating than shutting down in the short
run.
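The same rule can be checked with per-unit figures. The sketch below is illustrative only (the function name is an assumption); it uses the Figure 2 numbers to recover AFC, TFC, the operating profit and the total loss from price, ATC and AVC.

```python
def per_unit_analysis(price, quantity, atc, avc):
    """Derive totals from per-unit costs and apply the P > AVC rule."""
    afc = atc - avc  # average fixed cost is the gap between ATC and AVC
    return {
        "afc": afc,
        "total_fixed_cost": afc * quantity,
        "operating_profit": (price - avc) * quantity,  # TR - TVC
        "total_profit": (price - atc) * quantity,      # TR - TC, negative is a loss
        "operate": price > avc,
    }


# Figure 2 values: P = 35, q* = 5,000, ATC = 41, AVC = 30
fig2 = per_unit_analysis(35, 5000, 41, 30)
print(fig2)
```

Here the price of INR 35 lies between AVC (INR 30) and ATC (INR 41), which is exactly the case of a loss combined with an operating profit.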
Shutting Down to minimize losses

When a firm cannot earn enough revenue to cover even its variable cost, it is advantageous
to shut down in the short run. The shut-down point is the lowest point on the average
variable cost curve. When price falls below the minimum point on AVC, there will be
operating losses at every level of output the firm could choose, and the firm does better
to stop operating and bear a loss equal to its fixed cost. At all prices above the minimum
point on AVC, the MC curve shows the profit maximizing output level in the short run.
Figure 3 shows the short run supply curve of a perfectly competitive firm and the
shut-down point. The short run supply curve is the portion of the firm's MC curve that
lies above its AVC curve.
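To make the shape of this supply curve concrete, the sketch below assumes a hypothetical variable cost function, VC(q) = q^3 - 6q^2 + 15q, which is not taken from the lesson. For this function AVC(q) = q^2 - 6q + 15 has its minimum of 6 at q = 3, and MC(q) = 3q^2 - 12q + 15 passes through that point. The firm supplies nothing below the shut-down price and otherwise picks the output on the rising part of MC where P = MC.

```python
import math

# Hypothetical cost function (an assumption for illustration):
#   VC(q) = q**3 - 6*q**2 + 15*q
#   AVC(q) = q**2 - 6*q + 15, minimised at q = 3 where AVC = 6
MIN_AVC = 6.0


def marginal_cost(q):
    """MC(q) for the assumed variable cost function."""
    return 3 * q**2 - 12 * q + 15


def firm_supply(price):
    """Quantity supplied in the short run by a price-taking firm."""
    if price < MIN_AVC:
        return 0.0  # below the shut-down point the firm produces nothing
    # Solve price = MC(q) and keep the root on the upward-sloping part of MC
    return 2 + math.sqrt(12 * price - 36) / 6


for p in (4, 6, 10, 15):
    print(p, round(firm_supply(p), 3))
```

At a price exactly equal to minimum AVC the firm is indifferent between operating and shutting down; the sketch treats that case as operating.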

The Short Run Industry Supply Curve
The short run industry supply curve is the sum of the MC curves (above AVC) of all the
firms in an industry. That is, the industry supply curve is the horizontal summation of
the quantities supplied by the individual firms at each price level.
Figure 4 shows the industry supply curve in the short run as a horizontal sum of the MC
curves (above AVC) of all the firms in the industry.
The short run industry supply curve can shift for two reasons. First, a change in the
price of an input shifts the MC curves of all the individual firms simultaneously.
Second, a change in the number of firms in the industry in the long run shifts the industry
supply curve, because individual supply curves are added or removed. If the number of
firms increases, the industry supply curve shifts to the right, and if the number of firms
decreases, it shifts to the left.
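Horizontal summation can be sketched as adding up the quantity each firm supplies at a given price. The example below assumes, purely for illustration, that each firm has a linear MC curve above its shut-down price; the firm parameters are made up.

```python
def firm_supply(price, shutdown_price, mc_slope):
    """Supply of one firm, assuming MC(q) = shutdown_price + mc_slope * q."""
    if price < shutdown_price:
        return 0.0  # price below minimum AVC: the firm shuts down
    return (price - shutdown_price) / mc_slope


def industry_supply(price, firms):
    """Horizontal sum: total quantity supplied by all firms at this price."""
    return sum(firm_supply(price, s, m) for s, m in firms)


# Each firm is (shut-down price, slope of MC); values are hypothetical
firms = [(10, 2), (10, 2), (20, 4)]
print(industry_supply(15, firms))  # only the first two firms produce
print(industry_supply(30, firms))  # all three firms produce

# Entry of one more firm shifts industry supply to the right at every price
more_firms = firms + [(10, 2)]
print(industry_supply(30, more_firms))
```

Adding a firm raises the quantity supplied at every price above that firm's shut-down point, which is the rightward shift described above.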

Long Run Directions: A review

In the short run a firm will produce up to the point where P=MC. If firms in the industry
are earning profits, new firms will enter and old firms will expand in the long run. A firm
suffering losses will produce in the short run if and only if its revenue exceeds its
variable cost (P > AVC); otherwise it will shut down and bear a loss equal to FC.
In the short run, a firm's decision to operate depends on the shapes of its cost curves
and the market price of its product. In the short run, costs are determined by the present
scale of the fixed factor, whereas in the long run firms choose among many potential scales
of plant.
An individual firm's long run decision depends on what its costs are likely to be at
different scales of production. To arrive at long-run cost, firms must compare their costs
at different scales of plant. As the scale of production increases, there may be economies
of scale, which reduce production costs and help the firm expand, or costs may rise because
of the complexities that arise as the firm becomes large.

Long-Run Costs: Economies and Diseconomies of Scale

In the long run, there is no fixed factor of production, so diminishing returns to a fixed
factor no longer constrain output. Firms can choose any scale of production: they can
increase or decrease the scale of production, or even leave the business.
The shape of a firm's long run average cost (LRAC) curve depends on how costs vary with the
scale of production. Average cost may increase, decrease or remain constant as the scale
of production changes.
Returns to scale describe the relationship between inputs and output when all inputs change
together. When average cost (AC) falls as the scale of production increases, we say that
there are increasing returns to scale. When AC does not change with the scale of
production, we say that there are constant returns to scale. When AC rises as the scale of
production increases, we say that there are decreasing returns to scale. Let us discuss
all of them.
Increasing Return to Scale

When we say that a production function exhibits increasing returns to scale, we mean that
a given percentage increase in inputs leads to a larger percentage increase in output. For
instance, doubling inputs leads to more than double the quantity of output. In terms of
cost, increasing returns to scale (IRS) mean that average cost falls as the output level
increases. Economies of scale basically mean a fall in AC resulting from large scale
production of output.
Technological change is one of the most important sources of economies of scale. When a
firm adopts a capital intensive technique, say in producing electronic gadgets, the cost
per unit is lower when the gadgets are produced using machines than when they are produced
using labour only. Another source of economies of scale is sheer size. For example, bulk
orders of inputs reduce the cost of inputs for large companies.
Figure 5 shows a firm exhibiting economies of scale. It shows the short run and long run
AC curves for a firm at different levels of output. The long run average cost curve (LRAC)
shows the different scales on which a firm can choose to operate in the long run. At any
time, the scale of operation determines the short run cost curves. LRAC is also known as
the envelope of the short run average cost curves because it wraps around them. Each point
on the LRAC curve represents the least cost associated with that level of production.
In the short run, the firm chooses a particular scale of production, which fixes it on one
cost curve, whereas in the long run the scale of production can vary and the cost curves
vary with it.
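The envelope idea can be illustrated numerically: at each output level, the long run average cost is the lowest short run average cost available across plant sizes. The plant parameters and the functional form SRAC(q) = F/q + c*q below are hypothetical assumptions chosen so that larger plants have a lower minimum average cost.

```python
# Hypothetical plants: name -> (fixed cost F, slope c) with SRAC(q) = F/q + c*q
PLANTS = {
    "small": (100.0, 1.0),
    "medium": (400.0, 0.16),
    "large": (900.0, 0.04),
}


def srac(q, fixed_cost, slope):
    """Short run average cost of one plant at output q (q > 0)."""
    return fixed_cost / q + slope * q


def lrac(q):
    """Least-cost plant and its average cost at output q (the envelope)."""
    best = min(PLANTS, key=lambda name: srac(q, *PLANTS[name]))
    return best, srac(q, *PLANTS[best])


for q in (10, 50, 150):
    plant, cost = lrac(q)
    print(q, plant, round(cost, 2))
```

In this sketch the cheapest plant changes from small to medium to large as output grows, and the minimum average cost falls along the way, which corresponds to economies of scale.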
Constant Returns to Scale (CRS)

Constant returns mean that the relationship between input and output stays constant. If we
double the input level, the output level also doubles. In terms of cost, AC does not change
as the level of output rises. In other words, CRS mean that the firm's LRAC curve is flat.
The firm in figure 5 exhibits CRS between scale 2 and scale 3, where the AC of production
remains constant.
There is an argument that most industries exhibit CRS after some level of output. Firms
usually opt for cost saving technology and produce at the optimal scale. After achieving
the optimal scale, a firm that wants to grow will try to do so by building another plant.
That is, firms facing CRS can grow beyond the optimal scale only if they are able to
replicate their existing plants.
Decreasing Returns to Scale (DRS)

Decreasing returns to scale mean that doubling inputs leads to less than double the output.
In terms of cost, AC increases as the scale of production increases. For instance, when a
corporation becomes very large, managerial inefficiencies crop up and the firm becomes
difficult to control. As firm size increases, the firm may also face problems with labour
organization.
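The three cases can be told apart by scaling all inputs by the same factor and comparing the change in output with that factor. The sketch below uses a Cobb-Douglas production function, q = K^a * L^b, as an assumed example; this functional form and the exponents are not from the lesson.

```python
import math


def returns_to_scale(production, capital, labour, factor=2.0):
    """Classify returns to scale by scaling both inputs by `factor`."""
    base = production(capital, labour)
    scaled = production(factor * capital, factor * labour)
    ratio = scaled / base
    if math.isclose(ratio, factor, rel_tol=1e-9):
        return "constant"
    return "increasing" if ratio > factor else "decreasing"


def cobb_douglas(a, b):
    """Hypothetical production function q = K**a * L**b."""
    return lambda k, l: k**a * l**b


print(returns_to_scale(cobb_douglas(0.6, 0.6), 4, 9))  # exponents sum above 1
print(returns_to_scale(cobb_douglas(0.5, 0.5), 4, 9))  # exponents sum to 1
print(returns_to_scale(cobb_douglas(0.3, 0.4), 4, 9))  # exponents sum below 1
```

For this family of functions, doubling both inputs multiplies output by 2^(a+b), so the classification depends only on whether a + b is above, equal to, or below one.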
Figure 6 describes a firm that exhibits both economies and diseconomies of scale. AC
decreases with the size of plant up to q* and increases when the size of plant grows
beyond q*.
All short run AC curves are U-shaped, because we assume a fixed scale of plant that
constrains production, and MC rises when diminishing returns set in. In the long run, there
is no fixed factor and the scale of plant can be changed.
Note: the same firm can face diminishing returns in the short run and still exhibit
economies of scale in the long run.
The long run average cost curve (LRAC) can take different shapes. Its shape depends on how
cost changes with the scale of production. Every firm tries to take advantage of economies
of scale and avoid diseconomies of scale, and thus tries to operate at the optimal scale of
plant, i.e. to minimize AC.

Long Run Adjustments to Short Run Conditions

In the long run, if firms have an incentive to enter or exit an industry, that industry
cannot be in equilibrium. When there are profit opportunities, new firms will enter, and
when there are losses, firms will exit. In both cases the industry is not in equilibrium
and firms will change their behaviour. What adjustments actually take place in the long
run when there are short run profits or losses? To answer this, let us analyse both cases.
Short-Run Profit: Expansion to Equilibrium

We assume a perfectly competitive industry in which firms are earning positive profits and
all firms use the same production technology. Each firm has an LRAC curve that is U-shaped
because there are some economies of scale to be realized in the industry, and then at some
scale of operation all firms start running into diseconomies of scale.
Figure 7 shows how a competitive firm expands in the long run when increasing returns to
scale are available. When individual firms are earning economic profits at some market
price, new firms enter and old firms expand as long as economic profits and economies of
scale exist. This continues until the price has fallen far enough to eliminate the profits.
Both the entry of new firms and the expansion of old firms shift the short run industry
supply curve from S to S'. As the industry supply curve is nothing but the sum of the MC
curves of all the firms, it shifts for two reasons. Firstly, new firms are being added, so
their MC curves are added as well. Secondly, existing firms are expanding, so their
individual MC curves shift to the right. Finally, each firm in the competitive industry
will choose to operate at the optimal scale of operation in the long run.

In the long run, profits are driven to zero, and the equilibrium condition is
P*= SRMC=SRAC=LRAC
where SRMC= short run marginal cost
SRAC= short run average cost
LRAC= long run average cost
When the price is above P*, profits still exist and new firms will enter. When the price
is below P*, firms suffer losses and existing firms will exit. Only at P=P* is the industry
in equilibrium.
Short-Run Losses: Contraction to Equilibrium

In the short run, firms suffering losses cannot exit the industry, but they can do so in
the long run. In the short run, firms suffering losses either shut down and bear a loss
equal to fixed cost, or minimize their losses by continuing to produce. Figure 8 shows the
long run contraction and exit of firms suffering short run losses in the industry. As long
as losses are being sustained in the market, firms will keep exiting, which reduces supply
and shifts the supply curve to the left. This raises the price, which reduces the losses of
the firms remaining in the industry.

The final long run competitive equilibrium condition remains the same:
P*=SRMC=SRAC=LRAC
In the long run profits will be zero, and at this point firms will be operating at the
optimal scale.
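The equilibrium condition can be illustrated with a small numerical sketch. It assumes, hypothetically, that every firm has the cost curve C(q) = F + c*q^2 at its optimal plant, that market demand is linear, and that input prices do not change as the industry grows. Under these assumptions, average cost is minimised at q* = sqrt(F/c), where AC = MC = 2*sqrt(F*c), and entry or exit adjusts the number of firms until the price equals that value.

```python
import math


def long_run_equilibrium(fixed_cost, cost_slope, demand_intercept, demand_slope):
    """Zero-profit equilibrium for identical firms (illustrative assumptions).

    Assumed cost at the optimal plant: C(q) = fixed_cost + cost_slope * q**2
    Assumed market demand:             Q(P) = demand_intercept - demand_slope * P
    """
    q_star = math.sqrt(fixed_cost / cost_slope)      # output at minimum AC
    p_star = 2 * math.sqrt(fixed_cost * cost_slope)  # minimum AC, equal to MC there
    industry_output = demand_intercept - demand_slope * p_star
    number_of_firms = industry_output / q_star
    profit_per_firm = p_star * q_star - (fixed_cost + cost_slope * q_star**2)
    return p_star, q_star, number_of_firms, profit_per_firm


# Initial demand, then higher demand at every price
before = long_run_equilibrium(100, 1, 1000, 10)
after = long_run_equilibrium(100, 1, 1200, 10)
print(before)
print(after)
```

In this sketch a rise in demand leaves the long run price unchanged and is met entirely by entry of new firms, because costs are assumed not to depend on the size of the industry.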
The Long Run Adjustment Mechanism: Investment Flows towards Profit Opportunities

In efficient markets, profit opportunities are eliminated quickly as they appear, because
investment capital flows towards them. When firms are generating profits, capital flows in
through investment by new firms and by expanding old firms, and output expands. When firms
are incurring losses, capital flows out through disinvestment: firms contract and some
leave the industry. This continues until the long run competitive equilibrium condition is
achieved and profits are driven to zero.
Output Markets: A final Word

When the demand for a good rises at every price, it creates excess demand, followed by
higher prices. At higher prices, producers are ready to supply more because they find
themselves earning positive profits. The rise in price leads to a change in the allocation
of society's resources. In the long run, the profits will lure more investment and
resources into that market. So a change in the demand for a product alone leads to a
reallocation of resources.
Conclusion
In this chapter, we learned about the three short run conditions in which a firm can find
itself, and how costs behave in the short run compared with the long run. We learned when
and how economies of scale and diseconomies of scale arise as the scale of production
changes. We also learned what long run adjustments take place when short run conditions
move the industry away from equilibrium: investment moves towards profit opportunities.
Summary
1. In the short run, a firm will be earning positive profits, suffering losses or just
breaking even. A firm that is breaking even is earning just a normal rate of return.
2. A firm that is earning profit in the short run and expects to continue doing so has an
incentive to expand in the long run. Profits also provide an incentive for new firms to
enter the industry.
3. A firm incurring losses in the short run will either shut down and bear a loss equal to
its fixed cost, or keep operating if revenue is enough to cover variable cost, in which
case the operating profit reduces the loss.
4. Any time the price is below the minimum point on the AVC curve, there will be operating
losses and the firm will shut down. The minimum point on the AVC curve is called the
shut-down point. At all prices above the shut-down point, the MC curve shows the
profit-maximizing level of output.
5. The short run supply curve of a firm in a perfectly competitive industry is the portion
of its MC curve that lies above its AVC.
6. The industry supply curve shifts either in the short run, when something causes MC to
change across the industry, or in the long run, with the entry or exit of firms.
7. When an increase in the firm's scale of production leads to a fall in AC, the firm is
exhibiting increasing returns to scale or economies of scale. When AC does not change with
the scale of production, the firm is exhibiting constant returns to scale. When AC rises
with an increase in the scale of production, the firm is exhibiting decreasing returns to
scale or diseconomies of scale.
8. A firm's LRAC curve shows the cost associated with the different scales on which it can
choose to operate in the long run.
9. When there are short run profits, new firms will enter and existing firms will expand,
which shifts the supply curve to the right and lowers the price until profits are
eliminated. When there are losses, some firms will exit and some will reduce their scale,
which shifts the supply curve to the left and raises the price until losses are eliminated.
10. Long run competitive equilibrium is reached when profits are zero and
P=SRMC=SRAC=LRAC.
Questions for Review

1. What do you understand by the normal rate of return? When does a firm break even?
2. Is it possible for a firm to exhibit both diminishing returns in the short run and
increasing returns to scale in the long run? Explain.
3. What is the long run equilibrium condition of a competitive firm?
4. What do you understand by economies of scale and diseconomies of scale?
5. Discuss the actual long-run adjustments that are likely to take place in response to
short run profits and losses.
1. A firm is suffering operating losses if
a) price exceeds average variable cost but is less than average total cost.
b) price exceeds marginal cost.
c) revenues are smaller than variable costs of production.
d) revenues are greater than variable costs of production but less than total costs.
2. When a firm expands its scale of operations, and such expansion leads to lower cost
per unit, we say that the firm exhibits:
a) Diminishing returns.
b) Constant returns to scale.
c) Increasing returns to scale.
d) A fixed factor of production.
3. A firm will choose to operate rather than shut down in the competitive industry as
long as
a) price is greater than or equal to AFC.
b) AFC is greater than AVC.
c) AVC is greater than MC.
d) price is greater than or equal to AVC.
4. Which of the following conditions exist in long-run competitive equilibrium?
a) Individual firms operate at the most efficient scale of plant.
b) The level of output produced coincides with the minimum point on the LRAC
curve.
c) P = LRAC.
d) All of the above.
5. The short run supply curve of a firm in a perfectly competitive industry is the
a) MC curve that lies below the AVC curve
b) MC curve that lies above the AVC curve
c) MC curve that lies below the ATC curve
d) MC curve that lies above the ATC curve

1 C
2 C
3 D
4 D
5 B

Answer 1. Option c) Operating profit or loss is the difference between total revenue and
total variable cost. When revenues are less than total variable cost, there are operating
losses.
Answer 2. Option c) When AC falls as the scale of operations and the level of output rise,
the firm exhibits IRS.
Answer 3. Option d) A firm will choose to operate rather than shut down as long as revenue
covers variable cost, i.e. price is at least equal to AVC. In other words, as long as the
firm is not making operating losses it will choose to operate rather than shut down.

Answer 4. Option d) A firm attains long run competitive equilibrium when it is producing
at the optimal scale, profits are zero and P=SRMC=SRAC=LRAC.
Answer 5. Option b) The short run supply curve of a firm in a perfectly competitive industry
is the portion of its MC curve that lies above its AVC.

Answer 1. Option a) is incorrect, as it describes the case of operating profits. Option b)
is wrong because price exceeding marginal cost does not indicate an operating loss. Option
d) is also incorrect, as it too describes the case of operating profits.
Answer 2. Option a) is incorrect because diminishing returns are a short run concept, and
in the short run the scale of production does not change. Option b) is not correct because
under CRS, AC does not change. Option d) is wrong as a fixed factor of production is
relevant only in the short run.
Answer 3. Options a), b) and c) are incorrect because none of them compares price with AVC,
which is what determines whether the firm makes an operating profit or an operating loss.
Answer 4. All of the options are correct. A firm attains long run competitive equilibrium
when it is producing at the most efficient scale, which means it is producing at the
minimum point of the LRAC curve and P=SRMC=SRAC=LRAC.
Answer 5. Option a) is incorrect because a competitive firm produces where P = MC, and any
price below AVC leads to shut-down, which means no production. Options c) and d) are
incorrect because even if firms are not able to earn economic profits, they will continue
producing as long as they are generating operating profits.
Appendix to Chapter 8
External Economies and Diseconomies and the Long-Run Industry Supply Curve

When economies and diseconomies are found on an industry-wide basis rather than within the
firm, they are known as external economies and diseconomies. Sometimes AC increases or
decreases with the size of the industry. When LRAC decreases (increases) as a result of
industry growth, we say that there are external economies (diseconomies). For instance,
when the government announces an agricultural subsidy, this leads to external economies,
as AC falls because the subsidy is given to the entire agricultural industry rather than to
a particular firm. Similarly, when oil prices increase, the entire industry is affected,
which leads to external diseconomies, as the AC of every firm rises along with the oil
price.

The Long-Run Industry Supply Curve
Long run competitive equilibrium is achieved when profits are driven to zero and
P=LRAC=SRAC=SRMC. Here all firms are producing at the optimal scale. It takes time to
achieve this long run equilibrium. In a dynamic economy, the long run equilibrium point
keeps changing. As population and stocks of input factors grow, and as preferences and
technology change, some sectors will contract and some will expand. To understand how an
industry adjusts to these long term changes, we have to consider both internal and external
factors.
The shape of the long run average cost (LRAC) curve is determined by the extent of internal
economies (or diseconomies). When a firm changes its scale of operation by expanding or
contracting, its AC will increase, decrease or remain constant along the LRAC curve. A firm
with internal economies that expands its scale will find its AC decreasing, and a firm
facing internal diseconomies that expands its scale will find its AC increasing.
External economies and diseconomies arise from an industry-wide expansion. When the
industry faces external diseconomies, the LRAC curve shifts upward, i.e. cost increases
regardless of how much a firm produces. When the industry enjoys external economies, the
LRAC curve shifts downward, i.e. cost falls at every potential level of output. This is
because external economies (or diseconomies) reduce (or increase) costs for every firm.
Figures 9 and 10 show examples of an expanding industry facing external economies and
external diseconomies respectively.

In figure 9, the industry and a representative firm are both in long run equilibrium. P* is
the equilibrium price determined by the intersection of the demand curve, DD1, and the
supply curve, SS1. At this point all firms have zero economic profits, and the price P*
touches the LRAC curve at its minimum point, i.e. at the optimal scale. When demand
increases, the demand curve shifts from DD1 to DD2 and the price rises from P* to P**.
Rising prices create profit opportunities, so new firms will enter and old firms will
expand. This shifts the supply curve from SS1 to SS2, driving the price down. If LRAC falls
from LRAC1 to LRAC2 as a result of the expansion, the final price will be below the
original price, P*. So the long run industry supply curve (LRIS) slopes downward when an
industry is enjoying external economies. Such an industry is also known as a decreasing
cost industry.
Similarly, in figure 10 the industry and a representative firm are both in long run
equilibrium. P* is the equilibrium price determined by the intersection of the demand
curve, DD1, and the supply curve, SS1. When demand increases, the demand curve shifts from
DD1 to DD2 and the price rises from P* to P**. With the rise in price, new firms will enter
and existing firms will expand, shifting the supply curve from SS1 to SS2 and driving the
price down. If LRAC rises as a result of the expansion, the final price will be above the
original price, P*. So the long run industry supply curve (LRIS) slopes upward when an
industry is facing external diseconomies. Such an industry is known as an increasing cost
industry.
Reference
1) Case, Karl E. and Fair, Ray C., Principles of Economics, 9th Edition.


Behavior Of Profit Maximizing Firms And The Production Process
Introductory Microeconomics
Unit IV – The Firm and Perfect Market Structure
Lesson: Behavior of Profit Maximizing Firms and The Production Process
Lesson Developer: Jasmin
Jawaharlal Nehru University

Table of Contents
Behavior of Profit Maximizing Firms and the Production Process
Learning Outcomes
Introduction
Production
Profit Maximizing Firms: Behavior
 • Profits and Costs
 • Normal Rate of Return to Capital
Decision in the Short Run and the Long Run
Factors Effecting Decisions of the Firms
Production Process
 • Production Functions and Concepts of Total Product, Marginal Product and Average Product
 • Law of Diminishing Returns: Marginal Product Function
 • Marginal Product and Average Product
 • Production Function: The Case of Two Variable Inputs
Choice of Technology
Conclusion
Summary
Exercises
Glossary
References
Web-links
Appendix

Behavior of Profit Maximizing Firms and the Production Process
Learning Outcomes
The objective of this lesson is to acquaint the reader with the behavior of profit
maximizing firms in a perfectly competitive market structure, the production process, and
how decisions related to the production process are taken. After going through the chapter,
the reader should be able to understand the concept of perfect competition. The decisions
regarding the amount of output to produce, which production technology to use, and the
quantity of inputs to use in production are crucial for any firm, and the lesson looks into
these aspects as well. The reader will attain a deeper understanding of concepts like
profits, total revenue, total cost and the normal rate of return. The chapter analyses in
detail how the decisions and responses of firms differ in the short run and the long run.
It also discusses the factors that influence the important decisions that firms have to
take, and the production process, which involves a discussion of the types of technology,
the concepts of total, marginal and average product, and the production function. The last
section of the chapter discusses the choice of technology and its connection with the input
markets. The practice questions at the end of the chapter will help the reader get a clear
picture of the topics discussed in the lesson. The appendix to this chapter introduces the
idea of isocosts, isoquants and the cost-minimizing optimum combination of inputs.
Introduction
An idea parallel to household decisions and consumer choice is that of the production
decisions taken by firms. Households decide what and how much to consume of different
goods, given prices and income, and they decide how many hours to work. Firms in the market
are involved in a similar exercise: they decide which inputs to use in the production
process, and in what quantity, so that the least cost is incurred given the input prices,
and they decide the level of output that will be profitable to produce. All firms involved
in production aim at maximizing their profits and minimizing their costs, so optimization
is equally vital for firms. Firms are involved in the production process, in which inputs
are combined to produce output. The decisions related to production have a definite
implication for the profits and the viability of the firm. Figure 1 illustrates the
circular flow diagram, which shows the demand and supply of inputs as well as output, and
the demand and supply decisions of firms and households. It is very important to understand
the questions that firms face and how they are answered. The chapter aims at solving all
these puzzles.
Figure 1 : Decisions of the firms and households
Production
Production can be defined as a process whereby inputs are combined, processed and converted
into output. Production is a vital function of a firm, whatever its size and internal
structure. The analysis in this chapter is based on a set of assumptions, which are listed
below:

Production is not confined only to firms: The function of production is not confined only
to the firms in the market. Households can also process and convert inputs like land,
labor and capital into output. A household that has a kitchen garden combines land, labor,
manure, fertilizers, seeds and tools to grow vegetables. The government utilizes various
factors of production to provide services of public utility.
Firms differ from households and the government in that they produce goods or services to
meet the demand for them in order to make profits.
Firms differ from each other in their size, type of organization and the market structure
in which they function. We analyze the case of perfect competition here.
Perfect Competition: Some characteristics are particular to a perfectly competitive
industry. The industry comprises a large number of relatively small firms that produce
homogeneous goods. No single firm can control the market price of either the output it
produces or the inputs that it uses for production. Hence, two features specific to a
perfectly competitive industry are that each firm is very small compared to the size of the
industry and that all the firms in the industry produce identical goods. As a result, each
firm in the industry takes the market price, which is determined by supply and demand, as
given. These firms can be described as “price-takers”. At the given price, the firms decide
how much output to supply, what quantity of inputs to purchase and how to produce the
output.
Since the products produced by the firms are homogeneous, no firm can charge a price above
the market price, because consumers would easily shift to the other sellers in the market
and the firm that set the higher price would lose its sales. It is also clear that no firm
would want to charge a price below the market price, since it can sell any quantity of
output at the given market price. The demand for the output produced by such a firm is
therefore perfectly elastic. For instance, consider the case of Ram, who sells pens in a
perfectly competitive market. Part a of figure 2 depicts the supply and demand conditions
in the market. Say the price set by the market is Rs.5 per pen. Part b of figure 2
represents the demand curve faced by a perfectly competitive firm for its output. It would
not be beneficial for Ram to raise the price of a pen above Rs.5, since consumers would
shift their demand to the other sellers and he would not be able to sell any pens. On the
other hand, he would not want to set a price below Rs.5, because he can sell as many pens
as he wants at this price.
Figure 2 : A perfectly competitive market structure and the demand faced by a single firm

In a perfectly competitive industry it is also assumed that firms can easily enter and
exit the industry. If the existing firms in the industry are earning high profits, new
firms will enter. For example, even if there are several stationery shops, there are no
barriers to a new stationery shop springing up. Similarly, exit is easy: if a firm is
incurring losses, it can easily shut down the business. Firms might face losses when
technology changes, when tastes and preferences change, when costs of production rise or
when prices fall because of intense competition. Though it is hard to find a perfectly
competitive set-up in the real world, certain markets are quite close to perfect
competition in their structure and way of functioning. A few examples are markets for
agricultural goods and vendors who sell food and other articles on the street.
Profit Maximizing Firms: Behavior

The objective of any firm is to achieve the maximum level of profits. A firm faces several
decisions that have a definite influence on its profits. The basic questions that a firm
needs to answer are: the quantity of the good to produce, the quantity of each factor of
production to purchase for producing the good, and the technology to choose for producing
it. Figure 3 shows the decisions that a firm needs to make. Once a firm has decided on the
quantity of the good to produce, the choice of production technique determines the quantity
of inputs needed to produce that quantity of output. The type of technology has an
important role to play, as it defines how effectively inputs are transformed into output.
For example, a potato chips factory using complex machinery and equipment and a few
laborers can produce a larger number of packs of potato chips than a small scale firm that
makes potato chips primarily using laborers who perform most of their work manually.
Figure 3 : Decisions facing a firm

Profits and Costs
We have already discussed that any firm functions in the market primarily to make profits.
Since profits are so vital for any firm, it is important to understand what profit is. Profit
is the difference between the total revenue and the total cost of the firm.
Total revenue is the amount of money that a firm receives from selling its output; it is the
quantity of output sold (q) multiplied by the per-unit price of the good (p). Total cost, or
total economic cost, includes three elements. The first is the explicit (accounting or
out-of-pocket) cost, which is the cost of raw materials and other inputs used in production. The
normal rate of return on capital and the opportunity cost of each factor of production are
the other two elements of the total economic cost; both can be categorized as implicit costs.
Opportunity costs are implicit and need to be included in the total cost incurred by the firm.
For example, a person who owns a business also contributes his labor services to it but
does not receive any wage in return. Instead of running his own business, he could have worked
as an employee and earned a wage for his labor. The wage that this person forgoes
is the opportunity cost of his labor, which needs to be added to the total economic cost.
The opportunity cost of capital is equally important. It
can be accounted for by including the normal rate of return to capital in the
total economic costs.
Normal Rate of Return
Capital is required to establish a firm or to start a business. Money is required to purchase and
set up machinery, equipment, furniture etc. This capital will stay tied up
in the business for a long period. Fresh investments also need to be made, even when the
firm or the business has been in place for a long time. There is an opportunity cost attached to
this invested capital. The investor or the proprietor, instead of investing his funds in the
business, could have invested them in some financial security, which would have given him
returns. This rate of return is the opportunity cost of using or investing one’s capital in the
business.

The concept of rate of return needs to be understood. A person who has invested his funds
in a business will get a stream of returns. The rate of return can be described as the annual
flow of net returns on investment, expressed as a proportion of the total investment. A
normal rate of return, on the other hand, can be defined as the rate of return that keeps the
investors and owners satisfied. If the rate of return falls below the normal rate of return, the
owners earn less by investing in the business than they could earn
by putting the funds in financial securities, bonds or elsewhere. Under
normal conditions, i.e. when the firm earns a steady stream of revenue and there is no uncertainty
about the future, the normal rate of return will
be quite close to the rate of return on risk-free government securities.
Let’s define economic profit now. Economic profit is the difference between the total
revenue and the total economic costs.
With this definition, it is easy to see that when the firm earns a rate of return equal to the
normal rate of return, its economic profit is zero. On the other hand, if the firm is earning
a positive profit, the rate of return is above the normal rate of return
to capital. A positive level of profit will keep the investors happy and motivate new firms to
enter the industry. A negative profit means that the rate of return is below the normal rate
of return to capital. In such a case some firms might shut down and move out of
the industry, a few might contract and fresh investment will be hard to come by.
A Numerical Example
Suppose Ravi is planning to start a small-scale business. He plans to sell radios. To start the business, he
needs a shop. The money required to purchase this shop is Rs.1000000. Ravi has decided to sell 30000
radios annually at a price of Rs.100 per radio. He purchases the radio from a supplier and each radio
costs him Rs.50. He will need a person to stay on the sales counter, who will work for a yearly wage of
Rs.400000. The rate of interest on government securities is 10%, so Ravi wants to earn at least 10% on
his investment. Let’s work out the profit for this venture.
Table 1: Calculating Profits for Ravi’s Venture

Total Revenue (30000 radios x Rs.100)                        Rs.3000000
Economic Costs
  Amount paid to the supplier (30000 radios x Rs.50)         Rs.1500000
  Wage paid to the worker                                    Rs.400000
  Opportunity cost of capital (Rs.1000000 x 0.10)            Rs.100000
Total Economic Costs                                         Rs.2000000
Economic Profit = Total Revenue - Total Economic Costs       Rs.1000000
Ravi earns a revenue of Rs.3000000 from his venture. The total economic costs have been
calculated by including the opportunity cost of capital, i.e. the normal return to capital.
The economic profit generated by this venture is Rs.1000000.
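The arithmetic in Table 1 can be checked with a short script. This is a minimal sketch: the function and variable names are illustrative and not part of the lesson, while the figures are the ones given for Ravi’s venture. The last line also computes his rate of return as defined earlier (annual net return as a proportion of the investment), taking the net return to be revenue minus the explicit costs, which is an assumption made here for illustration.

```python
def economic_profit(total_revenue, explicit_costs, capital_invested, normal_rate):
    """Return (total economic cost, economic profit).

    The opportunity cost of capital is taken, as in the lesson, to be the
    normal rate of return applied to the capital invested.
    """
    opportunity_cost_of_capital = capital_invested * normal_rate
    total_economic_cost = explicit_costs + opportunity_cost_of_capital
    return total_economic_cost, total_revenue - total_economic_cost


# Figures from Ravi's venture (all amounts in rupees).
radios_sold = 30_000
price_per_radio = 100
cost_per_radio = 50
annual_wage = 400_000
shop_cost = 1_000_000
normal_rate = 0.10  # rate of interest on government securities

total_revenue = radios_sold * price_per_radio
explicit_costs = radios_sold * cost_per_radio + annual_wage
total_cost, profit = economic_profit(total_revenue, explicit_costs, shop_cost, normal_rate)

# Assumed reading of "net return": revenue minus explicit costs.
rate_of_return = (total_revenue - explicit_costs) / shop_cost

print(total_revenue, total_cost, profit, rate_of_return)
```

Under these assumptions the rate of return on the shop is well above the 10% normal rate, which is consistent with the positive economic profit in Table 1.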

Decision in the Short Run and the Long Run

The decisions taken by a firm depend on the time period under consideration. Some decisions
need to be made in the short run, for instance the quantity of a good to be produced
with the existing machinery or plant. Others are made over the long
run, like the expansion of a factory or the setting up of a new plant. Hence, time is another
important factor that a firm takes into consideration while making decisions. The decisions
and reactions of a firm in the short run often differ from the questions and decisions
that it faces in the long run.
The short run can be defined with the help of two features. First, in the short run, some factor of
production for the firms existing in the industry is fixed, i.e. the quantity or
scale of that factor cannot be altered. Second, in the short run the entry and exit of firms
from the industry is difficult; there are bottlenecks in the entry and
exit of firms. A firm winding up its business in order to move out of the industry might
still have fixed costs locked up that are yet to be recovered. The factor of production
that is fixed differs from one industry to another. For a firm the plant or the machinery
can be the limit, for a professional his time can be the constraint and for a bakery the place of work,
i.e. the shop, might be the constraint. Land as an input is also fixed in the short run.
The long run is the time period in which no factor of production is fixed and there are no
restrictions or difficulties in the entry and exit of firms from the industry. Firms are
free to alter the scale at which they operate.
Factors Affecting Decisions of the Firms

The profit of a firm depends on the cost of producing the good and the price at which it can be
sold. The cost of production is determined by the production techniques and the prices of
inputs. Hence, the decisions of a profit-maximizing firm will be based on three
important factors: the market price at which the firm can sell each unit of the good it produces,
the production technologies available and the price of each factor of production or input.
Firms are always on the lookout for an optimal method of production. The optimal
method of production is the one through which the firm incurs the least cost in production.
Once the cost-minimizing method of production has been chosen and the market prices of
the good and the inputs are known, the firm can decide the quantity of output to sell
and the quantity of inputs to purchase. This has been illustrated in figure 4.
Figure 4 : The optimal method of production

Production Process
Production is any process through which inputs are processed and converted into
output. Production technology is a functional relationship between the inputs and the
output. For example, producing a cotton shirt requires cotton, threads, buttons, dyes,
machinery, electricity, laborers and other inputs. A good can often be produced
through a number of different production techniques. The technology can be labor-intensive
or capital-intensive. A production technique that uses more labor relative to capital is
called a labor-intensive production technology. On the other hand, a production technique
that uses more capital relative to labor is a capital-intensive technology. For example, to
build a swimming pool in a resort, 50 laborers can be employed with the necessary tools and
equipment. This is a labor-intensive technique. Alternatively, the swimming pool can
be built with the help of 15 laborers, a crane and other machinery. This is a
capital-intensive technique. Since the firm tries to choose the method of production that
minimizes cost, a firm in an economy with an abundant supply of cheap labor will use
labor-intensive techniques of production. However, in an economy where labor is in short
supply and wages are high, firms will tend to use more capital
relative to labor in the production process.
Production Functions and Concepts of Total Product, Marginal Product and Average Product

A production function can be described as the mathematical relationship between the inputs
and the output. The total product function shows the total number of units of output that
result from using different quantities of inputs.
For example, in a bakery one worker working alone can produce 12 cookies in an hour. If
another worker is added, the two workers together produce a total of 27 cookies in an hour, which
means that the second worker adds 15 cookies an hour. With the third worker the
total number of cookies produced rises to 37, i.e. the third worker adds only 10 cookies.
This could be because with three workers the kitchen gets crowded and the workers get in
each other’s way. Also, the number of ovens is fixed, so three workers have to share a
fixed number of ovens; there is a capital constraint. Note that we assume that all the
workers are equally efficient; it is the constraint of space and capital that leads to the fall in
the number of cookies added to total production by the third worker. With the addition
of the fourth and the fifth worker, these constraints are felt more strongly and the addition
made to the total production of cookies by each worker falls further. With the fourth worker, the
total production of cookies rises to 40 and with the fifth worker it rises to 41 cookies. With
the sixth worker there is no further rise in total production.
Table 2: Production Function

Laborers   Total Product        Marginal Product   Average Product
           (Cookies Per Hour)   of Labor           of Labor
0          0                    -                  -
1          12                   12                 12
2          27                   15                 13.5
3          37                   10                 12.33
4          40                   3                  10
5          41                   1                  8.2
6          41                   0                  6.83
Part a of figure 5 shows the total product function.
Law of Diminishing Returns: Marginal Product Function
Marginal product can be defined as the additional units of output that can be produced by
employing an additional unit of a particular input, holding the quantity of other inputs fixed.
Table 2 above shows the marginal product of labor. The first unit of labor in the bakery
produces 12 cookies, the second unit of labor adds 15 cookies to the total production, the
marginal product of the third worker is 10 cookies, fourth worker adds 3 cookies, fifth
worker produces 1 cookie, while the marginal product of the sixth unit of labor is 0. Part b
of figure 5 shows the curve for marginal product of labor.
Figure 5 : Production function for cookies

According to the law of diminishing returns or the law of variable proportions, beyond a
particular point, if additional units of a variable input are employed along with fixed inputs,
the marginal product of the variable input falls.
Law of Diminishing Returns
In the Essay on the Influence of a Low Price of Corn on the Profits of Stock (1815), the
British economist David Ricardo introduced the law of diminishing marginal returns. Ricardo
derived the law mostly out of his observations of agriculture and land, labor and capital
involved in it.

It is in the short run that a firm, a factory or a farmer faces the constraint of fixed
inputs. Hence the law of diminishing returns always applies in the short run.
Marginal Product and Average Product
Average product is the amount of output produced on average by each unit of the
variable input employed. Table 2 also shows the average product of labor. The average
product of labor is calculated by dividing the total output by the total number of units of labor
used. For instance, the average product of the first two units of labor is 13.5 (27/2), while
the average product of 6 units of labor is 6.83 (41/6).
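The marginal and average product columns of Table 2 follow mechanically from the total product column. The sketch below recomputes them; the function names are illustrative and the data are the bakery figures from the lesson.

```python
# Total product (cookies per hour) for 0, 1, ..., 6 workers, from Table 2.
total_product = [0, 12, 27, 37, 40, 41, 41]


def marginal_product(tp):
    """Extra output added by each successive worker."""
    return [tp[n] - tp[n - 1] for n in range(1, len(tp))]


def average_product(tp):
    """Output per worker, for one or more workers."""
    return [tp[n] / n for n in range(1, len(tp))]


mp = marginal_product(total_product)
ap = [round(x, 2) for x in average_product(total_product)]

for workers, (m, a) in enumerate(zip(mp, ap), start=1):
    print(workers, total_product[workers], m, a)
```

Each printed row should match the corresponding row of Table 2, with the average product rounded to two decimal places.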
The average product and marginal product are related to each other, but the average
product changes more slowly than the marginal product. If the marginal
product exceeds the average product, the average product increases; if the marginal product
is below the average product, the average product falls. For instance, Sam
participates in a competition that has five rounds and he has already completed two rounds.
Suppose he gets points for each round and his average for the first two rounds is 10. If he
scores 8 in the third round, his average for three rounds will fall, but not all the way to 8: the
average will be 9.33. If he instead scores 12 points, his average will rise, but not all the way up to 12:
the average will be 10.67. Table 2 shows that the marginal product falls from
the third worker onward. Though the average product also falls with the marginal
product, it falls more slowly than the marginal product. Figure 6
shows the graph of the total product and the graph of the marginal and average product. The
marginal product curve is nothing but a depiction of the slope of the total product function.
As figure 6 shows, the marginal and average product curves start out together. While the
marginal product is rising and is above the average product curve, the average product
rises with it but at a slower pace. The marginal product curve reaches its maximum at point
A, before the average product reaches its maximum at point B. At point A, the marginal
product curve begins to fall, since at
this point the additional units of output that an extra worker generates begin to fall due
to fixed inputs or capacity constraints. The average product of labor continues to rise till
point B, where it reaches its highest point and is equal to the marginal product of labor,
although the marginal product has already begun to fall at point A. Beyond point B and till
point C, the marginal product continues to decline and is less than the average product of
labor. The average product follows this decline in the marginal product. At point C,
the marginal product falls to 0, i.e. an additional unit
of labor cannot add to the output. This is where the firm reaches its
capacity and the total product is at its maximum.
Figure 6 : Total product, average product and marginal product
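The pull of the marginal value on the average, described above with Sam’s scores, can be verified numerically. The following sketch uses a hypothetical helper, `new_average`, and then applies the same rule to the bakery data of Table 2; it is an illustration, not part of the original lesson.

```python
def new_average(old_average, rounds_so_far, next_score):
    """Average after one more score is added to the existing ones."""
    return (old_average * rounds_so_far + next_score) / (rounds_so_far + 1)


# Sam's average after two rounds is 10.
after_low_score = new_average(10, 2, 8)    # third-round score below the average
after_high_score = new_average(10, 2, 12)  # third-round score above the average
print(round(after_low_score, 2), round(after_high_score, 2))

# Same rule on the bakery data of Table 2: the average product rises
# exactly when the marginal product exceeds the previous average product.
total_product = [0, 12, 27, 37, 40, 41, 41]  # cookies per hour, 0 to 6 workers
rule_holds = []
for workers in range(2, len(total_product)):
    previous_ap = total_product[workers - 1] / (workers - 1)
    current_ap = total_product[workers] / workers
    mp = total_product[workers] - total_product[workers - 1]
    rule_holds.append((current_ap > previous_ap) == (mp > previous_ap))
print(all(rule_holds))
```

In both cases the new average moves toward the marginal value without reaching it, which is why the average product curve changes more slowly than the marginal product curve.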

Production Function: The Case of Two Variable Inputs
Inputs are usually used in conjunction with each other. Labor and capital are two inputs
that can be seen as complementary in nature. Using more capital in the production
process can raise the productivity of labor. So, if the demand for cookies is rising while
the bakery has hit its production capacity, with all the workers working with fixed
inputs (for example, a single oven), the owner of the bakery can think of expanding the
production capacity of the bakery. He can infuse more capital in the form of another oven for
the bakery. The additional oven can raise the productivity of labor, as it will raise the
average output that a single worker can produce in an hour.
Choice of Technology
As we have discussed, inputs are used in conjunction with each other. The factors of
production are complementary in nature. Labor and capital are used together in production
and raise each other’s productivity. However, different factors of production can also act as
substitutes for each other. If capital is expensive relative to labor in an economy, firms
will be motivated to shift to labor-intensive techniques of production. Similarly, if labor is
relatively expensive compared to capital, firms will want to shift to capital-intensive
techniques. The production technique chosen by the firm depends on
the prices of inputs determined by the input markets. Suppose Rahul wants to manufacture
150 toys in a week. Table 3 shows several options of production technology that can be
used to produce these 150 toys.

Table 3: Production Technologies Available to Produce 150 Toys

Technology   Units of Capital (K)   Units of Labor/Hours of Labor (L)
A            3                      10
B            4                      7
C            5                      6
D            6                      3
E            7                      1
Comparing the production technologies, it is easy to see that out of all the options,
technology A is the most labor-intensive while technology E is the most capital-intensive.
Since the firm chooses the production technique that minimizes cost, its ultimate
decision will depend on the market prices of the inputs. Let’s assume that the wage rate
(W) is Rs.1 and the cost of capital per hour (R) is Rs.5. The total cost corresponding to each
production technique can be calculated from the input prices.
Given these input prices, technology A is the one that will produce 150 toys at the least cost,
which is Rs.25, as shown in table 4. All the other technologies cost more than this amount.
Hence, the firm will choose technology A, which is the most labor-intensive technique. Now, if
the wage rate rises to Rs.7 and the cost of using capital per hour stays fixed at Rs.5, the
cost-minimizing production technique after the rise in the wage rate is option E. The cost of
production with technology E is Rs.42 after the rise in wages, as shown in table 4. So the
firm will choose option E, which is the most capital-intensive technique out of all the options.
Hence, the cost of production depends on the available production techniques and the input
prices decided by the input markets.
Table 4: Alternative Production Techniques and Corresponding Cost [Cost = (L x W) + (K x R)]

Technology   Units of      Units of Labor/      Cost when W=Rs.1   Cost when W=Rs.7
             Capital (K)   Hours of Labor (L)   and R=Rs.5         and R=Rs.5
A            3             10                   25                 85
B            4             7                    27                 69
C            5             6                    31                 67
D            6             3                    33                 51
E            7             1                    36                 42
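The cost columns of Table 4 and the resulting choice of technology can be reproduced with a few lines of code. This is a minimal sketch; the names `technologies`, `total_cost` and `cheapest` are illustrative, and the cost rule is the one stated in the table caption, Cost = (L x W) + (K x R).

```python
# (technology, units of capital K, units of labor L), from Table 3.
technologies = [("A", 3, 10), ("B", 4, 7), ("C", 5, 6), ("D", 6, 3), ("E", 7, 1)]


def total_cost(capital, labor, wage, capital_price):
    """Cost = (L x W) + (K x R)."""
    return labor * wage + capital * capital_price


def cheapest(wage, capital_price):
    """Return the least-cost technology and the cost of every technology."""
    costs = {name: total_cost(k, l, wage, capital_price) for name, k, l in technologies}
    best = min(costs, key=costs.get)
    return best, costs


print(cheapest(wage=1, capital_price=5))
print(cheapest(wage=7, capital_price=5))
```

Raising the wage from Rs.1 to Rs.7 while holding the price of capital at Rs.5 switches the least-cost choice from the most labor-intensive technology to the most capital-intensive one, as the lesson describes.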
Conclusion
The lesson throws light on the important elements that go into the decision-making process of a
firm. The ultimate goal of any firm is to generate profits for itself. The decisions taken by the
firm affect its profit. These decisions concern the quantity of output to be produced, the
choice of production technique and the quantity of inputs to purchase. Hence, it is important
to understand the market structure in which a firm operates, the types of production
techniques that are available and what the cost of production depends
on. The decisions are made so that profits are maximized and costs are
minimized.

Summary
The chapter focuses on how production decisions are taken at the firm level. The case of a
firm functioning in a perfectly competitive setup has been discussed. The following points
summarize the chapter.
• Firms differ in size and structure. For instance, a firm functioning in a perfectly
competitive industry is a price-taker.
• Perfect competition is a market structure where there are several firms that are
small in size relative to the industry, each firm produces identical goods and there is
no restriction on the entry and exit of firms.
• The demand curve facing a firm in a perfectly competitive industry is perfectly
elastic, i.e. at the market price the firm can sell any amount of output, but it will not be able
to sell anything if it fixes a price above the market price. Also, the firm will not want to
reduce the price it charges below the market price.
• Profit-maximizing firms have to take three basic decisions: first, the
quantity of output to produce; second, the choice of production technique; and
third, the quantity of inputs to purchase.
• The ultimate aim of the firm is to make profit. Profit is the difference between the total
revenue and the total cost of the firm.
• The total economic costs include out-of-pocket costs, which are explicit in nature, and the
opportunity cost of each input and the normal rate of return to capital, which are
implicit in nature.
• The normal rate of return to capital is the rate of return that is sufficient to keep
the investors and owners satisfied. In normal conditions, it is quite close to the rate
of interest on risk-free government securities.
• If the firm makes a positive profit, the rate of return that it earns is
greater than the normal rate of return to capital.
• Decisions made by the firm also take into consideration the time period. The short run
differs from the long run since it involves fixed inputs and the entry and exit of
firms from the industry is constrained.
• The decisions to be taken by the firm depend on the market price of the good it produces,
the production technologies available and the input prices.
• A production function describes how the inputs are related to output. It is a
mathematical relationship between inputs and output.
• Marginal product is the additional units of output produced by employing an
additional unit of a variable input. The law of diminishing returns states that beyond a
particular point, if additional units of a variable input are employed along with fixed
inputs, the marginal product of the variable input falls.
• Average product is the average amount of output produced by each unit of the variable
input employed. It is related to the marginal product. It rises when the marginal product is
above the average product, it is equal to the marginal product at its highest level
and it falls when the marginal product falls below it.
• Capital and labor are inputs that are complementary in nature, but they can also act as
substitutes.
• A profit-maximizing firm uses the technology that minimizes the cost of production,
given the prices of inputs and the available production techniques.

Exercise
Review Questions
Q.1 Discuss the features of a perfectly competitive market structure. Why are the firms in a
perfectly competitive industry called “Price-Takers”?
Q.2 Why is normal rate of return to capital added while calculating total economic costs?
Q.3 What is the law of diminishing returns? Using the table given below, determine whether
there is a case of diminishing returns.

Labor units   Total Output
0             0
1             6
2             13
3             19
4             23
5             26
Q.4 Draw curves for total product, marginal product and average product. Illustrate the
relation between average product and marginal product.
Q.5 What does the choice of the cost-minimizing production technique depend on?

Q.1 Features of perfect competition are:
a. There are a large number of sellers.
b. All the firms produce homogeneous products.
c. There is no restriction on the entry and exit of the firms.
d. All of the above.
Q.2 Which one out of the following is included to calculate the total economic costs:
a. Revenue.
b. Profit.
c. Price of the output.
d. Normal rate of return to capital.
Q.3 Which one of the following represents short run:
a. 4 months.
b. 6 months to a year.
c. The time period where all the inputs are variable.
d. The time period where one or more of the inputs are fixed.
Q.4 The law of diminishing returns states that:
a. When additional units of a variable input are used with fixed inputs, the marginal
product of that variable input declines.
b. When additional units of a variable input are used with fixed inputs, the marginal
product of that variable input rises.
c. When additional units of a variable input are used with fixed inputs, the marginal
product of that variable input becomes constant.
Q.5 While choosing the production technology, the profit-maximizing firm should keep in
mind:
a. Input prices.
b. Available production techniques.
c. Market price of output.
d. All of the above.

Q.1 d
Q.2 d
Q.3 d
Q.4 a
Q.5 d

Answer 1. The characteristics of a perfectly competitive industry are that there is a large number of
firms, these firms sell identical products and there is no restriction on the entry and exit of
firms.
Answer 2. The opportunity cost of capital is accounted for by including the normal rate of
return to capital in the total economic costs.
Answer 3. The short run is the time period where one or more of the inputs are fixed.
Answer 4. The law of diminishing returns states that when additional units of a variable input
are used with fixed inputs, the marginal product of that variable input declines.
Answer 5. The choice of production technology by a profit-maximizing firm depends on input
prices, available production technology and the market price of output.

Answer 1. All the options for question 1 are correct; hence the answer is option d.
Answer 2. Option a is incorrect: revenue is not included in the total economic costs. Option
b is also incorrect: profit is calculated by deducting costs from revenue. Option c is
incorrect since the price of the output is used in calculating the revenue earned by a firm.
Answer 3. Options a and b are incorrect: the short run is not defined by a number of months or years.
Option c defines the long run.
Answer 4. Option b is incorrect: the law states that as more and more units of a variable input
are used with fixed inputs, the marginal product of the variable input declines. Option c is
incorrect for the same reason. Option d is ruled out.
Answer 5. All the options for question 5 are correct.
Glossary
Average Product: The ratio of total product to the total units of the
variable input. It is the average output produced by each unit of the variable input.
Capital-Intensive Technology: The production technology that uses a greater number of units
of capital relative to the units of labor.
Labor-Intensive Technology: The production technology that uses a greater number of units
of labor relative to the units of capital.
Marginal Product: The additional units of output produced by an additional unit of the variable
input employed.
Homogeneous Products: Goods that are identical to each other in terms of quality and
characteristics.
References
Case, Karl E. and Fair, Ray C. (2007), “Principles of Economics”, Ch.7, 8th edition, Pearson
Education Inc.
Web Link
http://www.econlib.org/library/Enc/bios/Ricardo.html

Appendix
Introduction to Isocosts and Isoquants
Table A.1: Various combinations of capital (K) and labor (L) which can be used to produce
75, 150 and 225 units of output.

      Output = 75    Output = 150   Output = 225
      K      L       K      L       K      L
A     1      9       2      10      3      11
B     2      6       3      7       4      8
C     3      4       4      5       5      6
D     6      3       8      3       9      4
Table A.1 shows various combinations of capital and labor that can be used to produce three
different levels of output of good X. These levels of output are 75 units, 150 units and 225
units. A curve that shows the different combinations of inputs, capital and labor, that produce a
given level of output is called an isoquant. Figure A.1 shows isoquants for three different
levels of output, using the data shown in table A.1. Each isoquant represents infinitely many
combinations of inputs that can be used to produce the corresponding level of output. There
can be several isoquants corresponding to several levels of output. The higher the isoquant,
the greater the level of output attached to it.
Figure A.1 : Isoquants showing combinations of labor and capital to produce levels of output
Q1 = 75, Q2 = 150 and Q3 = 225

Figure A.2 shows the slope of the isoquant, where the isoquant has been drawn for the level
of output 75 units. Points F and G represent two points on the isoquant. When one moves
from point F to point G, the capital employed falls and the units of labor rise. The output lost
due to the fall in the number of units of capital employed is given by $\Delta K$ multiplied by $MP_K$. The
marginal product of capital, $MP_K$, is the number of additional units of output produced by
employing another unit of capital. To keep the level of output constant along the isoquant,
this loss in output must be made up by the addition to output from employing more units
of labor. This addition to the output is similarly calculated as $\Delta L$ multiplied by $MP_L$,
where $MP_L$ is the marginal product of labor, so that
$$\Delta K \cdot MP_K = -\Delta L \cdot MP_L$$
So, the slope of the isoquant is given by
$$\frac{\Delta K}{\Delta L} = -\frac{MP_L}{MP_K}$$
The ratio of the marginal product of labor to the marginal product of capital is called the
marginal rate of technical substitution. It measures the rate at which a firm can substitute
capital in place of labor, keeping the level of output fixed.
Figure A.2 : Slope of an Isoquant
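As a rough numerical illustration, the slope of an isoquant between two tabulated points can be approximated by the change in capital divided by the change in labor. The sketch below does this for the four combinations that produce 75 units in Table A.1. The variable and function names are illustrative, and the arc slopes are only approximations of the slope at a point, since the table gives a few discrete combinations rather than the full curve.

```python
from fractions import Fraction

# (K, L) combinations that produce 75 units of output, from Table A.1.
isoquant_75 = {"A": (1, 9), "B": (2, 6), "C": (3, 4), "D": (6, 3)}


def arc_slope(p, q):
    """Change in capital divided by change in labor between two (K, L) points."""
    (k1, l1), (k2, l2) = p, q
    return Fraction(k2 - k1, l2 - l1)


# Order the points by labor, i.e. move left to right with labor on the horizontal axis.
points = sorted(isoquant_75.values(), key=lambda kl: kl[1])
slopes = [arc_slope(p, q) for p, q in zip(points, points[1:])]
print(slopes)
```

Every arc slope is negative, because more labor must be matched by less capital to keep output at 75 units, and the slope becomes flatter as more labor is used.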

Isocosts
A line that shows the combinations of capital and labor that can be used to produce output at a given
cost is called an isocost line. Like isoquants, isocost lines are infinite in number. Figure A.3
shows isocost lines. If the price of labor is $P_L$, the price of capital is $P_K$ and the total
cost is $TC$, the isocost line is given by the equation:
$$TC = (P_K \times K) + (P_L \times L)$$
The lowest isocost line represents the combinations of capital and labor corresponding to
the lowest cost of production. Suppose the price of labor is Rs.1 and the price of capital is
Rs.1; figure A.3 shows three isocost lines corresponding to total costs of Rs.4, Rs.5 and Rs.6.
Figure A.3 : Isocost lines
In figure A.4 the slope of the isocost line has been shown for a total cost ($TC$) of Rs.16, a price
of labor $P_L$ = Rs.1 and a price of capital $P_K$ = Rs.2. The isocost line shows the combinations of capital and labor that can be
purchased for a total cost of Rs.16. To draw the isocost line, the endpoints can be marked.
Point A of the isocost line is given by $TC/P_K = 16/2$, i.e. 8 units of capital. Similarly, point B is
given by $TC/P_L = 16/1$, i.e. 16 units of labor. The slope of the isocost line is given by:
$$\frac{\Delta K}{\Delta L} = -\frac{TC/P_K}{TC/P_L} = -\frac{P_L}{P_K}$$
This formula gives the slope of the above isocost line, which is -1/2.
Figure A.4 : Slopes of Isocost line
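The endpoints and slope of the isocost line in figure A.4 can be computed directly from the total cost and the two input prices. In the sketch below the function name `isocost` is illustrative, capital is assumed to be on the vertical axis and labor on the horizontal axis, and the prices (Rs.1 for labor, Rs.2 for capital) are the ones implied by the endpoints of 8 units of capital and 16 units of labor.

```python
from fractions import Fraction


def isocost(total_cost, price_labor, price_capital):
    """Return (maximum capital, maximum labor, slope) of an isocost line.

    Capital is measured on the vertical axis and labor on the horizontal axis.
    """
    max_capital = Fraction(total_cost, price_capital)  # all money spent on capital
    max_labor = Fraction(total_cost, price_labor)      # all money spent on labor
    slope = -max_capital / max_labor                   # equals -price_labor / price_capital
    return max_capital, max_labor, slope


print(isocost(16, price_labor=1, price_capital=2))
```

Because the total cost cancels out of the ratio, the slope depends only on the relative prices of the two inputs; a different total cost shifts the line without changing its slope.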
Finding the Cost Minimizing Production Technology

Suppose the firm that we are considering functions in a perfectly competitive setup. This
firm wants to maximize its profit by minimizing the cost incurred in production. Let’s
consider an isoquant that corresponds to a level of production of 100 units of good X.
Suppose the price of labor $P_L$ = Rs.1 and the price of capital $P_K$ = Rs.1. Now 100 units of good X can be produced at a cost of
Rs.7 using 4 units of capital and 3 units of labor; this is shown by point C in figure A.5.
The same level of output can also be produced by using 6 units of capital and 2 units of
labor, represented by point D, which will cost Rs.8. 100 units of output can also be produced
at point B by using 2 units of capital and 6 units of labor, which again costs Rs.8. As shown
in the figure, the minimum cost at which 100 units of good X can be produced is Rs.7. A firm
that wants to maximize its profit and minimize its cost will produce at point C, which
represents the cost-minimizing technology for the given level of output. The cost-minimizing
technology to produce a given level of output is represented by the point where the isocost
line and the isoquant for that particular level of output are tangent to each other.
Figure A.5 : Least cost combination for 100 units of good x
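The comparison of points B, C and D amounts to evaluating the cost of each input combination and picking the smallest. A minimal sketch, with illustrative variable names and the input prices of Rs.1 each used in the text:

```python
# (K, L) combinations on the isoquant for 100 units of good X, as read from the text.
points = {"B": (2, 6), "C": (4, 3), "D": (6, 2)}
price_labor = 1    # Rs. per unit of labor
price_capital = 1  # Rs. per unit of capital

# Cost of each combination: capital cost plus labor cost.
costs = {name: k * price_capital + l * price_labor for name, (k, l) in points.items()}
least_cost_point = min(costs, key=costs.get)
print(costs, least_cost_point)
```

Only three points are compared here; on the full isoquant the least-cost point is the tangency with the lowest attainable isocost line, as the text explains.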

Let’s draw another diagram with three isoquants showing different levels of output, i.e. 100,
150 and 200 units of good X. Figure A.6 shows the cost-minimizing production technology
for these levels of output. We maintain that the price of labor $P_L$ = Rs.1 and the price of capital $P_K$ = Rs.1. The minimum cost of
producing 100 units of good X is represented by the isocost line with a total cost of Rs.4, that of
150 units of good X by the isocost line with a total cost of Rs.5 and that of 200 units of
good X by the isocost line with a total cost of Rs.6.
Figure A.6 : Cost minimizing production technology for Q1 = 100, Q2 = 150 & Q3 = 200
Equilibrium Condition to Reach the Cost Minimizing Production Technique

The equilibrium point is reached where the isocost line is tangent to the isoquant. As shown
in Figure A.6, points A, B and C are points of equilibrium or points of tangency. At the point
of equilibrium the slope of isoquant is equal to the slope of isocost line.
The equilibrium condition is: $-MP_L / MP_K = -P_L / P_K$, or $MP_L / MP_K = P_L / P_K$.
The same condition can be written as: $MP_L / P_L = MP_K / P_K$.

This is the firm’s cost-minimizing condition. The left side is the output produced by the last
rupee spent on labor and the right side is the output produced by the last rupee spent on
capital. If these two measures are not equal, the firm can lower its cost by substituting
more labor for capital or vice versa. Figure A.7 shows the total cost curve that represents
the minimum cost of producing different levels of output.
Figure A.7 : Minimum cost of producing different levels of output
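The "last rupee" comparison can be sketched in Python. The marginal products and prices used below are hypothetical numbers chosen only for illustration, and the function names are not from the original text.

```python
def output_per_rupee(marginal_product, input_price):
    """Output produced by the last rupee spent on an input."""
    return marginal_product / input_price


def substitution_advice(mp_labor, price_labor, mp_capital, price_capital):
    """Compare the last rupee spent on labor with the last rupee spent on capital."""
    labor = output_per_rupee(mp_labor, price_labor)
    capital = output_per_rupee(mp_capital, price_capital)
    if labor > capital:
        return "substitute labor for capital"
    if capital > labor:
        return "substitute capital for labor"
    return "cost is minimized"


# Hypothetical values: labor yields 10 units per rupee, capital only 5.
print(substitution_advice(10, 1, 5, 1))  # substitute labor for capital
# Hypothetical values: both inputs yield 3 units per rupee.
print(substitution_advice(6, 2, 3, 1))   # cost is minimized
```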
Summary of Appendix
A few important points discussed in the appendix that need to be reviewed are:
- An isoquant represents the infinite combinations of inputs that can be used to produce
the corresponding level of output.
- The slope of the isoquant is given by: $-MP_L / MP_K$. The ratio of the marginal product
of labor to the marginal product of capital is called the marginal rate of technical
substitution. It measures the rate at which a firm can substitute capital in place of
labor, keeping the level of output fixed.
- A line that shows the combinations of capital and labor that can be used to produce
output at a given total cost is called an isocost line. The slope of the isocost line is
given by: $-P_L / P_K$.
- The point of equilibrium, where the slope of the isoquant is equal to the slope of the
isocost line, shows the cost-minimizing technology of production for a given level of output.
The equilibrium condition is: $-MP_L / MP_K = -P_L / P_K$, or $MP_L / MP_K = P_L / P_K$.
The same condition can be written as: $MP_L / P_L = MP_K / P_K$.

Exercise for Appendix

Review Questions
Q.1 Draw the isocost line when the total cost is Rs.200, the price of labor is Rs.5 and the
price of capital is Rs.10. How will the isocost line change if the price of labor becomes Rs.10
while the price of capital becomes Rs.5? Give the slope of the isocost line in both cases.
Q.2 Give the equilibrium condition for the cost-minimizing production technique.

Q.1 The isocost line shows:
a. The different combinations of inputs that can be used to produce output at a given
total cost.
b. The budget of the consumer.
c. The different combinations of inputs to produce a given level of output.
d. The combinations of two goods that leave a consumer equally satisfied.
Q.2 The higher the isoquant:
a. The higher the level of output corresponding to it.
b. The higher the level of utility attached to it.
c. The higher the cost attached to it.
Q.3 The cost-minimizing production technique is the one wherein:
a. The profit is the lowest.
b. The isocost line and the isoquant curve for the given level of output are tangent.
c. The isocost line and the isoquant curve for the given level of output intersect.
d. The tangency of isocost line and isoquant is not needed.

Q.1 a
Q.2 a
Q.3 b

Answer 1. The isocost line shows the different combinations of inputs that can be used to
produce output at a given total cost.
Answer 2. The higher the isoquant, the higher is the level of output that it represents.
Answer 3. The cost-minimizing production technique is given by the point where the slopes
of the isocost line and the isoquant curve for the given level of output are the same. So, it is
the point where they are tangent.

Answer 1. Option b is incorrect: the budget of the consumer is given by the budget line.
Option c is incorrect as it defines an isoquant. Option d defines an indifference curve.
Answer 2. Option b discusses a concept related to indifference curves; isoquants do not
show utility. Option c states a characteristic of isocost lines. Option d is ruled out.
Answer 3. Option a is incorrect: the cost-minimizing technique corresponds to maximum
profit. Option c is wrong: the cost-minimizing technique is characterized by tangency
of the isocost line and the isoquant curve for the given level of output. If the two intersect, the
firm can move along the isoquant to the point where the cost is minimum and the
slopes of the two curves are equal. Option d is therefore incorrect because tangency is
needed.
Glossary for Appendix

Isocost line: A line that shows several combinations of inputs to be used for production for a
given total cost.
Isoquant: A curve that shows several combinations of inputs to produce a given level of
output.
Marginal rate of technical substitution: The rate at which capital can be substituted in place
of labor by the firm, holding the level of output fixed.

Monopoly and the Antitrust Policies of the Government
Paper : Introductory Microeconomics
Unit V- Monopoly and the Policies of the Government
Lesson: Monopoly and the Antitrust Policies of the Government

Table of Contents
Learning Outcomes
Concept of Monopoly
Why does monopoly arise?
Pricing and the Output Decision of the Monopoly
Welfare Cost of the Monopoly
Public Policy towards Monopolies
Price Discrimination
Comparing Monopoly and Competition
Conclusion
Summary
Exercises
References

Learning Outcome
This chapter analyses the monopoly market. After reading it, one should be able to answer
questions such as: what is a monopoly, how does it arise, what makes monopoly
different from competition, and what does the government do to control the problem of monopoly?
One should also be able to explain price discrimination and the inefficiency caused by
monopoly. The chapter explains the monopoly industry using both
hypothetical data and diagrams, which make the concepts clearer.
Monopoly
A monopoly is an industry with a single firm that is the sole seller of a product for which
there are no close substitutes, and which is protected by barriers to entry. The key
characteristics of monopoly are: a single seller, no close substitutes and barriers to entry.
It is a type of imperfect competition, and the monopolist is a price maker.
Why Monopolies Arise: Factors Leading to Monopoly

Barriers to entry are the most important reason for the birth of a monopoly. A monopoly
firm is the only seller in the market for its product, and no close substitute exists for that
product. Because no other firm can enter the market, the firm holds a monopoly over
the product. Barriers to entry are anything that prevents a new firm from entering and
competing in the monopoly market. They can be natural or man-made. The main sources of
such barriers are the following:
Government created Monopolies: Monopoly often arises because of government directives,
i.e. the government sometimes grants a firm the exclusive right to provide some goods or
services, reasoning that it is efficient and in the public interest to do so. Indian Railways
is a classic example: it is a government monopoly and the sole provider of railway services
in India. Sometimes, however, monopoly also arises because of strong lobbying and a
political nexus between the firm and politicians.
Patents and Copyright Laws: These are another way in which the government creates
monopolies. The government grants an inventor the exclusive right to use a product or a
process. For instance, a patent granted to a pharmaceutical company for discovering a new
drug gives it, for say 25 years, the exclusive right to produce and sell that drug in the
market, and prevents any other company from producing the same drug. Similarly, copyright
gives an author the exclusive right to sell his book. Patents and copyrights act as a
stimulus for innovation, discovery and new original work. They are a reward for the
extensive research through which new knowledge or new products are developed.
Monopoly Resources: Ownership of a scarce factor of production is another
factor leading to monopoly. A classic example of such a monopoly is De Beers, the South
African diamond company, which controls about 80% of the world production of diamonds.
However, exclusive ownership of a resource by one firm or person is very rare.
Natural Monopoly: A natural monopoly arises when the entire market demand for a good or
service can be met by a single firm at a lower cost than by two or more firms. It arises
when there are economies of scale over the relevant range of output, usually in industries
where production requires a very high fixed cost and a relatively negligible marginal cost.
Oil and gas pipelines are an example: their construction requires a huge fixed
cost, while the cost of supplying an extra unit of oil is negligible. It is therefore better
for a single firm to produce the output at the least cost, since with more firms the output
per firm is smaller and the cost per unit higher. Natural monopoly also depends on the size
of the market: as the market grows, the monopoly may give way to a competitive market.
Figure 1 shows monopoly arising because of economies of scale. When a firm’s ATC declines
continually, the firm is a natural monopoly. In such cases it is better that only one
firm produces the entire output at the least cost.

Pricing and Output decision of Monopoly

Let’s see how a monopoly makes its decisions about the pricing and production of a good. In
perfect competition there are many firms, so each takes the price as given, i.e. it has no
power to influence the price; a monopoly firm, by contrast, is a price maker. Being the
single seller of the product gives it the power to set the price. A competitive firm can
sell whatever it wishes at the market price, so its demand curve is horizontal at
that price level. A monopoly faces the downward-sloping market demand curve: to sell a
larger quantity it has to reduce its price, as people will purchase more only at lower
prices. A monopoly cannot choose both price and quantity independently. Either
it chooses the quantity produced and lets the price be determined along the demand curve,
or vice versa. That is the reason why a monopoly does not have a supply curve.
We assume that the monopolist chooses to maximize profit. Now let’s see which point on the
demand curve the monopolist chooses. Consider an arbitrary example of a monopoly’s
revenue in Table 1. The first and second columns show the quantity the monopolist supplies
and the corresponding price. They show that to sell more, the monopolist must reduce the
price. For instance, one unit can be sold at Rs.16, but selling two units requires a price
of Rs.14 and selling three units a price of Rs.12.
Total revenue (TR) is the product of the quantity sold and the price charged. Average
revenue (AR) is the revenue per unit. Marginal revenue (MR) is the additional revenue the
monopolist earns by selling one extra unit; in mathematical terms, it is the difference
between the total revenue from n units and the total revenue from n-1 units.
Looking at the table, we find that AR = Price at each level of quantity produced.
We can also observe that for a monopoly firm MR is less than the price for every unit after
the first. The reason is the downward-sloping demand curve. To sell an extra unit, say to
move from the second unit to the third in the table, the monopolist has to reduce the price
not only on the extra unit but also on all the units it was already selling. The third unit
brings in its price of Rs.12, but the price cut of Rs.2 on the first two units loses Rs.4,
so TR rises by only Rs.8, which is less than the price.

Table 1: Total Revenue, Average Revenue and Marginal Revenue
Quantity   Price   TR            AR            MR
(1)        (2)     (3 = 2 x 1)   (4 = 3 / 1)   (5 = TRn - TRn-1)
0          18      0             -             -
1          16      16            16            16
2          14      28            14            12
3          12      36            12            8
4          10      40            10            4
5          8       40            8             0
6          6       36            6             -4
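The columns of Table 1 can be recomputed from the price schedule alone. The following Python sketch does so; the variable names are illustrative, and only the prices are taken from the table.

```python
# Price at which each quantity (0 to 6 units) can be sold, from Table 1.
prices = [18, 16, 14, 12, 10, 8, 6]

rows = []
previous_tr = 0
for quantity, price in enumerate(prices):
    tr = quantity * price                        # total revenue
    ar = tr / quantity if quantity else None     # average revenue (undefined at 0)
    mr = tr - previous_tr if quantity else None  # marginal revenue of this unit
    rows.append((quantity, price, tr, ar, mr))
    previous_tr = tr

for row in rows:
    print(row)
```

Each printed row reproduces one line of the table: AR equals the price at every positive quantity, and MR falls by Rs.4 with each extra unit.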
In Figure 2, the demand curve shows how quantity and price are related. The market
demand curve facing a monopoly is downward sloping because more can be sold only at lower
prices. The MR curve lies below the demand curve (AR curve) because, to increase the
quantity sold, the price must fall on all units. MR becomes negative when selling an extra
unit requires the price to fall by so much that TR starts declining. Both the demand curve
and the MR curve start at the same point, indicating that MR and the price of the good are
the same for the first unit sold.

Profit maximization for a monopoly

A monopoly will always choose to produce the level of output at which its marginal revenue
is equal to its marginal cost. In other words, the monopolist’s profit-maximizing level of
output is determined at the intersection of the MR and MC curves.
For a perfectly competitive firm, P = MR = MC at the profit-maximizing level of output.
For a monopoly firm, P > MR = MC at the profit-maximizing level of output.
If MR > MC, the monopoly firm will produce more, and if MC > MR, it will
reduce its output to increase its profit.
In Figure 3, the quantity produced is on the horizontal axis and price and cost are on the
vertical axis. The profit-maximizing price and output are (P*, Q*), where Q* is the output
at which the MC curve intersects the MR curve from below and P* is the price on the demand
curve at Q*. The monopoly will sell Q* units of output at price P*.
Profit = area (ABCD) = (P* - ATC) x Q*
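The MR = MC rule can be illustrated with the revenue schedule of Table 1. The text gives no cost data, so the sketch below assumes a constant marginal cost of Rs.4 per unit and no fixed cost; both are hypothetical choices made only for this example.

```python
# Prices for quantities 0..6, from Table 1.
prices = [18, 16, 14, 12, 10, 8, 6]
MARGINAL_COST = 4  # hypothetical: constant cost per unit, no fixed cost

total_revenue = [q * p for q, p in enumerate(prices)]
profit = [tr - MARGINAL_COST * q for q, tr in enumerate(total_revenue)]

# Keep expanding output as long as the next unit's MR covers its MC.
best_quantity = 0
for q in range(1, len(prices)):
    marginal_revenue = total_revenue[q] - total_revenue[q - 1]
    if marginal_revenue >= MARGINAL_COST:
        best_quantity = q
    else:
        break

best_price = prices[best_quantity]
print(best_quantity, best_price, profit[best_quantity])  # 4 10 24
```

Under these assumptions the monopolist stops at the fourth unit, whose MR of Rs.4 equals MC, and charges Rs.10, so P > MR = MC. The fifth unit would add nothing to revenue but Rs.4 to cost.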

In a perfectly competitive market, any positive profits are competed away by the entry of
new firms, whereas in a monopoly barriers to entry protect the firm’s profits
from falling.
Important Note: A monopoly firm does not have a supply curve. In perfect competition, the
supply curve is the upward-sloping part of the MC curve that lies above the AVC curve, so
how much a perfectly competitive firm produces is read off its MC curve as the price
changes. The amount of a good produced by a monopoly, however, depends on both its MC
curve and the demand curve it faces, because the monopolist chooses price and quantity together.
Welfare Cost of Monopoly

Does a monopoly produce the level of output that maximizes the welfare of society as a
whole? Monopoly is undesirable for consumers, as they are charged a higher price for the
output, and beneficial for the producer, who receives that higher price. Whether total
surplus increases or decreases therefore tells us about the welfare of the entire
society.
Total surplus is the sum of consumer surplus (CS) and producer surplus (PS). Consumer
surplus is the difference between what consumers are willing to pay and what they actually
pay. Producer surplus, on the other hand, is the difference between what producers receive
and the cost of producing the good.
In a perfectly competitive market, output is produced where P = MC, so the market leads to
the best allocation of resources. Thus, total surplus (TS) is as large as possible.
Deadweight loss
A benevolent social planner always tries to maximize total surplus. He chooses the
level of output at which the demand curve (AR curve) intersects the MC curve. The demand
curve represents the value of the good to consumers and the marginal cost curve represents
the cost at the margin. So, for the social planner, the most efficient level of output is
where P = MC. A monopoly does not produce the efficient level of output, because it
produces where P > MC = MR. Under monopoly the price does not reflect the true marginal
cost of production, so consumers buy less than the efficient quantity.
Deadweight loss measures the loss in efficiency caused when a monopoly produces less than
the efficient level of output. Compared with perfect competition, a monopoly
produces less and charges a higher price. The deadweight loss is the area of the triangle
between the demand curve and the MC curve, from the monopoly output to the efficient
output. The loss arises because the monopoly charges a price above MC: consumers who are
willing to pay more than the MC but less than the monopoly price do not purchase the good.
Thus the monopoly’s pricing power leads to the deadweight loss. By charging a high price,
the monopoly keeps some potential consumers out of the market and reduces the size
of the total surplus.
In Figure 4, the quantity produced and supplied is on the x-axis and price and cost are on
the y-axis. Pm and Qm are the price charged and the quantity produced by the monopoly, and
Pc and Qc are those of the competitive market. We can observe that Pm > Pc and Qm < Qc,
which shows that a monopoly firm charges more and supplies less to the market. The
downward-sloping MR curve is the marginal revenue curve of the monopoly, MC is the marginal
cost curve and DD is the demand curve. Where the MC curve intersects the demand curve
(AR = P), we get the competitive market equilibrium price and quantity. Where the MC curve
intersects the MR curve from below, we get the quantity produced by the monopoly, and the
monopoly price is read off the demand curve at that quantity.
The triangle ABC represents the deadweight loss of the monopoly.
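A small numerical sketch may help fix the size of the deadweight-loss triangle. It assumes the linear demand curve P = 18 - 2Q, which passes through the points of Table 1, and a hypothetical constant marginal cost of Rs.4. Neither assumption describes Figure 4 itself, and treating quantity as continuous gives MR = 18 - 4Q.

```python
# Assumed linear demand P = INTERCEPT - SLOPE * Q (fits the prices in Table 1).
INTERCEPT = 18.0
SLOPE = 2.0
MARGINAL_COST = 4.0  # hypothetical constant marginal cost

# Competitive outcome: P = MC.
q_competitive = (INTERCEPT - MARGINAL_COST) / SLOPE
p_competitive = MARGINAL_COST

# Monopoly outcome: MR = MC, where MR = INTERCEPT - 2 * SLOPE * Q.
q_monopoly = (INTERCEPT - MARGINAL_COST) / (2 * SLOPE)
p_monopoly = INTERCEPT - SLOPE * q_monopoly

# Total surplus = consumer surplus + producer surplus in each case.
surplus_competitive = 0.5 * (INTERCEPT - p_competitive) * q_competitive
surplus_monopoly = (0.5 * (INTERCEPT - p_monopoly) * q_monopoly
                    + (p_monopoly - MARGINAL_COST) * q_monopoly)

# Deadweight loss: triangle between demand and MC from Qm to Qc.
deadweight_loss = 0.5 * (p_monopoly - MARGINAL_COST) * (q_competitive - q_monopoly)

print(q_monopoly, p_monopoly)        # 3.5 11.0
print(q_competitive, p_competitive)  # 7.0 4.0
print(deadweight_loss)               # 12.25
```

Under these assumptions the triangle area equals the fall in total surplus from 49 to 36.75, which is the sense in which deadweight loss measures lost welfare.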

Public Policies towards Monopolies

Monopolies lead to an inefficient allocation of resources, i.e. a monopoly produces less
than is socially desirable and charges more than MC. Policymakers can respond with
policies that reduce the inefficiency caused by monopoly:
a) Antitrust laws: Antitrust laws are used by the government to promote
competition and restrict market power. Famous antitrust laws are the Sherman Act of 1890,
the Clayton Act and the Federal Trade Commission Act, in which competition, rather than
regulation or public ownership, is used as the tool to control monopoly power.
b) Regulation: Regulation means controlling the behaviour of monopolists, most commonly
by having government agencies regulate the prices of public utility companies, where
natural monopolies are usually found. What price a natural monopoly should be allowed to
charge is a difficult question. If the price is set equal to MC, the monopoly will end up
with losses. A natural monopoly has a falling AC curve because production involves a huge
fixed cost and a negligible MC, which implies that MC < AC; so if P = MC, the price is
below AC. The monopoly would face huge losses at the price P = MC and would prefer to
exit in the long run. Secondly, MC pricing gives the monopolist no incentive to reduce cost.
Reduced cost means higher profit, but if regulators reduce the price whenever cost falls,
the monopolist receives no benefit. In Figure 5, MC pricing for a natural monopoly is
shown.
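The loss under MC pricing can be seen with hypothetical numbers. The sketch below assumes a fixed cost of Rs.1000 and a constant marginal cost of Rs.2; these figures are illustrative and do not come from Figure 5.

```python
FIXED_COST = 1000.0   # hypothetical fixed cost
MARGINAL_COST = 2.0   # hypothetical constant marginal cost


def average_total_cost(quantity):
    """ATC falls as the fixed cost is spread over more units."""
    return FIXED_COST / quantity + MARGINAL_COST


def profit(price, quantity):
    """Profit = (P - ATC) x Q."""
    return (price - average_total_cost(quantity)) * quantity


for quantity in (50, 100, 200):
    print(quantity,
          average_total_cost(quantity),       # always above MC
          profit(MARGINAL_COST, quantity))    # loss under P = MC
```

With a constant marginal cost, ATC stays above MC at every output, so pricing at MC loses exactly the fixed cost whatever quantity is sold. Pricing at AC instead gives zero profit.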

To deal with these problems, policymakers can do one of two things: subsidize the
monopolist, or let the monopolist charge more than MC, for example through AC pricing.
Both lead to deadweight loss (DWL): a subsidy has to be financed through taxes, which
create their own deadweight loss, and a price above MC keeps output below the efficient
level. Inefficiency therefore arises under both pricing mechanisms. In practice it may be
better to let the monopolist keep some of the benefit from lowering its cost, which acts
as an incentive, but this requires a price above MC.
c) Public Ownership: The government sometimes uses public ownership to
reduce the inefficiency caused by monopoly. Instead of regulating private
companies, it runs the monopoly itself. Economists, however, usually prefer private
monopolies to government-owned monopolies. Private owners have an incentive to
reduce cost if they are allowed to keep part of the resulting profit, whereas
government-run monopolies tend to suffer from bureaucracy and red tape.
Each of these policies has its own drawbacks, so policymakers face a trade-off
between the problem of monopoly and the costs of its solutions.
Price Discrimination
In perfect competition there are many firms selling similar products, so a firm that tries
to charge more than the market price will lose almost all its customers. In a
monopoly market, there is only one firm selling a given product. When it raises its price,
it loses some but not all of its customers.

The monopolist often tries to sell different units of the same commodity at different
prices; this practice is known as price discrimination. Price
discrimination is not possible in a competitive market, because a firm that charges more
than the market price loses all its customers to other firms, while a firm that charges
less gives up revenue it could have earned at the market price. There are three types of
price discrimination:
First degree price discrimination: This happens when the monopolist sells different
units of output to different people at different prices. The monopolist charges
each consumer a price equal to his maximum willingness to pay, i.e. his
reservation price. Thus consumer surplus is zero and
producer surplus is equal to the total surplus. This is also known as perfect price
discrimination. All the gains from trade are exhausted and there is no deadweight
loss under such a monopoly. An example is a doctor who charges different patients
different fees depending on their financial condition, which he knows because the
patients live in his neighbourhood. Figure 6 compares a single-price monopoly with a
perfectly price-discriminating monopolist.
Second degree price discrimination: Here the monopoly sells different units of
output at different prices, but every consumer who buys the same amount of the good
pays the same price. In other words, the price depends only on
the amount of the good purchased, not on who purchases it. The monopolist offers
different bundles of the good at different prices and lets consumers self-select their
preferred bundle. An example is the different airfares for different classes of travel:
different services are offered in each class, and customers self-select their class.

Third degree price discrimination: Here the monopolist sells output to different groups
of consumers at different prices, but every unit sold to a given group sells at the same
price. Customers with different elasticities of demand are charged differently: customers
with a low elasticity of demand are charged more, whereas customers with a high elasticity
of demand are charged less. In this way the monopoly maximizes its profit. An example is a
student discount on laptops.
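The link between elasticity and price can be sketched with the standard markup rule P = MC x e / (e - 1), where e is the absolute value of the price elasticity of demand. The chapter does not derive this rule; it follows from MR = P(1 - 1/e) = MC and is used here as an assumption. The marginal cost and the two elasticities below are hypothetical.

```python
def profit_maximizing_price(marginal_cost, elasticity):
    """Price implied by MR = MC when MR = P * (1 - 1/elasticity).

    `elasticity` is the absolute value of the price elasticity of demand
    and must exceed 1 for the rule to give a finite positive price.
    """
    if elasticity <= 1:
        raise ValueError("elasticity must be greater than 1")
    return marginal_cost * elasticity / (elasticity - 1)


MARGINAL_COST = 10.0  # hypothetical

price_regular = profit_maximizing_price(MARGINAL_COST, 2.0)  # less elastic group
price_student = profit_maximizing_price(MARGINAL_COST, 5.0)  # more elastic group

print(price_regular)  # 20.0
print(price_student)  # 12.5
```

The more elastic group, such as students, is charged the lower price, which is what a student discount amounts to.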
Comparing Monopoly and Perfect Competition

                                 Perfect Competition        Monopoly
Number of firms                  Many                       One
Entry and exit                   Free                       Restricted
Close substitutes                Many close substitutes     No close substitutes
Profit maximization condition    P = MR = MC                P > MR = MC
Economic profits                 No economic profit         Positive economic profits in the long run
Deadweight loss (inefficiency)   No DWL occurs              DWL occurs
Price discrimination             Not possible               Possible; it is one of the behaviours of monopoly

Conclusion
In this chapter we discussed the monopoly market. We learnt that monopoly arises because
of barriers to entry, and saw how a monopoly market behaves differently from a perfectly
competitive market. The profit maximization condition for the
monopoly is MR = MC, with MC cutting MR from below, and in monopoly
P > MR = MC because the demand curve is downward sloping. We also learnt that monopoly
produces an inefficient level of output and thus causes deadweight loss, and that
policymakers can alleviate the problem of monopoly through antitrust laws, regulation and
public ownership. Through price discrimination, monopolists can themselves eliminate the
DWL or at least reduce it.
The real world lies between the two extremes of perfect competition and
monopoly. Different degrees of price discrimination are actually practiced in the real world.
Summary
1) A monopoly is a firm that is the only seller in its market. It arises because of
barriers to entry, which can come from any of the following:
a) Monopolies created by government
b) Monopoly resources
c) Patents and copyright laws
d) Natural monopolies
2) Monopoly vs perfect competition
Perfectly competitive firms are price takers, whereas a monopoly is a price maker. A
competitive firm faces a horizontal demand curve at the market price, and a
monopoly faces a downward-sloping demand curve. P = MR in perfect
competition and P > MR in monopoly, which is the reason for the inefficiency in the
monopoly market. The profit maximization condition, MR = MC, is nevertheless the same
for both types of firm. If MR > MC the monopoly will produce more, and if MR < MC
it will produce less.
3) Social cost of monopoly
Monopoly leads to deadweight loss by producing less than the efficient level of
output. The deadweight loss arises because, by charging a price above MC, the monopoly
discourages those consumers whose willingness to pay lies between MC and the price
charged. Total surplus is reduced.
4) Public policies towards monopoly: antitrust laws, regulation and public ownership.
5) Price discrimination: The practice of selling the same good at different prices to
different consumers is called price discrimination. There are three
types of price discrimination. Examples include movie tickets, discount coupons and
airfare pricing.

Questions for Review

1) How does monopoly lead to deadweight loss?
2) What is the reason behind the inefficiency caused by a monopoly?
3) Why doesn’t a monopoly have a supply curve? How does a monopoly determine the output
it produces?
4) What are the different public policies to handle monopoly behaviour? Are they
efficient?
5) Why do firms price discriminate? What are the types of price discrimination? Give an
example of each.

1) An industry with a single firm that produces output for which there are no close
substitutes.
a) Perfectly competitive industry
b) Pure monopoly
c) An imperfectly competitive industry
d) Government ownership
2) The shape of the monopoly demand curve is
a) Upward sloping
b) Horizontal to the x-axis
c) Vertical to the y-axis
d) Downward sloping
3) Which of the following is true of a pure monopoly?
a) A pure monopoly always charges the highest possible price.
b) Producers in a pure monopoly enjoy complete freedom of entry and exit from
the market.
c) The pure monopoly’s demand curve and the market demand curve are one and
the same.
d) The main concern of a pure monopoly is not profit maximization but
preservation of its monopoly status.
4) The relationship between price and marginal revenue of a monopoly firm is
a) Price = Marginal Revenue
b) Price > Marginal Revenue
c) Price < Marginal Revenue
d) No relation exists
5) In which of the following industries is consumer surplus zero?
a) Second degree price discriminating industry
b) Pure monopoly
c) Perfectly price discriminating industry
d) Imperfectly competitive market
6) In order to increase the amount of output sold, a monopoly must:
a) Increase the price of the last unit sold but keep the price of all previous
units sold constant.
b) Decrease the price of the last unit sold but keep the price of all other units
sold constant.
c) Decrease the price on all units sold.
d) Increase the price of all units sold.

1 b
2 d
3 c
4 b
5 c
6 c

Answer 1. A pure monopoly is an industry with a single seller of a good that has no close
substitutes.
Answer 2. A monopoly faces a downward-sloping demand curve: to sell more of
a good, it has to lower the price.
Answer 3. The pure monopoly’s demand curve and the market demand curve are one and
the same. As the monopoly is the only seller in the market, its demand curve is the same
as the market demand curve.
Answer 4. For a monopoly firm, price is greater than marginal revenue. To sell an
extra unit, the monopolist has to reduce the price not only on the extra unit but also on
all the units it was already selling, so the revenue added by the extra unit is less than
its price.
Answer 5. A perfectly price-discriminating monopoly charges every consumer a price equal to
his reservation price, i.e. his maximum willingness to pay, which makes consumer
surplus equal to zero.
Answer 6. To sell an extra unit, a monopolist has to reduce the price not only on the
extra unit but also on all the units it was already selling.

Answer 1. Option a) is incorrect: a perfectly competitive industry has many firms selling
homogeneous goods, which are close substitutes for one another. Option c) is wrong because
an imperfectly competitive industry can have more than one firm. Option d) is also
incorrect, as goods produced under government ownership can have close substitutes.
Answer 2. Options a), b) and c) are incorrect: to sell more, a monopoly has to
reduce its price, so its demand curve must be downward sloping. It cannot be
upward sloping, horizontal or vertical.
Answer 3. Option a) is incorrect because the price a monopoly charges is constrained by the
willingness of consumers to pay for the good it produces. Option b) is wrong as there is no
entry or exit in purely monopolistic markets. Option d) is also incorrect because a pure
monopoly does not need to worry about potential entrants, since entry is blocked.
Answer 4. A definite relation exists between price and marginal revenue, so option d)
is incorrect. If price were equal to marginal revenue, more could be sold
without reducing the price, which is not the case for a monopoly, so option a) is wrong.
Option c) is wrong because the revenue added by an extra unit cannot exceed its price when
the price falls on all units sold.
Answer 5. Consumer surplus is zero under a perfectly price-discriminating monopoly, as
it charges each consumer a price equal to his maximum willingness to pay. In all the other
cases CS is positive.
Answer 6. Option a) is incorrect as the law of demand states that quantity demanded
decreases as price increases. Option b) is incorrect as the price of all units of output sold must
decrease. Option d) is also incorrect as the law of demand states that quantity demanded
decreases as price increases.
Reference
1) Mankiw, N.G. “Principles of Economics”, 4th Ed., pp. 240–267.
2) Varian, Hal R. “Intermediate Microeconomics”, 7th Ed., pp. 445–454.

The Markets for the Factors of Production
Course: Introductory Microeconomics
Unit V- Input Markets
Lesson: The Market for the Factors of Production

Table of Contents
Learning Outcomes
Concept of Factors of Production
The Demand for Labour
The Production Function and the Marginal Productivity of Labour
The Value of Marginal Product and The demand for labour
What causes the Labour Demand curve to shift?
The Supply of labour
What causes the Labour Supply curve to shift?
Equilibrium in the labour market
Shifts in labour demand and labour supply
The other Factors of Production
Conclusion
Summary
Exercises
References

Learning Outcome
In this chapter we will learn how equilibrium is determined in the factor markets, looking
at the economy from the supply side. What are the factors of production?
How does a competitive firm decide how much of a factor to buy? What causes their demand
and supply to change, especially in the labour market? Why does the equilibrium wage equal
the value of the marginal product of labour? How are factors of production paid?
The Factors of Production

The inputs used for the production of goods and services are known as factors of
production. Labour, land and capital are the most important factors of production in an
economy. Entrepreneurship is the fourth important factor of production: it is the skill of
starting a new business or running a business more effectively. The demand for these
inputs is not direct, i.e. it depends on the demand for the goods and
services they help produce. Their demand is therefore known as derived demand.
The Demand For labour

Labour is considered the most important factor of production. As in other
markets in an economy, equilibrium in the labour market is determined by the forces of
demand and supply. Labour is demanded by firms engaged in the production of goods
and services and is supplied by individual households. In exchange for their labour
services, workers are paid wages.
Figure 1 shows how the price of a good is determined by the interaction of the
demand and supply curves for the good, and similarly how the wage is determined in the
input market by the interaction of labour supply and demand.
We make two assumptions about the firm. First, it is
competitive in both the output and the input markets. Second, it is profit maximizing.
Figure 1 : The supply & demand in the goods & input market

The Production Function and the Marginal Product of labour

A firm's demand for labour depends on how the number of workers employed and the output
produced are related, that is, on the production function. Table 1 shows how a competitive
firm decides how much labour to demand. The first and second columns show the number of
workers employed and the corresponding quantity produced. The third column is the marginal
product of labour (MPL), the change in output from one additional unit of labour. In
Table 1, for instance, when the number of workers rises from 1 to 2, output rises from 10
to 18 units, so the MPL is 8 units. When labour rises from 2 to 3, output rises from 18 to
24 units, so the MPL is only 6 units. The MPL is thus declining.
Table 1 : How the competitive firm decides how much labour to demand
Labour             Quantity   Marginal Product   Value of Marginal    Wage   Marginal Profit
                              of Labour          Product of Labour
L                  Q          MPL = ∆Q/∆L        VMPL = P x MPL       W      ∆Profit = VMPL - W
(no. of workers)   (Units)    (Units)            ($)                  ($)    ($)
0                  0          -                  -                    -      -
1                  10         10                 100                  50     50
2                  18         8                  80                   50     30
3                  24         6                  60                   50     10
4                  28         4                  40                   50     -10
5                  30         2                  20                   50     -30
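The columns of Table 1 can be rebuilt from the quantity column alone. The short Python
sketch below does this using the price ($10) and the wage ($50) assumed in the text; the
variable names are illustrative and not part of the original lesson.

```python
# Rebuild the columns of Table 1 from the quantity column.
# PRICE and WAGE are the values assumed in the text; names are illustrative.
PRICE = 10   # price of one unit of output, in dollars
WAGE = 50    # market wage per worker, in dollars

quantity = [0, 10, 18, 24, 28, 30]   # total output Q for L = 0, 1, ..., 5 workers

rows = []
for workers in range(1, len(quantity)):
    # MPL = change in Q / change in L; here L rises by one worker at a time
    mpl = quantity[workers] - quantity[workers - 1]
    vmpl = PRICE * mpl               # VMPL = P x MPL
    marginal_profit = vmpl - WAGE    # change in profit from hiring this worker
    rows.append((workers, quantity[workers], mpl, vmpl, WAGE, marginal_profit))

for row in rows:
    print(row)
```

Each printed tuple is one row of the table: workers, quantity, MPL, VMPL, wage and marginal
profit.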
Figure 2 graphs the production function, with labour employed on the x-axis and the
quantity produced on the y-axis. The production function describes how inputs are
transformed into output with a given technique. As the labour input increases, the MPL
decreases: each additional worker contributes less to output than the one before. For this
reason, the production function becomes flatter as the number of workers increases.
Figure 2 : The production function
The Value of the MPL and the demand for labour
To decide how much labour to hire, the firm has to consider how much each worker
contributes to its revenue. The firm is concerned only with profit, which here is total
revenue minus total wages paid. The worker's contribution therefore has to be converted
into value terms. The value of the marginal product of any input is the market price of
the output multiplied by the marginal product of that input; for labour, VMPL = P x MPL.
This is also known as the marginal revenue product. In Table 1 we assume that the price of
one unit of output is $10 in a competitive market. How many workers will the firm demand
at a particular wage? Say the wage is $50, as in Table 1. It makes sense for the firm to
hire labour up to the point where the value created by the last worker equals the wage. In
our example the third worker adds $60, which is more than the wage, while the fourth adds
only $40, which is less than the wage. Thus a competitive, profit-maximizing firm will
hire only 3 workers, that is, up to the point where VMPL = W.
Figure 3 : Value of Marginal Product of Labour

Figure 3 graphs the VMPL. It is a downward-sloping curve because the MPL diminishes as
employment rises. Since the firm takes the market wage as given, the wage is a horizontal
line at that level. The firm demands labour up to the point where the two lines intersect,
that is, where VMPL = W.
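The hiring rule can be checked numerically. The following is a minimal sketch, assuming
the data of Table 1; the function names are hypothetical. Because workers come in whole
units, the rule is applied as "hire the next worker as long as that worker's VMPL is at
least the wage", and the result is compared with total profit at each employment level.

```python
PRICE, WAGE = 10, 50                 # values assumed in the text
quantity = [0, 10, 18, 24, 28, 30]   # output for 0..5 workers (Table 1)

def labour_demanded(quantity, price, wage):
    """Hire workers one at a time while the next worker's VMPL covers the wage."""
    hired = 0
    for workers in range(1, len(quantity)):
        vmpl = price * (quantity[workers] - quantity[workers - 1])
        if vmpl < wage:
            break                    # this worker would add less than the wage
        hired = workers
    return hired

def profit(workers, quantity, price, wage):
    """Total revenue minus total wages (no other costs, as in the text)."""
    return price * quantity[workers] - wage * workers

best = labour_demanded(quantity, PRICE, WAGE)
profits = [profit(n, quantity, PRICE, WAGE) for n in range(len(quantity))]
print(best, profits)
```

Calling `labour_demanded` with different wages traces out the firm's labour demand curve:
a lower wage gives a larger number of workers hired, which is the downward slope of the
VMPL curve.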
What causes the Labour Demand curve to shift?

We know that the labour demand curve reflects the VMPL. But what makes the labour demand
curve shift? Any factor that changes either the MPL or the output price shifts the labour
demand curve. The main reasons are:
a) The output price: A change in the output price changes the value of the marginal
product and thus shifts the labour demand curve. If the price rises, the firm demands
more labour at each wage, and vice versa.
b) Technological change: Technological change shifts the production function and so
changes the MPL. It may raise or lower labour demand depending on what sort of
technological change it is: labour-augmenting change raises the productivity of workers
and hence labour demand, while labour-saving change reduces labour demand.
c) The supply of other factors: A change in the supply of one factor affects the
marginal product of the other factors. For instance, if more and more workers are put to
work on a given piece of land, the marginal product of each additional worker declines.
The Supply of Labour
The supply of labour is decided by households and is a function of the wage rate offered
in the competitive market. There is always a trade-off between leisure and work: the
labour supply curve shows how households divide their time between the two at each wage
rate. If we spend more hours working, less time is available for leisure. To enjoy one
more hour of leisure, we must forgo one hour's wage. That is why the wage is the
opportunity cost of leisure.
The labour supply curve is upward sloping because the higher the wage rate, the more
labour a household is ready to supply. As the wage rate rises, the opportunity cost of
leisure rises with it. When the wage increases, the substitution effect encourages the
worker to work more, since leisure has become more expensive, while the income effect
leads the worker to work less and enjoy more leisure, since leisure is a normal good and
the worker is now richer. At very high wage rates, however, the labour supply curve bends
backward: the household chooses more leisure rather than more work, because the income
effect now outweighs the substitution effect.
What Causes the Labour supply curve to shift?

The labour supply curve shifts whenever the amount households wish to work at a given wage
rate changes. Factors affecting this decision are:
a) Changes in tastes: A change in households' attitude towards work shifts the labour
supply curve. Examples are a rise in the number of women in the labour force, a change
in the retirement age of those already in work, and the spread of nuclear families.
b) Changes in alternative opportunities: Movement of workers between industries, for
whatever reason, raises labour supply in some industries and lowers it in others. For
instance, if workers leave the food industry for the clothing industry, the labour
supply curve of the clothing industry shifts to the right and that of the food industry
shifts to the left.
c) Immigration: The movement of workers from one place to another, whether from state
to state or country to country, is one of the most important factors shifting labour
supply.
Equilibrium in the Labour market

Wages in a competitive labour market satisfy two conditions. First, the wage adjusts to
balance labour demand and labour supply. Second, the wage equals the value of the marginal
product of labour. In figure 4 the wage and the quantity of labour employed are determined
by the intersection of the labour demand and supply curves. At the equilibrium point,
workers receive the value of their contribution to the production of goods and services,
and each firm employs as much labour as is profitable at the equilibrium wage, i.e. it
hires until VMPL = W. The equilibrium wage and employment change whenever the supply of or
demand for labour changes.
Figure 4 : Equilibrium in the labour market
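As a rough numerical illustration of this equilibrium, the sketch below assumes straight
line curves: labour demand L = a - b x W and labour supply L = c + d x W. The linear form
and the numbers a = 100, b = 1, c = 10, d = 2 are assumptions chosen for illustration;
they do not come from the lesson or from figure 4.

```python
def equilibrium(a, b, c, d):
    """Solve a - b*W = c + d*W for the wage, then read employment off demand.

    Demand: L = a - b*W (downward sloping, it reflects the VMPL).
    Supply: L = c + d*W (upward sloping).
    Both curves are hypothetical straight lines.
    """
    wage = (a - c) / (b + d)
    employment = a - b * wage
    return wage, employment

wage, employment = equilibrium(a=100, b=1, c=10, d=2)
print(wage, employment)
```

At the wage returned, the quantity of labour demanded and the quantity supplied are the
same number, which is what the intersection in figure 4 represents.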
Shift in the labour supply and demand

Suppose there is a rise in female labour force participation. This shifts the labour
supply curve to the right. The equilibrium wage falls, because firms are willing to hire
more labour only at a lower wage, and employment rises. The lower wage also reflects the
diminishing marginal product of labour (MPL) as more workers are employed.
In figure 5, as labour supply increases due to higher female labour force participation,
the labour supply curve shifts from SS0 to SS1. At the initial wage W0, the quantity of
labour supplied exceeds the quantity of labour demanded. This excess supply puts downward
pressure on the wage: firms are willing to hire more labour only at lower wages, so the
wage falls from W0 to W1. As the number of workers increases, the MPL falls, and so does
the value of the marginal product of labour (VMPL).
Figure 5 : Shift in labour supply

Now suppose there is a change in technology: a labour-augmenting technology is
introduced. This raises the MPL and hence the value of the marginal product of labour, so
labour demand increases. When labour demand increases, the equilibrium wage rises along
with equilibrium employment. As the VMPL rises, firms find it profitable to hire more
workers and are willing to offer a higher wage to do so.
In figure 6, as labour demand rises due to the change in technology, the demand curve
shifts from DD0 to DD1. At the initial wage W0, the demand for labour exceeds the supply
of labour. To attract more workers, firms have to offer the higher wage W1, since
additional workers are willing to join only at a higher wage. This rise in the wage
reflects the rise in the MPL, which raises the VMPL.
Labour demand and labour supply together determine the equilibrium wage and equilibrium
employment. Any shift in labour demand and/or labour supply changes the equilibrium level
of employment and the wage. At the same time, profit maximization by the firms that demand
labour ensures that the equilibrium wage always equals the VMPL.
Figure 6 : A shift in labour demand
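The two shifts described above can be compared with a small numerical sketch. It assumes
hypothetical straight-line curves, labour demand L = a - b x W and labour supply
L = c + d x W; the linear form and all the numbers are illustrative assumptions, not
values taken from figures 5 and 6.

```python
def labour_market_equilibrium(a, b, c, d):
    """Equilibrium (wage, employment) for demand L = a - b*W and supply L = c + d*W."""
    wage = (a - c) / (b + d)
    return wage, a - b * wage

# Baseline market (hypothetical numbers)
base = labour_market_equilibrium(a=100, b=1, c=10, d=2)

# Supply shifts right (for example, higher female participation): c rises from 10 to 40
supply_shift = labour_market_equilibrium(a=100, b=1, c=40, d=2)

# Demand shifts right (for example, labour-augmenting technology): a rises from 100 to 130
demand_shift = labour_market_equilibrium(a=130, b=1, c=10, d=2)

# Excess supply at the old wage after the supply shift: supplied minus demanded
old_wage = base[0]
excess_supply = (40 + 2 * old_wage) - (100 - 1 * old_wage)

print(base, supply_shift, demand_shift, excess_supply)
```

With these numbers the supply shift lowers the wage and raises employment, while the
demand shift raises both, matching the direction of the changes described for figures 5
and 6.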
The Other Factors of Production: Land and Capital

To produce goods and services, firms need factors of production other than labour as
well. We have seen how a firm decides how much labour to demand at the market wage. It
must also decide how much land and capital it needs to carry on production. In the context
of factors of production, capital means physical capital, that is, the stock of equipment
and structures used for production. Capital consists of accumulated goods produced in the
past that are now being used to produce new goods and services.
Equilibrium in the Markets for Land and Capital

How much capital and land are purchased or hired at equilibrium, and how much do the
owners of capital and land receive in return for their services? Before answering this
question we must understand the difference between the purchase price and the rental
price. The purchase price of a factor is the price of owning it indefinitely. The rental
price, on the other hand, is the price paid to use it for a limited period.
In the labour market, the wage is the rental price of labour services, and it is
determined by the forces of demand and supply. Similarly, in the land and capital markets
the respective rental prices are determined by the forces of demand and supply.
Landowners earn rent and capital owners earn profit. Rental prices and purchase prices are
related, in the sense that buyers are willing to pay more for a factor that produces a
valuable stream of rental income. The current value of the factor's marginal product and
the expected value of its future marginal product determine its equilibrium purchase
price.
In figure 7, panel a) shows how the rental price of land, and panel b) how the rental
price of capital, is determined by the forces of demand and supply. The land supply curve
is relatively inelastic, which reflects the fixed quantity of land, at least in the short
run. The firm decides how much of each factor to employ in the same way as for labour: it
keeps hiring until the value of the marginal product of the factor equals its rental
price. Thus the demand curve for each factor reflects the value of the marginal product of
that factor.
We now have a clear idea of how much each factor is paid for its services. Each factor,
land, labour and capital, earns the value of its marginal product whenever our assumptions
of competitive markets and profit-maximizing firms hold.
Figure 7 : Markets for land & capital

Linkages among the factors of Production
Every factor of production is paid the value of its marginal product. The marginal product
in turn depends on the quantity of that factor available relative to the other factors.
When the quantity of one factor keeps increasing while the others remain constant, or
increase by less, that factor faces a diminishing marginal product. A factor in abundant
supply has a low MP and thus a low price, and a scarce factor has a high MP and thus a
high price. So whenever the supply of a factor falls, its price rises.
Production of any good usually depends on some combination of the factors of production,
which are used together in the production process. A change in the supply of any one
factor therefore affects the marginal products (MP) of the other factors. Thus a change in
the supply of any factor changes the earnings of all the other factors.
For example, think of an industry where the demand for labour depends on the number of
units of capital available. An increase in capital in the industry means a relative
abundance of capital and, at the same time, a relative scarcity of labour. This lowers the
MP of capital and raises the MP of labour. Thus the rental price of capital falls and the
wage rate rises.
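The capital and labour example can be illustrated with a specific production function.
The sketch below assumes a Cobb-Douglas form, Q = K^0.5 x L^0.5, and an output price of
$10. Both the functional form and the numbers are assumptions made here for illustration;
the lesson itself does not specify a production function.

```python
import math

PRICE = 10  # hypothetical output price, in dollars

def marginal_products(capital, labour):
    """Marginal products for the assumed production function Q = K**0.5 * L**0.5.

    MPL = 0.5 * sqrt(K / L) and MPK = 0.5 * sqrt(L / K).
    """
    mpl = 0.5 * math.sqrt(capital / labour)
    mpk = 0.5 * math.sqrt(labour / capital)
    return mpl, mpk

# Before: 100 units of capital and 100 workers
mpl_before, mpk_before = marginal_products(capital=100, labour=100)

# After: capital rises to 400 units while labour stays at 100 workers
mpl_after, mpk_after = marginal_products(capital=400, labour=100)

# Competitive factor prices equal the value of the marginal product
wage_before, rent_before = PRICE * mpl_before, PRICE * mpk_before
wage_after, rent_after = PRICE * mpl_after, PRICE * mpk_after

print(wage_before, rent_before, wage_after, rent_after)
```

Under these assumptions, more capital per worker lowers the rental price of capital and
raises the wage, the direction of change described in the example above.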
Conclusion
In this chapter we learned how the factors of production are paid in the production
process. The quantity employed of each factor is determined by the forces of demand and
supply. The demand, in turn, depends on the factor's marginal productivity. At
equilibrium, each factor of production earns the value of its marginal product. A change
in the demand for and/or supply of any factor of production changes its own equilibrium
payment as well as the payments to the other factors.
Summary
1) Factors of production are the inputs that are used in the production of goods and
services.
2) Equilibrium in the labour market is determined by the interaction of demand and
supply. The demand for labour is determined by its marginal productivity. Workers are
paid wages in return for their services. As the number of workers employed increases,
the MPL diminishes. In a competitive market, a profit-maximizing firm demands labour
up to the point where VMPL = W.
3) The labour demand curve shifts due to a change in any of the following:
a) The output price
b) Technological change
c) The supply of other factors
4) The labour supply decision is taken by the household, which has to make a trade-off
between leisure and work. The labour supply curve is upward sloping up to some wage
level, but beyond a certain wage level it bends backward.
5) The labour supply curve shifts due to a change in any of the following:
a) Changes in tastes
b) Changes in alternative opportunities
c) Immigration
6) In labour market equilibrium, the wage balances labour demand and labour supply and
equals the value of the marginal product of labour. Workers receive the value of their
contribution to the production of goods and services, and firms employ as much labour
as is profitable at the equilibrium wage, i.e. they hire until VMPL = W. The
equilibrium changes whenever the demand for or supply of a factor changes.
7) Equilibrium in the markets for the other factors is likewise attained where the value
of the factor's marginal product equals its rental price. These factors are used along
with labour in the production of goods and services.
Questions for Review

1) Why is the demand for an input known as derived demand?
2) How are wages determined in the labour market?
3) What are the factors that shift the demand and supply curves of labour?
4) How is equilibrium determined in a factor market?
5) How does a change in the supply of one factor affect the return to the other
factors?

1) The change in the output level from an additional unit of labour is known as
a) Average product of labour
b) Total product of labour
c) Marginal product of labour
d) None of the above

2) The marginal product of labour _____ as the number of workers increases
a) Falls
b) Rises
c) Remains constant
d) May rise or fall

3) At equilibrium, what is the profit-maximizing condition in the labour market?
a) VMPL > W
b) VMPL < W
c) MPL = W
d) VMPL = W

4) When does the labour supply curve bend backward?
a) When the income effect is less than the substitution effect
b) When the income effect is more than the substitution effect
c) When the income effect is absent
d) When the substitution effect is absent

5) When the supply of one factor increases, what impact does it have on the rental
price of the other factors in the production process?
a) The rental price of the other factors increases
b) The rental price of the other factors decreases
c) It has no impact on the rental price of the other factors
d) There is no relation between the supply of one factor and the rental price of the
other factors

1 c
2 a
3 d
4 b
5 a


Answer 1. The marginal product of labour is the addition to output contributed by an
additional unit of labour.
Answer 2. As the labour input rises, the MPL falls. That is, as more and more workers are
employed, each additional worker contributes less and less to the production of output.
Answer 3. At the equilibrium point, workers receive the value of their contribution to the
production of goods and services, and firms employ as much labour as is profitable at the
equilibrium wage, i.e. they hire until VMPL = W.
Answer 4. At a very high wage rate the labour supply curve bends backward, reflecting that
the household would rather enjoy more leisure than work more. At this point the income
effect is stronger than the substitution effect.
Answer 5. Production of any good usually depends on some combination of the factors of
production. So an increase in the supply of any one factor of production increases the
return to the other factors.

Answer 1. Option a) is incorrect: the average product is not the addition to output but
the average output that each worker contributes. Option b) is wrong because the total
product of labour is the total output that all the workers together produce. Option d)
does not apply, as the definition given is that of the marginal product of labour.
Answer 2. Options b), c) and d) are incorrect: as more and more workers are employed, each
additional worker contributes less and less to the production of output. So the MP falls.
Answer 3. A firm employs more labour up to the point where the marginal contribution of
labour, valued at the market price, equals the wage. The profit-maximizing equilibrium is
therefore where VMPL is exactly equal to W. So a), b) and c) are wrong.
Answer 4. A rise in the wage has both an income effect, which favours more leisure, and a
substitution effect, which favours more work, so options c) and d) are incorrect. When
the income effect is less than the substitution effect the curve slopes upward, so option
a) is also wrong. The curve bends backward only when the income effect is more than the
substitution effect.
Answer 5. Options b), c) and d) are wrong. A rise in the supply of one factor makes the
other factors scarce in relative terms and raises their marginal product, while the
marginal product of the factor that has become abundant falls. As there is a direct link
between the rental price and productivity, the rental price of the other factors
increases.

Glossary
Derived demand: The demand for a factor that arises from the demand for the goods and
services it is used to produce.
Income Effect for wages: With a rise in the wage rate, the worker is better off and
spends more time in leisure and less time at work, since leisure is a normal good.
Marginal Productivity of labour: The additional amount of output produced by one
extra unit of labour.
Substitution Effect for wages: With a rise in the wage rate, the opportunity cost of
leisure rises, which leads the worker to spend more hours working and fewer hours
enjoying leisure.
Value of marginal product of factor: The market price of the good multiplied by the MP
of the factor. It is the value to the firm of hiring one more unit of the factor.
Reference
1) Mankiw, N.G. “Principles of Economics”, 4th Ed., pg 334-348.

Introduction to Macroeconomics
Lesson: Introduction to Macroeconomics

Lesson Developer: Dipavali Debroy
College/Department: SGGSCC, University of Delhi

Table of Contents
1. Learning Outcomes
2. Introduction
4. Positive Economics and Normative Economics - Methodology
5. Art or Science
8. Macroeconomic Variables
9. Laws of Economics
10. Market, Equilibrium, Demand, Supply
11. Markets in Macro-economics
12. Concept of Aggregate Demand and Supply
13. Closed Economy and Open Economy
14. Partial and General Equilibrium Analysis
18. Summary
19. Exercises
20. Glossary
21. References
22. Activity
1. Learning Outcomes
After you have read this chapter you should be able to define Micro-Economics,
Macro-Economics, Market, Demand, Supply, Equilibrium, Partial and General
Equilibrium, Static and Dynamic Equilibrium, Long Run and Short Run;
understand the central problems of an economy; identify variables, constants
and parameters, and real and nominal variables; differentiate Micro-Economics
from Macro-Economics; describe the scope of the subject of Economics; and
apply the knowledge of basic Economics.
Value Addition:
Focus of the Section
Topic Economics
This section is to make you aware of what Economics is.
The purpose of this section is to make you familiar with the various
Definitions of Economics, the Evolution of the subject, its Scope,
Methodology, Tools and Basic Concepts.
2. Introduction
Macro-Economics is the branch of Economics that studies economic issues in
aggregative and overall forms, looking at the broad picture. In contrast,
Micro-Economics is the branch of Economics that studies economic issues in minute
and individual detail, as if under a microscope.
The words Macro and Micro come from the Greek words makros (long or huge) and
mikros (small).
As for Economics, there are two basic definitions.
Economics
According to the famous economist Alfred Marshall, Economics is the study of
human beings as they go about their everyday life.
To quote from Marshall’s Principles of Economics (1890), Economics is "a study of
mankind in the ordinary business of life; it examines that part of individual and social
action which is most closely connected with the attainment and with the use of the
material requisites of wellbeing. Thus it is on the one side a study of wealth; and on
the other, and more important side, a part of the study of man."
Lionel Robbins has drawn our attention to another aspect and defined Economics as
the study of choice under conditions of scarcity.
"Economics is a science which studies human behavior as a relationship between
ends and scarce means which have alternative uses."
Productive resources (land, labour, capital goods such as machinery, technical
knowledge) are scarce or limited, and a resource applied to the production of a
certain commodity or service is unavailable for the production of an alternative
one. But human wants for the consumption of goods and services (cereals and
pulses, meat, fish and poultry, vegetables, clothes, woollens, houses, roads, cars,
railways, airplanes, books, theatre, film, television and countless others) are
unlimited, and come from numerous members of society. Economics is the study
of how people can choose to use scarce or limited resources to produce various
goods and services and distribute them to various members of society for their
consumption.
Any society faces three fundamental and interdependent economic problems:
1. What to Produce and How Much of them
2. How to Produce, that is, by whom and by what resources and technology
3. For Whom to Produce, that is, how is the total amount of production in the
society to be distributed among its members.
Economics helps us in analyzing and understanding these problems.
Etymologically, the word Economics derives from the Greek words oikos (house) and
nomos (management).
But since the second half of the 17th century, the word Economics has come to be
used in the wider context of a whole country or nation rather than the household.
Adam Smith is known as the ‘father’ of the subject of Economics. His book An
Inquiry into the Nature and Causes of the Wealth of Nations, first published in 1776,
is widely regarded as the first systematic treatise on Economics. Smith’s concern was
with nations or countries, that is, it was a Macro-type concern, although the term
Macro was not in use then.
Later T.R. Malthus, David Ricardo and J.S. Mill wrote important treatises on the
subject, taking the same overall, long-run perspective and making sweeping
generalizations. They are known as Classical economists and have also been
described as Magnificent Economists because they dealt with big issues on a broad
background. One of the tenets of the Classical economists was that in the long run
there is no unemployment in the economy. Jean-Baptiste Say (1767-1832), a French
economist, stated that “products are paid for with products”, which came to be
popularly interpreted as “Supply creates its own Demand”. Given sufficient time,
imbalances in the economy will be smoothed out and people, or governments, need not
be worried about them. This was the basic standpoint of the Classical economists and
is known as “Say’s Law”.
While the Classical approach prevailed throughout the 19th century, a Neo-Classical
approach came to be formed towards the end of that century. Economists began to
study economic issues on a more specific and individual level. The new approach
concentrated on how the price and quantity of specific goods (and services) were
determined in the market through a rational balancing of their ‘marginal’ costs and
benefits (utilities and productivities). Foremost among these Neo-Classical
economists (also described as ‘marginalists’) were Menger, Jevons and Alfred
Marshall. It is their work that constitutes the foundation of Micro-economics, where
the individual consumer or producer is the unit concerned, not the entire national
entity.
Although the Classical economists had been concerned with the nation or the country
as a whole, and were therefore more Macro than Micro in approach, Macro-economics
as a subject developed only after the Great Depression. In late October 1929, the
New York Stock Exchange (on Wall Street) crashed. Many rich and successful people
lost everything, and some took their own lives in desperation. Widespread
unemployment followed the closing down of production units. Both employers and
employees felt the impact. Not just America and Europe but their colonies too
suffered. It was a global crisis.
It was then that John Maynard Keynes came up with his analysis of the
phenomenon in terms of Aggregate Demand falling short of Aggregate Supply and
emphasized the role of the Government of a country in stepping up its own
expenditure in order to correct that shortfall or gap.
His analysis laid the foundation of Macro-Economics. Later John Hicks, Milton
Friedman, James Tobin, A.W. Phillips, Edmund Phelps, Robert Lucas, T.J. Sargent,
Robert Barro and others contributed to the subject of Macro-Economics,
bringing in the roles of Money and Expectations. Lucas, Sargent and Barro are often
called the New Classical economists.
To sum up in the words of Paul A. Samuelson, “Macroeconomics deals with the big
picture – with the macro aggregates of income, employment, and price levels. But do
not think that microeconomics deals with unimportant details. After all, the big
picture is made up of its parts.” (Economics, 7th edn, p 362). So he concludes that
there is no essential opposition between the two.
Traditionally and in most universities, a course in Micro-Economics is taught prior to
one in Macro-Economics.
4. Positive Economics and Normative Economics - Methodology
According to economists like Milton Friedman (who wrote Essays in Positive
Economics, 1953), economists should not pass moral strictures or make ‘value
judgements’. In other words, Economics should just ‘posit’ or be Positive, and not
set any norms of behaviour to be followed by individuals or organizations.
Economics can provide policy prescriptions, but they should be expressed in an
objective way.
It would be a normative statement to say: ‘If there is economic depression, the
government of the country should increase its consumption expenditure’.
But it is permissible to make it in the following positive statement: ‘If there is
economic depression and the government increases its consumption expenditure, the
depression is likely to get corrected’.
5. Art or Science?

Is Economics a science or an art? Etymologically, a science (from the Latin scire, to
know) provides theoretical knowledge, while an art (from the Latin ars, artem, a skill
or craft) teaches us how to practise or do something. Now Economics teaches us all
about, say, why there may be unemployment in the economy. But it does not teach us
how to generate employment. From this point of view, it is a science rather than an art.
Again, recent theoretical developments in Economics have made so much use of
Mathematics that a sound knowledge of Mathematics is essential even for an
undergraduate Honours course, e.g., in Delhi University itself. This takes Economics
closer to being a Science subject.
However, the hallmark of science is experiment. A science must provide room for
controlled experiments so as to verify its hypotheses. But human beings cannot be
subjected to experiments just to find out the effects of, say, fiscal or monetary
policies. In this sense, Economics belongs to the Humanities stream.
Most universities regard Economics as an art and award BA and MA degrees in it.
However, the London School of Economics does, in fact, award BSc and MSc degrees
to its students of Economics.
Indeed the scope of Economics is so wide that it is difficult to categorise it as either
science or art. It is perhaps a mixture of both.
As Paul Samuelson put it, “Not only is Economics at once art and a science,
economics as a subject can combine the attractive features of both the humanities
and the sciences” (Economics, 7th edn, Chapter 1, p 4).
A Social Science
Even if we use the term science to describe Economics, we must remember that it is
a Social Science. It does not study individuals in isolation, each doing everything
alone. It studies individuals as members of a society, nation or Economy.
An economy is the same as a country or society, but considered only in its economic
aspects. Every society or country has numerous people engaged in activities of all
sorts. Some work in the fields, some work in factories, and yet others in offices.
Some perform agricultural activities, some industrial, and some provide services.
Those who are in agriculture need industrial products and, say, banking services.
Those who are factory workers need foodstuff and use some kind of transport
services. The people engaged in the services sector need both food and clothing.
Thus all three sectors, with their separate kinds of activities, need to have
relations with one another. All the people of an economy need to act as well as
interact. This they do by exchanging the products of their various activities in
various markets.
The epithet ‘Social’ covers this aspect of the subject of Economics.
However, for analytical purposes, Economics sometimes uses the concept of a
Robinson Crusoe Economy, that is, an economy consisting of a single person performing
all the economic activities by himself. Robinson Crusoe is a novel written in 1719
by Daniel Defoe, said to be based on the life of Alexander Selkirk, a sailor who was
marooned on an island. In the novel, Crusoe survives on an island for 28 years. A
Robinson Crusoe Economy is thus a theoretical concept where the economy has a single
member.
Economics has a wide scope and has connections with various subjects.
Mathematics and Statistics are necessary for the study of Economics. Mathematics
helps economists to analyze economic realities and to derive conclusions from
them. Statistics aids this process by systematizing the economic realities as data and
drawing inferences from them with accepted statistical tools. In fact, the application of Statistics
to Economics has led to the development of a relatively new subject: Econometrics.
It helps in empirical study and in making projections both into the past and the future.
Without a sound mathematical base, it is next to impossible to cope with academic
Economics. However, to have a general awareness of the economic occurrences of
the world, basic intelligence will do. To quote Samuelson, “Although every
introductory textbook must contain geometrical diagrams, knowledge of
mathematics itself is needed only for the higher reaches of economic theory. Logical
reasoning is the key to success in the mastery of basic economic principles, and
shrewd weighing of empirical evidence is the key to success in mastery of economic
applications.” (Economics, Ch 1, p 5)
Actually, the earlier term for Economics was Political Economy. Several universities
still have a common department for Politics and Economics. Political Science is a
useful subject to supplement a course in Economics. History is also a subject that
has a close connection with Economics. Economic History is a compulsory paper in
every course in Economics, undergraduate as well as post-graduate. Several
universities offer a post-graduate course in Economic Geography.
In recent times several subjects or courses have emerged from Economics, e.g.,
Commerce, Business Economics, Business Administration, Business Management.
While based on the fundamentals of Economics, they have their own distinctive
course contents. But papers on both Micro-Economics and Macro-Economics figure in
all of them.
Economics has to deal with a complex mass of realities. So it sometimes puts them
into a simplified framework or Model. A Model is a theoretical construct that
represents economic realities by a set of inter-related variables. These relationships
can be logical or quantitative. But putting them in a Model helps economists to
analyze realities better and even make predictions about the future.
Economists often posit or propose explanations for economic phenomena. These are
known as Hypotheses. A hypothesis is not a theory. Only if a Hypothesis is verified
or found to be true, can we call it a Theory. To be verified or falsified, that is tested,
a hypothesis has to be framed in a certain way. Such a hypothesis is called a
Scientific hypothesis. Sometimes economists have no alternative but to take a
certain hypothesis to be true, and proceed on the basis of it. Such a hypothesis is
called a Working hypothesis. Statistics and Econometrics are the tools used in
verifying a hypothesis.
Economics is a complex subject, rooted in reality but often analyzed through
abstract thinking and mathematical methods.
As symbols of that reality, Economics makes use of the mathematical concepts of
Variables, Constants and Parameters.
Variables are entities that take different values. They are usually symbolized by x, y,
z and can take positive and negative values ranging from minus infinity to plus infinity.
Constants are entities that, for one particular analytical exercise, take one
particular value. They are usually symbolized by a, b, c, ... or alpha, beta, gamma.
Again, they can take any value between minus and plus infinity, but only one
such value during a particular analysis.
Parameters are entities that can be assigned different values for different variants of
an exercise but in any one particular variant, can take only one such value.
Variables can be dependent or independent.
An Independent variable takes on values by itself.
A Dependent variable takes on values according to or as per the Independent
variable. This relation of dependence between the Independent and the Dependent
variable(s) is known as a functional relationship, or simply, a Function. It means that
the Dependent variable functions according to the Independent variable. It is a most
powerful tool in the study of Economics, both Micro and Macro. E.g.,
C = f(Y) is the Consumption Function which says that consumption C depends upon
National Income Y.
in the way of analyzing it. This is done under an assumption known as the ceteris
paribus assumption.
8. Macroeconomic Variables
Important variables in Macro-economics are National Income, Disposable Income,
Consumption, Saving etc. Sometimes the ratio of two variables may be regarded as
a variable in itself, e.g., Consumption/National Income is a separate variable, viz.,
the Average Propensity to Consume. Variables may be Real or Nominal.
Nominal variables are those expressed in terms of money, usually in terms of the
current prices. Real variables are those expressed in real terms, or constant prices,
which means that they are `deflated’ or corrected for possible fluctuations in the
price level. The Deflator used is usually the General Price Level.
Variables may be Stocks or Flows.
A Stock variable measures the quantity of the variable at a particular point of time,
e.g., the capital a businessman has on a given date. A Flow variable
measures the quantity of a variable over a period of time, e.g., the investment he
has made in his business in a given year or the profit he has made in the course of it.
Some important Macro-economic variables are as follows:
National Income (usually symbolized by Y) is the sum total of the money measures
of goods and services produced in a country during a year. It is also the Net
National Product at Factor cost, i.e., the sum total of income generated by an
economy during a year. It is a flow variable.
Gross Domestic Product (GDP) is the sum total of the money measures of the
goods and services produced during a year within the national boundaries of that
country.
Personal Income is that part of the National income which is actually received by
the persons or households of the economy. Corporate Income Taxes, Undistributed
Corporate Profits, Savings of Non-Departmental Enterprises, Income from Property
and Entrepreneurship of Government Administrative Departments and Social
Security contributions do not go to persons or households. They have to be deducted
from the National Income to get at its personal component.
On the other hand, Transfer Payments (pensions, unemployment doles), though they
are not earned income, add to the amount that individuals and households have
to spend. They have to be added to the National Income so as to get at its personal
component.
Thus,
Personal Income = National Income
    - Corporate Income Taxes
    - Undistributed Corporate Profits
    - Savings of Non-Departmental Enterprises
    - Income from Property and Entrepreneurship of Government Administrative Departments
    - Social Security contributions
    + Transfer Payments
Disposable Income (Yd) is the total income that is actually available to
individuals and/or households to be ‘disposed of’, i.e., spent on consumption with
the rest saved. Individuals and households have to pay Direct Taxes (e.g., the
Income Tax). These have to be deducted from Personal Income in order to arrive at
Disposable Income.
Disposable Income = Personal Income – Direct Taxes
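The two relations above can be traced with a small numerical sketch. The function names and all the figures below are hypothetical, invented only to illustrate the arithmetic; they are not actual statistics.

```python
def personal_income(national_income, deductions, transfer_payments):
    """Personal Income = National Income - deductions + Transfer Payments."""
    return national_income - sum(deductions.values()) + transfer_payments


def disposable_income(personal, direct_taxes):
    """Disposable Income = Personal Income - Direct Taxes."""
    return personal - direct_taxes


# Hypothetical figures (say, in Rs. crore), assumed purely for illustration.
deductions = {
    "corporate_income_taxes": 50,
    "undistributed_corporate_profits": 30,
    "savings_of_non_departmental_enterprises": 10,
    "govt_income_from_property_and_entrepreneurship": 5,
    "social_security_contributions": 25,
}

pi = personal_income(1000, deductions, transfer_payments=40)
yd = disposable_income(pi, direct_taxes=120)
print("Personal Income:", pi)
print("Disposable Income:", yd)
```

Each deduction is an item that does not reach households, while Transfer Payments are added back because households can spend them even though they are not earned income.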
Consumption (C) refers to the aggregate amount of private consumption in the
economy. It includes Durable goods (long-lasting goods like houses and vehicles),
Non-Durable goods (lasting for shorter durations, e.g., eatables, clothes) and
Services (e.g., banking, travel, consultancy, entertainment).
Consumption is a function of National Income.
C=f(Y).
More specifically, Consumption is a function of Disposable Income.
C=f(Yd).
Savings (S) refers to the aggregate amount of private saving in the economy, i.e.,
S = Y – C.
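The text leaves the form of the function f unspecified. As a minimal sketch, assume a linear form C = a + bY; the linear form and the values of a and b below are illustrative assumptions, not something stated in the lesson. The sketch also computes the Average Propensity to Consume (C/Y) and Savings (S = Y - C).

```python
def consumption(y, a=100.0, b=0.8):
    """Hypothetical linear consumption function C = a + b*Y (a and b are assumed)."""
    return a + b * y


def average_propensity_to_consume(y):
    """APC = C / Y, the ratio of Consumption to National Income."""
    return consumption(y) / y


def saving(y):
    """S = Y - C."""
    return y - consumption(y)


# Evaluate the functions at a few hypothetical income levels.
for y in (500.0, 1000.0, 2000.0):
    c = consumption(y)
    apc = average_propensity_to_consume(y)
    s = saving(y)
    print(f"Y={y:.0f}  C={c:.0f}  APC={apc:.3f}  S={s:.0f}")
```

Under these assumed values, the APC falls as income rises, because the fixed component a becomes a smaller share of Y.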
Investment (I) includes produced means of production that make an addition to
the country’s capital. While capital is a ‘stock’ variable (i.e., a quantity measured at
a point of time), investment is a ‘flow’ variable (a quantity measured over a period
of time).
Investment has three components: Business Fixed Investment (purchases made by
firms of new plant, machinery and equipment), Residential Investment (purchases
of houses by householders and others) and Inventory Investment (increase in
stocks held by firms of inputs as well as of their own outputs). Investment may be
autonomous or dependent upon the interest rate.
The interest rate is the rate at which loanable funds are lent out in the economy.
In any actual economy there are several rates of interest prevailing simultaneously.
But for theoretical purposes, we take it that a uniform interest rate (r or i) prevails.
The Nominal Interest Rate is corrected for changes in the Price Level to get the Real Interest Rate.
Government Expenditure (G) refers to all purchases made by all governmental bodies
in an economy, e.g., on provision of infrastructure, public transport, administration,
defence, space research. G is generally taken to be autonomous.
Net Exports (NX) refer to the value of goods and services produced in a country and exported
abroad (X) after the deduction of the value of goods and services produced abroad
but imported (M) by the country. That is, NX = X - M.
A most important equation in Macro-economics is:
Y= C+I+G+NX.
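As a quick check of how the components add up, the identity can be written out as a short sketch. The function names and the figures are hypothetical and chosen only for illustration.

```python
def net_exports(exports, imports):
    """NX = X - M."""
    return exports - imports


def national_income(c, i, g, x, m):
    """Y = C + I + G + NX, with NX = X - M."""
    return c + i + g + net_exports(x, m)


# Hypothetical figures (say, in Rs. crore), assumed for illustration only.
y = national_income(c=600, i=200, g=150, x=120, m=70)
print("NX =", net_exports(120, 70))
print("Y  =", y)
```

If imports exceeded exports, NX would be negative and would reduce Y accordingly.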
In Macro-economics prices are not taken as individual prices of individual
commodities (px, py etc.). Instead, an Index Number of all prevalent prices is
constructed to represent the price level in general (usually symbolized as P).
When any Nominal variable is divided by this price level P, one gets the Real
variable. Thus
Real GDP = Nominal Gross Domestic Product (GDP) / General Price Level (P)
Or,
GDP Deflator = Nominal GDP / Real GDP
The rate at which the General Price Level increases is known as the Inflation rate.
Thus
Inflation rate = (P_t – P_(t-1)) / P_(t-1)
where t refers to the present time-period and (t-1) the previous one.
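The three ratios above can be illustrated with a minimal sketch. It assumes the price level is an index equal to 1.0 in the base period; the function names and all numbers are hypothetical.

```python
def real_value(nominal, price_level):
    """Real variable = Nominal variable / General Price Level (P)."""
    return nominal / price_level


def gdp_deflator(nominal_gdp, real_gdp):
    """GDP Deflator = Nominal GDP / Real GDP."""
    return nominal_gdp / real_gdp


def inflation_rate(p_t, p_prev):
    """Inflation rate = (P_t - P_(t-1)) / P_(t-1)."""
    return (p_t - p_prev) / p_prev


# Hypothetical data: the price index rises from 1.00 to 1.25 between two periods.
p_prev, p_t = 1.00, 1.25
nominal_gdp = 1500.0

real_gdp = real_value(nominal_gdp, p_t)
print("Real GDP:", real_gdp)
print("GDP Deflator:", gdp_deflator(nominal_gdp, real_gdp))
print("Inflation rate:", inflation_rate(p_t, p_prev))
```

Dividing Nominal GDP by the Real GDP just computed returns the price level itself, which is why the deflator serves as a measure of P.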
Money (M) is an important variable in Macro-economic analysis. An old rhyme says:
Money is a matter of functions four:
A medium, a measure, a standard, a store.
Earlier, Money was regarded only as a medium of exchange or transactions. Keynes
brought out how people can speculate with it as well as make transactions. The
definition of money is getting wider and wider. When the Money supply is deflated by the
price level, the resulting ratio M/P is called Real Balances.
Another crucial variable in Macro-economics is the Unemployment rate. The
Labour force (L) of an economy is the total number of employed (E) and unemployed
(U) people in it, so that L = E + U. Then the Unemployment rate = U/L.
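A minimal sketch of this calculation follows. The function name and the head-counts are hypothetical, used only to show that the rate is a share of the Labour force and not of the whole population.

```python
def unemployment_rate(employed, unemployed):
    """Unemployment rate = U / L, where the Labour force L = E + U."""
    labour_force = employed + unemployed
    return unemployed / labour_force


# Hypothetical head-counts (say, in lakhs of persons), assumed for illustration.
rate = unemployment_rate(employed=460, unemployed=40)
print(f"Unemployment rate: {rate:.1%}")
```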
9. Laws of Economics
The Classical and Neoclassical economists often used the term ‘law’ to describe the
tendencies that they observed in the functioning of the economy or society. The Law of
Demand and the Law of Diminishing Returns in Micro-economics, and Say’s Law and
Okun’s Law in Macro-economics, are just a few examples. In no sense are these
binding or enforceable or universal laws.
However, law in the usual sense of the term does have a close connection with
Economics. It is a basic idea of Neoclassical Economics that, for the smooth
functioning of the market, there must be law and order in the country. The law of the
land influences its economic performance.
10. Market, Equilibrium, Demand, Supply
The word Market comes from the Latin mercatus, which meant trading, buying or selling
at an appointed time or place. A market is not necessarily a marketplace. It is a
conjunction or coming-together of buyers and sellers. The haat, bazaar and mandi,
the shop and the mall are markets. But online or telephonic sale and purchase,
which are quite common these days, are also market transactions.
The distinguishing feature of the market is that market transactions are exchanges,
usually performed through the medium of money. The seller (who is sometimes,
though not always, the producer) of certain commodities/services brings them to the
market and offers certain quantities of them at a certain price. He
thus supplies them in the market. The (prospective) buyer comes to the market
wanting to get certain commodities/services at a certain price. He thus demands
them in the market. If the demand of the buyer and the supply of the seller match at
a certain configuration of price and quantity, the transaction takes place. If not, it
does not.
The transaction is thus both a sale and a purchase. It is a sale from the point of view
of the Seller (producer), that is, from the Supply side. It is a purchase from the point
of view of the Buyer, that is, the Demand side.
The transaction configuration is known as the Equilibrium.
In Latin, aequus means equal and libra means scales or balance. (That is why in the
Zodiac, the sign Libra is shown by a pair of scales.) When the two pans on the two
sides of a pair of scales hang steady at the same level, there is aequilibrium, or,
in English, Equilibrium.
The word Demand is from Latin demandare which means to claim or commission.
Supply is from Latin supplere, to fill up or complete.
In the context of Economics it was Adam Smith in 1776 who first used them as
corresponding concepts. Marshall has compared them to the two blades of a pair of
scissors. Just as the scissors cannot work without either of the two blades, Market
Equilibrium cannot be determined without reference to both Demand and Supply.
11. Markets in Macro-economics
Markets in Macro-economics are distinct from those in Micro-economics in the sense
that they are markets for Aggregates, not for individual items being demanded and
supplied by individuals.
Broadly, three such markets are distinguished in Macro-economics:
Goods Market – where goods and services are bought and sold, not specific goods
but the Gross Domestic Product
Labour Market – where workers are employed and remunerated, not individual
workers but the Labour Force as a whole
Money market or Financial Market – where money and financial resources are
bought and sold.
The Labour Market is studied separately, but the Goods Market and the Money
Market are brought together so as to yield an overall equilibrium defined in terms of
the National Income and the Interest Rate.
In addition, there is the foreign or International Market when the economies
concerned are open and trading with each other.
In Macro-economics, the overall equilibrium of the economy
12. Concept of Aggregate Demand and Supply
Aggregate Demand and Aggregate Supply are two important concepts of Macro-
economics.
Aggregate Demand refers to the overall or national demand for goods and
services. It comes not from individual persons, households or even groups, but from
all the citizens taken together.
Similarly, Aggregate Supply refers to the overall or national supply of goods and
services, or the national income it generates.
When Aggregate Demand equals Aggregate Supply, there is equilibrium in the
Macro-economic sense, in the overall or national market for goods and services, or
simply, the Goods Market. When Aggregate Demand falls short of Aggregate Supply,
there is Depression. When Aggregate Supply falls short of Aggregate Demand, there
is Inflation. Inflation, Depression and Unemployment are fundamental concerns of
Macro-economics.
13. Closed Economy and Open Economy
Macro-economic Theory distinguishes between the Closed Economy and the Open
Economy.
A Closed Economy has no (or negligible) interactions with the rest of the world. All
production, consumption and market exchange are internal and in the same domestic
currency. There are no Exports and Imports, and Net Exports are zero. There is no
Foreign Investment in other countries or by foreign countries, and Net Foreign
Investment is zero as well. It is in this Closed Economy framework or model that the
Goods Market and Money Market are analyzed to yield an overall equilibrium.
An Open Economy has transactions with the rest of the world. It exports and
imports, borrows and lends. It invests abroad and other countries invest in it. All this
involves the use of at least two, if not more, currencies. Net Exports, Net Foreign
Investment, and the Exchange Rate are thus important elements in Macro-economics.
14. Partial and General Equilibrium Analysis
in the way of analyzing it. What is then done is to make an assumption known as
ceteris paribus, which means ‘other things being the same’. It qualifies or conditions
a causal relationship between an independent variable and the dependent variable
that depends on it or functions according to it. In Latin, ceteris means ‘other things
or the rest’ and paribus means ‘at par or equal’.
Partial Equilibrium Analysis is a study of economic occurrences where a causal
relationship is studied between two variables, keeping other related variables
constant or fixed under the assumption of ceteris paribus (‘other things being the
same’). However, it lets only one market (at a time) be in equilibrium and may not
capture the complexities of the real world. General Equilibrium Analysis lets the
inter-dependence of various variables play itself out. Prices of commodities
are determined simultaneously and mutually. All markets are simultaneously in
equilibrium. Macro-economics, as of now, uses Partial Equilibrium analysis
rather than General Equilibrium analysis.
In a static equilibrium all quantities have unchanging values, but in a dynamic
equilibrium various quantities may be growing, only their ratios being unchanged.
Comparative Statics compares two static cases of equilibrium. Comparative
Dynamics compares two dynamic equilibria.
Macro-economics uses both kinds of analysis. For example, it includes the study of
the Dynamic Multiplier and the Dynamic Aggregate Demand Curve.
A run is a length of time, not exactly specified. If all factors of production can be
varied during a length of time, it is called the Long Run. If some factors can be
varied but others cannot, i.e., are fixed, it is the Short Run. A Short Run
Equilibrium, one that holds in the Short Run, is achieved in Macro-economics if
Aggregate Demand is equal to Aggregate Supply. But if there is a gap, there is
disequilibrium, leading to unemployment, depression or inflation. The Classical
economists held that in the Long Run the disequilibrium situation will correct itself.
Wages and prices will adjust, and this flexibility will ensure
equilibrium in the Long Run.
It was precisely against this attitude that Keynes wrote: “The long run is a
misleading guide to current affairs. In the long run we are all dead. Economists set
themselves too easy, too useless a task if in tempestuous seasons they can only tell
us that when the storm is long past the ocean is flat again” (A Tract on Monetary Reform,
1923, Ch 3).
Wages are not so flexible in the Short Run, and this ‘sticky’ character of wages
may stand in the way of restoring equilibrium in the Short Run. Keynesian analysis is
Short Run analysis.
However, modern Macro-economics also includes the
study of inflation and output in the Long Run, using the Dynamic Aggregate Demand
Curve and the Dynamic Aggregate Supply Curve.
The highest recognition for economists is the “Sveriges Riksbank Prize in Economic
Sciences in Memory of Alfred Nobel”, first awarded in 1969. Among the important
Macro-economists who have received it are
Milton Friedman in 1976, James Tobin in 1981, Robert Lucas in 1995,
Robert A. Mundell in 1999 and E. S. Phelps in 2006.
18. Summary
• Economics studies human choice among alternative uses of scarce
resources.
• It is a Social Science with a wide scope. It aids the understanding of
the central problems of an economy.
• Demand and Supply of goods and services determine their
Equilibrium Price and Quantity in the Market.
• Aggregate Demand and Aggregate Supply determine macro-economic
Equilibrium.
• Markets can be of various types and forms.
• Equilibrium can be Partial and General, Long-Run and Short-Run,
Dynamic and Static.
19. Exercises
Short Questions
1. How would you define Macro-Economics?
2. Name some basic concerns of Macro-economics.
3. What does a function mean?
Long Questions
1. What were the ideas and beliefs of the Classical economists?
2. What are the main ideas of Keynesian economists?
3. Explain the concept of Market and distinguish between different types of
Markets in an economy.
4. Define some important concepts of Macro-economics.
20. Glossary
Variables
Constants
Hypothesis
Model
Demand supply
Market
Equilibrium
Static Equilibrium
Dynamic Equilibrium
Long Run
Short run
General equilibrium
Partial Equilibrium
Consumption
National Income
Gross Domestic Product
Disposable income
Money
Savings
Net Exports
Interest rate
21. References
1. Economics, Paul A. Samuelson
2. Macroeconomics, N. Gregory Mankiw
22. Activity
From newspaper and official statistics, find out the National Income, Inflation Rate
and the Unemployment Rate for the previous two years.
Talk to some householders as well as factory workers about what they feel about
Inflation and Unemployment.
Quiz
1. What was the nationality of J. B. Say? (English, French, Canadian)
2. Which economic event led to the development of Keynesian economics?
(Bolshevik Revolution, Independence of India, Wall Street Crash)
3. What do Consumption and Savings together constitute? (National Income,
Unemployment, Investment)
4. What is Disposable Income a component of? (Exports, National Income,
Consumption)


National Income Accounting
Semester-I
Paper I: Principles of Economics (POE)
Unit-III
Lesson: National Income Accounting
Lesson Developer: Rakhi Arora and Vaishali Kapoor
College/Department: Rajdhani College, University of Delhi
Table of Contents:
1. Learning outcomes
2. Introduction
3. What is macroeconomics?
4. Measurement of GDP
a. Various concepts of National Income
b. The Circular Flow
c. Expenditure approach to calculate GDP
d. Income approach to calculate GDP
5. Real Vs Nominal GDP
6. Price Indexes
7. Summary
8. Exercises
9. Glossary
10. References
Learning outcomes:
After you have read this chapter, you should be able to:-
a) Understand the various issues in an economy
b) Define National Income
c) List the various concepts of National Income
d) Understand the flow of money in the economy
e) Compute National Income through Expenditure and Income approach
f) Differentiate between Real and Nominal GDP
g) Be acquainted with the GDP deflator and the Consumer Price Index
INTRODUCTION
Newspapers these days are full of headlines symptomatic of the worsened condition of
the global economy. They suggest that policy makers and economists are worried
about what form the ongoing global financial crisis will take, how economies will be
affected, and whether all of them can emerge as gainers and take the lead. One needs
to know how economists predict such crises and their repercussions, and how they study
the symptoms of any disturbance in the economy and provide the cure.
Economists and researchers keep studying every economy with the help of the various economic
variables and economic tools at hand, and of economic data that are widely released in
newspapers, journals and articles and are mostly produced by government. These data/statistics are
used to study the economy, and policy makers use them to monitor the ongoing development
processes in the economy and to formulate policies.
This chapter broadly covers the macroeconomic issues and the macroeconomic
variables in Section one. In Section two it discusses the Gross Domestic Product (GDP),
an indicator of the health of an economy, and in Section three it explains the meaning of the
Consumer Price Index (CPI), which represents the overall level of prices. The chapter largely
focuses on the accounting of National Income.
WHAT IS MACROECONOMICS?
Macroeconomics is the study of the structure and performance of national economies and
of the policies that governments use to try to affect economic performance.
The various macroeconomic issues are as follows:
(i) Long Run Economic Growth
This issue addresses why some nations’ economies grow rapidly, providing their
citizens with fast-improving living standards, while other nations’ economies remain
stagnant. For instance, in 1870 per capita income was smaller in Norway than in
Argentina, but today per capita income is three times as high in Norway as in
Argentina. India’s GDP has been growing since 1950, as shown in Figure 1, and its
growth has been impressive since the 1990s because of the policy reforms adopted by the Indian
government.
Figure 1: National income of India from 1950 to 2012 (chart not reproduced; vertical axis: national income of India* in Rs. crore; horizontal axis: years 1950-51 to 2011-12)
Source: Statistical Outline of India 2012-13, TATA services ltd.
*National income is calculated at factor cost with the new base year of 2004-05.
(i) Business Cycle
Instead of growing at an even rate at all times, economies tend to experience
short-term ups and downs in their performance, technically described as the Business
cycle. Why do economies experience such business cycles? The US economy, for
instance, went through a recession (fall in output) in 1990 and then a recovery (rise in
output) that became the longest period of uninterrupted economic growth in US economic history, but
economic performance in 2000 was much weaker. A mild recession in 2001 was
followed by a weak recovery that lasted only until December 2007. The recession
that began at the end of 2007 was worsened by the financial crisis in 2008, which
contributed to a sharp decline in output at the end of 2008 and in early 2009.
(ii) Unemployment
An important aspect of recessions is that they are usually accompanied by an
increase in unemployment, which is measured by a key indicator of the
economy’s health: the unemployment rate. One needs to figure out the reasons
for high unemployment rates, which sometimes occur even in times of relative prosperity.
(iii) Inflation
Economists have devoted much effort to identifying the costs and
consequences of even moderate inflation. The key questions that need to be
addressed are: Who are the gainers and losers from inflation? What costs does inflation
impose on society, and how severe are they? What are the causes of inflation? What are the
best ways to curb it? Figure 2 shows the behavior of consumer goods prices over
time in India. They have been rising since 1950 and have doubled in the last decade
owing to droughts, the US crisis and recession.
Figure 2: Consumer Price Index of India from 1950 to 2012 (chart not reproduced; vertical axis: Consumer Price Index, 2001 = 100; horizontal axis: years 1950-51 to 2011-12)
Source: Statistical Outline of India 2012-13, TATA services ltd.
(iv) International Economy
This issue focuses on the economic links among nations, such as international trade and
borrowing, that affect the performance of individual economies and of the world
economy as a whole. The recent crisis affected the entire globe even though it
originated in the US as the subprime crisis. This shows that countries are linked by
international trade and, even more, by financial flows.
(v) Macroeconomic Policy
How should economic policy be conducted so as to keep the economy’s output,
inflation, unemployment rate and other variables as stable as possible and to avoid the
major fluctuations that could put the smooth functioning of the economy at risk? The
bailout package given to banks in the US to lessen the impact or spread of the crisis is one
such example. The ongoing inflation in India is a matter of concern for the RBI, which
has been framing policies to keep inflation low.
MEASUREMENT OF GDP
The national income accounts are an accounting framework used in measuring current
economic activity. National Income Accounts are set up in a way that mirrors the structure
of the economy. Working through these accounts is a first important step towards
understanding how the macro economy works.
The economic activity that occurs during a period of time can be measured in the following
3 ways:
1. The total output produced in the economy.
2. Income received by the producers of the output.
3. Total expenditure incurred by final consumers.
All three approaches portray an identical picture of the economy. The money value
computed in any of the above ways is technically known as the National Income of the
economy.
VARIANTS OF GDP
Several variants of measuring economic activity are as follows:-
1. Gross Domestic Product (GDP)
GDP is the market value of all final goods and services produced by normal residents
as well as non-residents in the domestic territory of a country in a year. It includes
the market value of only final goods and ignores intermediate goods to avoid the
problem of double counting, i.e., to count all goods and services produced in any
given year only once.
GDP= C+I+G+NX
Where,
C = Value of final consumer goods and services produced in a year and consumed by
households.
I = Purchase of capital goods by the producing sector (addition to the physical stock of
capital or stock)
G = Net expenditure made by Government (Government purchase of goods and
services)
NX = Net Exports, i.e., the difference between foreign spending on domestic goods
and domestic spending on foreign goods:
NX = Exports - Imports
2. Gross National Product (GNP)
It is defined as the total market value of all final goods and services produced in a
year by normal residents of a country. In the case of India, these residents may be national
or non-national companies that have set up their plants in India.
It is calculated by adding net factor income from abroad to GDP.
GNP=GDP+NFIA
Where,
NFIA is the difference between factor income received from abroad by normal
residents of India for rendering factor services in other countries and the factor
incomes paid to the foreign residents for factor services rendered by them in the
domestic territory of India.
3. Net Domestic Product (NDP)
Capital goods wear out or fall in value as a result of their consumption or use in
the production process. This consumption of fixed capital, or fall in the value of fixed
capital due to wear and tear, is called depreciation. Depreciation has to be
deducted from GDP to get NDP. NDP is therefore the net market value, i.e., the value after
providing for depreciation, of all final goods and services produced by normal residents
as well as non-residents in the domestic territory of a country in a year. Therefore,
NDP = GDP - Depreciation
Exhibit 1: Relation between various measures of economic activity
4. Net National Product (NNP)
It refers to the market value of goods and services produced by normal residents of
a country in a year after providing for depreciation.
It is also known as National income at market price.
NNP= GNP-Depreciation
Or
NNP= GDP-Depreciation+ NFIA
5. Net National Product at Factor Cost (NNP fc)
Conceptually, NNP at market price (MP) and NNP at factor cost (FC) are supposed to be identical, as the value of
final goods and services at market price is nothing but the sum total of the factor costs
involved in their production, production being the combined effort of the various
factors of production, namely land, labour, capital and enterprise. NNPfc is also called
National Income, as it is the sum of all incomes earned by factors of production for
their contribution of land, labour, capital and entrepreneurial ability to the year’s net
production. It is calculated by deducting indirect taxes from, and adding subsidies to, the
national income at market price, as indirect taxes raise the market price
above factor cost and subsidies lower the market price
below factor cost.
NNP fc= NNPmp- Net Indirect Taxes
Net Indirect Taxes= Indirect taxes-Subsidies
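The relations between these aggregates can be traced with a short calculation. The sketch below is only an illustration: the function name and all the figures are hypothetical and are not taken from any actual national accounts.

```python
def national_aggregates(gdp_mp, depreciation, nfia, indirect_taxes, subsidies):
    """Derive the main aggregates from GDP at market price (illustrative only)."""
    ndp_mp = gdp_mp - depreciation                  # NDP = GDP - Depreciation
    gnp_mp = gdp_mp + nfia                          # Domestic + NFIA = National
    nnp_mp = gnp_mp - depreciation                  # NNP = GNP - Depreciation
    net_indirect_taxes = indirect_taxes - subsidies
    nnp_fc = nnp_mp - net_indirect_taxes            # National Income
    return {
        "NDP_mp": ndp_mp,
        "GNP_mp": gnp_mp,
        "NNP_mp": nnp_mp,
        "NIT": net_indirect_taxes,
        "NNP_fc": nnp_fc,
    }


# Hypothetical figures (say, in Rs. crore)
result = national_aggregates(
    gdp_mp=1000, depreciation=80, nfia=-10, indirect_taxes=120, subsidies=20
)
for name, value in result.items():
    print(name, value)
```

Note that a negative NFIA, as in this made-up example, makes the national aggregates smaller than the corresponding domestic ones.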
THE CIRCULAR FLOW DIAGRAM
A useful way to study the economic interactions among the four sectors in the economy is
through a circular flow diagram, which shows the income received and payments made by
each sector. The phenomenon of three methods of measurement of national income giving
identical results can be shown diagrammatically through the circular flow of money in the
economy.
Exhibit 2: Circular Flow Diagram
Let us analyze the circular flow step by step. Households provide their services to the
firms and the government and in return they get wages. The circular flow diagram above
shows the flow of wages into the household sector as compensation for these services.
Interest on corporate and government bonds and dividends from firms are further receipts
of the households. Social security benefits, veterans' benefits, and welfare payments are
also received by some households from the government. Payments of this kind from the
government, for which the recipient does not supply any good, service or labour, are
called Transfer Payments. All these receipts constitute the total income received by the
households.
Households make payments by purchasing goods/services from the firms and by paying taxes
to the government. These components constitute the total payments by the households. The
gap between the total receipts and total payments of the households is what they
save/dis-save. Savings are categorized as a 'leakage' from the circular flow, as they
withdraw current income/purchasing power from the system.
Firms sell goods/services to the households and the government. The revenues generated
by these sales are shown as a flow into the firm sector in the diagram above. Firms pay
wages, interest and dividends to the households and taxes to the government. These
expenses are shown as flows out of the firm sector.
The government collects taxes from the households and the firms. It also makes payments
by purchasing goods/services from the firms, by paying wages and interest to the
households, and by making transfer payments to the households. Households spend part of
their income on imports and the rest on domestically produced goods/services.
THE EXPENDITURE APPROACH TO MEASURING GDP
GDP can be obtained by adding up the four major categories of expenditure in the national
income accounts: consumption, investment, government purchases of goods/services, and net
exports of goods/services. According to the expenditure approach, GDP is measured as the
total spending on the final goods/services produced in the nation during a specified period
of time.
Symbolically, let
Y = GDP = total production (or output)
= total income
= total expenditure;
C = consumption;
I = investment;
G = government purchases of goods and services;
NX = net exports of goods and services.
With these symbols, we express the expenditure approach to measuring GDP as
Y = C + I + G + NX.
1. Consumption.
Consumption is the expenditure by domestic households on final goods/services,
including those produced abroad. It is the principal component of expenditure and
accounted for 56% of Indian GDP in 2010. Consumption expenditure can be broadly
categorized as follows:
a) Consumer durables, which are long-lived consumer items such as cars,
televisions, furniture, and major appliances (but not houses, which are
classified under investment);
b) Non-durable goods, which are shorter-lived items such as food, clothing,
and fuel; and
c) Services, such as education, health care, financial services, and
transportation.
2. Investment
Investment includes both spending on new capital goods, called fixed investment,
and increases in firms' inventory holdings, called inventory investment. Investment
accounted for 37% of GDP in India in 2010. Fixed investment in turn has two major
components.
a) Business fixed investment is the expenditure by businesses on equipment (such as
vehicles, computers, machines and furniture), structures (such as office buildings,
warehouses and factories) and software.
b) Residential investment is the expenditure on the construction of new houses and
apartment buildings. Houses and apartment buildings are considered capital goods
because they provide a service (shelter) over a long period of time.
3. Government purchases of goods and services
The third major component of expenditure is the government's spending on currently
produced goods/services, foreign as well as domestic. It accounted for 11% of
Indian GDP in 2010. The government also makes payments in the form of transfers,
which are not made in exchange for current goods/services. Transfers can take the
form of social security and Medicare benefits, unemployment insurance, welfare
payments, and so on. Transfers are not counted in GDP under the expenditure
method, as they are excluded from the government purchases category. Similarly,
interest payments on the national debt are also excluded from the government
purchases category.
Much like the distinction between private-sector consumption and investment, some
part of government purchases meets current needs (such as employee salaries)
while some is devoted to acquiring capital goods (such as office buildings).
Exhibit 3: Components of Total Spending
4. Net Exports
Net exports are exports minus imports. Exports are the goods and services produced
within a country that are purchased by foreigners. Exports were about 22% of GDP in
India in 2010.
Imports are the goods and services produced abroad that are purchased by a
country's residents. Imports were about 26% of GDP in India in 2010. Net exports are
positive if exports are greater than imports and negative if imports exceed exports.
Exports are added to total spending because they represent spending (by foreigners)
on final goods and services produced in a country. Imports are subtracted from total
spending because consumption, investment, and government purchases are defined
to include imported goods and services. Subtracting imports ensures that total
spending, C+I+G+NX, reflects spending only on output produced in the country.
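As a quick check on the identity Y = C + I + G + NX, the sketch below adds up the approximate 2010 shares of Indian GDP quoted in this section (consumption 56%, investment 37%, government purchases 11%, exports 22%, imports 26%). The function name is hypothetical and the shares are rounded, so this is only an illustration.

```python
def gdp_expenditure(c, i, g, exports, imports):
    """Expenditure approach: Y = C + I + G + NX, where NX = exports - imports."""
    net_exports = exports - imports
    return c + i + g + net_exports


# Approximate shares of Indian GDP in 2010, in per cent, as quoted in the text
shares_2010 = {"c": 56, "i": 37, "g": 11, "exports": 22, "imports": 26}

net_exports_share = shares_2010["exports"] - shares_2010["imports"]
total_share = gdp_expenditure(**shares_2010)

print("Net exports (% of GDP):", net_exports_share)
print("C + I + G + NX (% of GDP):", total_share)
```

Net exports are negative here because imports exceeded exports, and once they are taken into account the four components together exhaust GDP.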
THE INCOME APPROACH TO MEASURING GDP
According to the income approach, national income is the sum of eight types of
income. It totals the income received by producers, inclusive of profits and the taxes
paid to the government. The eight components are as follows:
1. Compensation of employees - This is the income of workers (excluding the
self-employed). It includes wages, salaries, employee benefits (inclusive of
contributions by employers to pension plans) and employer contributions to social
security. Compensation of employees is the principal component of national income
and accounted for 55.7% of GDP in 2008.
2. Proprietors' income - Proprietors' income is the income of the unincorporated
self-employed. Because many self-employed people own some capital (for example, a
farmer's tractor or a dentist's X-ray machine), proprietors' income includes both
labor income and capital income. Proprietors' income was 7.7% of GDP in 2008.
3. Rental income of persons - This is the income earned by individuals who own
land or structures that they rent to others. It is a small component of income
and also covers miscellaneous incomes such as royalties paid to authors,
recording artists and others. Rental income of persons was about
1.5% of GDP in 2008.
4. Corporate profits - Corporate profits are the earnings of corporations. They
represent what remains of corporate revenue after the payment of wages,
interest, rents and other costs. Corporate profits are used to pay
corporate income tax and dividends to shareholders. The profits remaining
after these two payments are called retained earnings and are kept
by the corporations.
5. Net interest - Net interest is interest earned by individuals from businesses and
foreign sources minus interest paid by individuals. Net interest has varied from 4%
to 8% of GDP each year over the past 25 years.
6. Taxes on production and imports - These include indirect business taxes, such as
sales and excise taxes, that are paid by businesses to central, state, and local
governments, as well as customs duties and taxes on residential real estate and
motor vehicle licenses paid by households. These taxes have averaged about 7% of
GDP for the past 25 years.
7. Business current transfer payments (net) - Business current transfer payments
are payments made by businesses to individuals, governments or foreigners, but
not for wages or taxes or as payment for services. Instead, this category covers
transactions such as charitable donations and insurance payments.
Business current transfer payments have been between 0.5% and 0.9% of GDP each
year for the past 25 years.
8. Current surplus of government enterprises - The current surplus of government
enterprises is essentially the profit of businesses that are owned by governments.
In addition to the eight components of national income just described, three other items
need to be accounted for to obtain GDP:
• Statistical discrepancy;
• Depreciation; and
• Net factor payments.
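The steps from national income to GDP can be sketched as follows. All figures are hypothetical. The sign convention is an assumption based on the standard presentation of the accounts, since the text above only lists the three items: the statistical discrepancy and depreciation are added, and net factor payments from abroad are subtracted, because GDP counts only output produced within the domestic territory.

```python
# Hypothetical values for the eight components of national income
components = {
    "compensation_of_employees": 550,
    "proprietors_income": 80,
    "rental_income_of_persons": 15,
    "corporate_profits": 100,
    "net_interest": 50,
    "taxes_on_production_and_imports": 70,
    "business_current_transfer_payments": 5,
    "current_surplus_of_government_enterprises": 0,
}

# Hypothetical adjustment items
statistical_discrepancy = 10
depreciation = 120
net_factor_payments_from_abroad = 20

national_income = sum(components.values())

# Assumed sign convention (see the note above the code)
nnp = national_income + statistical_discrepancy   # net national product
gnp = nnp + depreciation                          # gross national product
gdp = gnp - net_factor_payments_from_abroad       # gross domestic product

print("National income:", national_income)
print("NNP:", nnp, "GNP:", gnp, "GDP:", gdp)
```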
REAL VS NOMINAL GDP
GDP, measured in rupee terms, is the sum of the value of the output produced in the
economy, i.e. the sum of the products of the prices of the different commodities produced
and their respective quantities.
Nominal GDP = ∑ pi qi
Where,
pi = price of the i-th commodity
qi = quantity of the i-th commodity
Economists call the value of goods/services measured at current prices nominal GDP.
Nominal GDP does not accurately reflect how well an economy is able to satisfy the
demands of households, firms and the government. If all prices doubled while quantities
remained unchanged, GDP would double. But it would be misleading to state that the
ability of the economy to satisfy demands had doubled, as the quantity of every good
produced would be unchanged.
A better and more reliable measure of the economy's well-being is one that is not
influenced by changes in prices. For this reason economists use real GDP. Real GDP
measures the value of goods/services using a constant set of prices, i.e. it tells us
the effect on expenditure on output when only quantities change and prices do not.
A real variable is an economic variable that is measured at base year prices. Real
economic variables measure the physical quantity of economic activity. Real GDP
measures the physical volume of an economy's final production, using base year
prices. Nominal GDP measures the value of an economy's final output, using current
market prices.
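The doubling-of-prices argument can be checked numerically. The sketch below uses a hypothetical two-good economy (the goods, prices and quantities are made up for illustration): when every price doubles and quantities stay the same, nominal GDP doubles while real GDP, valued at base year prices, does not change.

```python
def gdp_value(prices, quantities):
    """Sum of price x quantity over all goods (the formula: sum of pi * qi)."""
    return sum(prices[good] * quantities[good] for good in quantities)


# Hypothetical two-good economy
base_year_prices = {"rice": 20, "cloth": 50}   # Rs per unit in the base year
quantities = {"rice": 10, "cloth": 4}          # same output in both years

# In the current year every price is twice its base year level
current_prices = {good: 2 * price for good, price in base_year_prices.items()}

nominal_gdp_base_year = gdp_value(base_year_prices, quantities)
nominal_gdp_current = gdp_value(current_prices, quantities)   # current prices
real_gdp_current = gdp_value(base_year_prices, quantities)    # base year prices

print("Nominal GDP, base year:", nominal_gdp_base_year)
print("Nominal GDP, current year:", nominal_gdp_current)
print("Real GDP, current year:", real_gdp_current)
```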
PRICE INDEXES
A price index is a measure of the average level of prices for some specified set of
goods/services relative to the prices in a specified base year. For instance, the GDP
deflator is a price index that measures the overall level of prices of the goods/services
included in GDP. It is defined by the relation:
Real GDP = nominal GDP / (GDP deflator/100)
The GDP deflator (divided by 100) is the amount by which nominal GDP must be divided,
or "deflated", to obtain real GDP. Once nominal GDP and real GDP have been computed,
the GDP deflator can be calculated by rewriting the preceding formula as:
GDP deflator = 100 × nominal GDP / real GDP.
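A small sketch of the two forms of this formula is given below. The function names and the figures for nominal and real GDP are hypothetical, chosen only to show that deflating nominal GDP by the computed deflator recovers real GDP.

```python
def gdp_deflator(nominal_gdp, real_gdp):
    """GDP deflator = 100 x nominal GDP / real GDP."""
    return 100 * nominal_gdp / real_gdp


def real_gdp_from_nominal(nominal_gdp, deflator):
    """Real GDP = nominal GDP / (GDP deflator / 100)."""
    return nominal_gdp / (deflator / 100)


# Hypothetical figures: current-year output valued at current and base year prices
nominal_gdp = 800
real_gdp = 400

deflator = gdp_deflator(nominal_gdp, real_gdp)
recovered_real_gdp = real_gdp_from_nominal(nominal_gdp, deflator)

print("GDP deflator:", deflator)
print("Real GDP recovered from nominal GDP:", recovered_real_gdp)
```

A deflator of 200 in this made-up case says that the overall price level is twice that of the base year.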
The Consumer Price Index
The GDP deflator deals with the average level of prices of the goods/services that are
included in GDP. The Consumer Price Index (CPI), by contrast, measures the prices of
consumer goods and is available monthly. The Bureau of Labor Statistics constructs the
CPI by sending people out each month to find the current prices of a fixed list, or
"basket", of consumer goods and services, including many specific items of food,
clothing, housing, and fuel. The CPI for the month is then computed as:
100 × (current cost of a basket of consumer items) / (cost of the same basket of items in
the reference base period).
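The CPI formula can be illustrated with a small fixed basket. The items, quantities and prices below are hypothetical and are not taken from any official basket.

```python
def basket_cost(prices, basket):
    """Cost of buying the fixed basket at the given prices."""
    return sum(prices[item] * quantity for item, quantity in basket.items())


# Hypothetical fixed basket: quantities of each item bought by a typical consumer
basket = {"food": 10, "clothing": 2, "fuel": 5}

base_period_prices = {"food": 30, "clothing": 200, "fuel": 60}
current_prices = {"food": 33, "clothing": 210, "fuel": 70}

base_cost = basket_cost(base_period_prices, basket)
current_cost = basket_cost(current_prices, basket)

# CPI = 100 x (current cost of basket) / (cost of same basket in base period)
cpi = 100 * current_cost / base_cost

print("Cost of basket in base period:", base_cost)
print("Cost of basket now:", current_cost)
print("CPI:", cpi)
```

Because the basket is held fixed, the index changes only when prices change, not when consumers change what they buy.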
SUMMARY
• Macroeconomics deals with understanding and providing solutions to inflation, high
unemployment, long run economic growth, business cycles and the dynamics of the
international economy.
• National income of the economy can be computed either by adding the expenditure
incurred by the residents in a year or by adding everybody's income. Either of the
two sums will yield the same result, as the income of one person is the expenditure of
another and vice versa.
• Computations can be made easier by remembering the following equations, where CFC is
consumption of fixed capital, FC is factor cost and NIT is net indirect taxes:
1. Net + CFC = Gross
2. FC + NIT = Market Price
3. Domestic + NFIA = National
• In the expenditure method, national income is computed by adding consumption (C),
government expenditure (G), investment (I) and net exports (NX) in a given
accounting year.
• While calculating national income by the income method, the following components are
added: compensation of employees, proprietors' income, rental income of persons,
corporate profits, net interest, taxes on production and imports, current surplus of
government enterprises, and business current transfer payments (net).
• Real economic variables deal with the physical quantity of economic activity
using base year prices. For example, real GDP is GDP at constant prices, i.e. it
measures the physical production of this year at base year prices. In contrast,
nominal GDP is current-rupee GDP, i.e. the rupee value of an economy's final output,
measured at current market prices.
• A price index measures the average level of prices of a basket of goods/services
relative to the prices of the same basket in a specified base year.
EXERCISES
SHORT ANSWER QUESTIONS
Q1. Which of the following items will be included while calculating the GDP of India?
Why or why not?
a. Mohan purchased a computer worth Rs. 30,000.
b. Shyam bought a second-hand scooter worth Rs. 25,000.
c. Sunita bought furniture manufactured in Hong Kong worth Rs. 1,00,000.
Q2. Fill in the blanks:
a. NNPfc + …………. = NNPmp
b. NNPmp - …………. = NDPmp
c. GNPfc + NIT – CFC = ………
LONG ANSWER QUESTIONS
Q1. What are major macroeconomic issues that each economy has to deal with? Explain it
with reference to Indian scenario.
Q2. What are the approaches to measuring economic activity? Why do they give same
answer?
Q3. List all the components of total spending. Why are imports subtracted when GDP is
computed by the expenditure method?
Q4. “For assessing growth performance of an economy real GDP is a better measure”.
Comment.
NUMERICALS
Q1. Calculate national income from expenditure method and gross domestic product at
factor cost by income method:
a. Government final consumption expenditure 100
b. Gross fixed capital formation 310
c. Operating surplus 800
d. Change in stock 50
e. Exports 40
f. Net factor income from abroad -10
g. Subsidies 20
h. Consumption of fixed capital 20
i. Imports 50
j. Compensation of employees 300
k. Mixed income of self employed 30
l. Indirect taxes 120
m. Private final consumption expenditure 800
Q2. Calculate Gross National Product at market prices from the following data:
a. Consumption of fixed capital 10
b. Value of output in primary sector 100
c. Value of output in secondary sector 150
d. Gross value added at market prices in the tertiary sector 150
e. Net exports 10
f. Net indirect taxes 10
g. Value of intermediate consumption in
(i) Primary sector 40
(ii) Secondary sector 50
(iii) Tertiary sector 60
h. Net factor income from abroad -5
Q3. Calculate national income and GDPmp from the given data:
a. Mixed income 64,448
b. Gross profit 12,000
c. Consumption of fixed capital 8,868
d. Compensation of employees 53,452
e. Indirect taxes 15,456
f. Interest 4,000
g. Rent 4,176
h. Net factor income from abroad -1,136
i. Subsidies 1,348
Q4. From the following data, calculate national income and gross domestic product:
a. Compensation of employees 680
b. Depreciation 136
c. Employer's contribution to social security 120
d. Profit 100
e. Interest 80
f. Rent 40
g. Royalty 20
h. Net indirect taxes 152
i. Net factor income from abroad -12
Q5. Consider a three-good economy. From the information given below, calculate nominal
GDP for year 1 and year 2 and real GDP for year 2:
          PRODUCTION                   PRICE PER UNIT
          YEAR 1 (Q1)   YEAR 2 (Q2)    YEAR 1 (P1) (Rs)   YEAR 2 (P2) (Rs)
GOOD A    6             11             5                  4
GOOD B    7             4              3                  10
GOOD C    10            12             7                  9
Also calculate the GDP deflator.
GLOSSARY
• Macroeconomics: Macroeconomics is the study of the structure and performance of
national economies and of the policies that governments use to try to affect
economic performance.
• National Income: National Income is the sum of all incomes earned by factors of
production for their contribution of land, labour, capital and entrepreneurial ability
to the year's net production.
• Consumption of Fixed Capital: The wearing out or fall in value of capital goods as a
result of their use in the production process is known as Consumption of
Fixed Capital.
• Net Factor Income from Abroad: NFIA is the difference between factor income
received from abroad by normal residents of India for rendering factor services in
other countries and the factor incomes paid to foreign residents for factor
services rendered by them in the domestic territory of India.
• Real GDP: Real GDP is the physical quantity produced in an economy in a given
accounting year, measured as (production of current year) x (prices of a base year).
• Nominal GDP: Nominal GDP is the value of production in an economy in a given
accounting year, measured as (production of current year) x (prices of current year).
• GDP Deflator: The GDP deflator (divided by 100) is the amount by which nominal GDP
must be divided, or "deflated", to obtain real GDP.
• CPI: The CPI measures the current cost of the basket of consumer items divided by the
cost of the same basket of items in the reference base period, multiplied by 100.
References
1. N. Gregory Mankiw, Economics: Principles and Applications, India edition by South
Western, a part of Cengage Learning, Cengage Learning India Private Limited, 4th edition,
2007.
2. Andrew B. Abel and Ben S. Bernanke, Macroeconomics, Pearson Education, Inc., 7th
edition, 2011.
3. http://data.worldbank.org/indicator/NE.CON.PETC.ZS
4. Statistical Outline of India 2012-13, TATA services ltd.
Money: Demand and Supply
Semester-I
Paper I: Principles of Economics(POE)
Unit-IV
Lesson: Money: Demand and Supply
Lesson Developer: Rakhi Arora and Vaishali Kappor
College/Department: Rajdhani College, University of Delhi
Table of Contents:
2. Introduction
3. What is money
a. Origins of money
b. Functions of money
c. Quantity theory of money
4. Money demand
a. Motives for money demand
b. Money demand function
c. Other factors affecting demand for money
5. Money supply
6. Summary
7. Exercises
8. Glossary
9. References
Learning outcomes:
a) Define money
b) Explain the different types of money that have emerged over its evolution
c) State the various functions of money in the economy
d) Define money demand.
e) List various factors of demand for money
f) Define money supply
g) State various measures of money supply
INTRODUCTION
All currency notes in circulation across the world are promissory notes. For instance, a
hundred rupee note states, "I promise to pay the bearer the sum of 100 rupees", along
with the signature of the RBI governor, which makes it legal tender; it is therefore
widely accepted for making transactions.
To facilitate transactions in India, the RBI prints currency notes of different
denominations. But how does the RBI know how many currency notes would be sufficient to
make all the transactions in the economy? The RBI anticipates the requirement for
printing notes on the basis of the money demanded by the people.
This chapter covers the evolution of money, its functions and the quantity theory of
money in section one. Section two discusses the reasons for demanding money and the
factors determining money demand, and section three explains the different
measures of money supply.
WHAT IS MONEY
Money is usually defined as a generally accepted medium of exchange. A medium of
exchange is anything that is widely accepted in a society in exchange for goods and
services. Being a medium of exchange is the defining function of money, but money can
serve other roles as well.
THE ORIGINS OF MONEY
The origins of money go far back into antiquity. Many primitive tribes seem to have made
some use of it.
Money in POW camp
An unusual form of money developed in Nazi prisoner of war (POW) camps during
World War II. The prisoners were supplied with various goods such as food,
clothing and cigarettes, but no attention was paid to personal preferences.
Their endowments therefore prompted the prisoners to trade with
each other to obtain a better bundle of goods.
Barter proved to be an inconvenient way to allocate goods because it requires a
double coincidence of wants. Eventually, cigarettes became the established currency
with which trades were made. Even non-smokers happily accepted them, for they knew
they could exchange them for some other good in future. Within the POW camp the
cigarette became the store of value, the unit of account and the medium of exchange.
1. Metallic money
Different commodities have been used as money at one time or another, but gold and silver
proved to have great advantages over stones or other metals. Before coins were invented,
the metals were carried in bulk. When a purchase was made, the requisite
quantity of the metal was carefully weighed on a scale. The invention of coinage eliminated
the need to weigh the metal at each transaction, but it created an important role for an
authority, usually a monarch, who made the coins by mixing gold or silver with base metals
to create a convenient size and durability, and who affixed a seal that acted as a
guarantee of the amount of precious metal contained in the coin. This was convenient as
long as everyone knew that the coin would be accepted at 'face value'. The face value was
nothing more than a statement that a certain weight of gold or silver was contained therein.
However, coins often could not be taken at their face value. A form of counterfeiting,
clipping a thin slice off the edge of the coin and keeping the valuable metal, became
common. This of course served to undermine the acceptability of coins even if they were
stamped. To get around this problem, the idea arose of minting the coins with a rough
edge; the absence of the rough edge would immediately indicate that the coin had been
clipped.
Some rulers were quick to seize the chance to work a really profitable fraud by ordering
their subjects to bring their coins to the mint to be melted down and
coined afresh with a new stamp. Between the melting down and the recoining, however, the
rulers had only to toss some inexpensive base metal in with the molten coins. This
debasing of the coinage allowed the ruler to earn a handsome profit by minting more new
coins than the number of old ones collected, and putting the extras in the royal vault.
Consider a fifty-fifty ratio of gold to the cheap metal alloyed with it. If, for instance,
subjects brought 50 coins to be reminted, the ruler would mix cheap metal into them and
obtain 100 minted coins: 50 to be returned to the subjects and 50 retained by the ruler.
The result of this debasement was inflation. The subjects had the same number of coins as
before (50), and hence could demand the same quantity of goods. When rulers paid
their bills, however, the recipients of the extra coins could be expected to spend them.
This caused a net increase in demand, which in turn bid up prices.
Gresham's law
The early experience of currency debasement led to the observation known as
Gresham's law, which states that 'bad money drives out good'.
In the mid-16th century, when Queen Elizabeth I came to the throne of
England, the coinage was severely debased. To help trade, Elizabeth minted
new coins that contained their full face value in gold. As soon as these new coins
were put into circulation, they disappeared. Why?
Suppose you possess one new and one old coin, each with the same face value. If
you had to pay a bill, you would use the debased coin, as you part with less gold this
way; and if you wished to obtain a certain amount of gold bullion by melting gold
coins, you would melt the new ones, as they contain more gold.
It was the experience of such inflation that led early economists to stress the link between
the quantity of money and the price level. This relationship is popularly known as the
Quantity Theory of Money (QTM); it will be discussed later in this chapter.
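The arithmetic of the fifty-fifty debasement example can be sketched as below. The function name is hypothetical, and the sketch assumes that the old coins are treated as pure gold and that new coins are the same size as the old ones; it only reproduces the coin counts and does not model prices.

```python
def debase(coins_brought, gold_share):
    """Coin counts after reminting with the given gold share per new coin.

    Assumes old coins count as pure gold and new coins are the same size,
    so the same gold stretches to coins_brought / gold_share new coins.
    """
    coins_minted = coins_brought / gold_share
    coins_returned = coins_brought              # subjects get back as many as they brought
    coins_retained = coins_minted - coins_returned
    return coins_minted, coins_returned, coins_retained


minted, returned, retained = debase(coins_brought=50, gold_share=0.5)

print("Coins minted:", minted)
print("Coins returned to subjects:", returned)
print("Coins retained by the ruler:", retained)

# Once the ruler spends the retained coins, the coins in circulation grow
# from 50 to 100 while the quantity of goods is unchanged.
growth_in_coins = minted / returned
print("Coins in circulation multiply by:", growth_in_coins)
```

With twice as many coins chasing the same goods, the upward pressure on prices described above follows.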
To this day the revenue generated from the power to create currency is known as
seigniorage. Seigniorage was not normally revenue generated by debasement; originally it
was an explicit duty, or tax, levied on the mint. In the modern context the possibility of
debasement does not arise, so the term seigniorage is applied to the revenue that accrues
to the government from the power to print banknotes (since banknotes have very low
production costs relative to their face value) and from another source: commercial banks
are required to place non-interest-bearing deposits at the central bank.
2. Paper money
The next milestone in the evolution of money was the emergence of paper currency. Paper
currency originated with goldsmiths. Initially, the public began to deposit their gold with
goldsmiths, since they had secure safes. Goldsmiths gave their depositors a
promise to hand over the gold whenever demanded. Whenever a depositor needed to make
a large purchase, he would go to the goldsmith, reclaim some of the gold deposited
and hand it over to the seller of the goods. If the seller had no immediate need for the
gold, he would carry it back to the goldsmith for safekeeping on his behalf.
But this was wasteful: the initial depositor reclaimed the gold and handed it over to the
seller, who deposited it with the goldsmith again. So why engage in the risky business of
physically transferring the gold? As long as the goldsmith was considered reliable and
people had confidence in him, the buyer needed only to transfer the goldsmith's receipt to
the seller. If this seller then wished to transact with a third party who also found the
goldsmith to be reliable, that transaction too could be effected by passing on the
goldsmith's receipt. The receipt in this case was as good as a transfer of the gold itself.
When it came into being in this way, paper money represented a promise to pay so much
gold on demand. The promise was made first by goldsmiths and later by banks.
Such paper money, which became bank notes, was backed by precious metal and was
convertible on demand into this metal.
Fractionally backed paper money
Early on, many goldsmiths and banks discovered that it was not necessary to keep a full
ounce of gold in the vaults for every claim to an ounce circulating as paper money. At any
one time some of the bank's customers would be withdrawing gold, others would be
depositing it, and most would be trading in the bank's paper notes without indicating any
need or desire to convert them into gold.
As a result the bank was able to issue more money (initially notes, but later deposits)
redeemable in gold than the amount of gold that it held in its vaults. This was good
business, because the money could be invested profitably in interest-earning loans (often
called advances) to individuals and firms. The demand for loans arose, as it does today,
because some customers wanted credit to help them over hard times or to buy equipment for
their business.
To this day banks have many more claims outstanding against them than they actually have
in reserves available to pay those claims. We say that the currency issued in such a
situation is fractionally backed by the reserves.
The major problem with a fractionally backed convertible currency was maintaining its
convertibility into the precious metal by which it was backed. The imprudent bank that
issued too much paper money would find itself unable to redeem its currency in gold when
the demand for gold was even slightly higher than usual. It would then have to suspend
payments, and all holders of its notes would suddenly find that the notes were worthless.
However, the prudent bank that kept a reasonable relationship between its note issue and
its gold reserve would find that it could meet a normal range of demand for gold without
any trouble.
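The idea of fractional backing can be illustrated with a small calculation. The figures and function names below are hypothetical: a bank holds 200 ounces of gold against notes representing claims to 1,000 ounces, so it can meet redemption demands only as long as the notes presented do not exceed its reserve.

```python
def reserve_ratio(gold_reserves, notes_outstanding):
    """Fraction of the note issue that is backed by gold in the vault."""
    return gold_reserves / notes_outstanding


def can_redeem(gold_reserves, notes_presented):
    """True if the bank can convert all the presented notes into gold."""
    return notes_presented <= gold_reserves


# Hypothetical bank: both figures are in ounces of gold
gold_reserves = 200
notes_outstanding = 1000

ratio = reserve_ratio(gold_reserves, notes_outstanding)
print("Reserve ratio:", ratio)

# Normal times: only a small fraction of notes is presented for payment
normal_demand_met = can_redeem(gold_reserves, notes_presented=150)
print("Normal demand of 150 met:", normal_demand_met)

# Unusually high demand: the bank would have to suspend payments
high_demand_met = can_redeem(gold_reserves, notes_presented=250)
print("High demand of 250 met:", high_demand_met)
```

The lower the reserve ratio a bank chooses, the more it can lend at interest, but the smaller the surge in demand for gold that forces it to suspend payments.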
3. Fiat money
As time went on, note issue by private banks became less common, and central banks,
which are (usually) state-owned institutions, took control of the currency. Over time
central banks have assumed a monopoly in the provision of money (cash) to the economy. As a
result, they have the responsibility of controlling monetary conditions in the economy, and
ultimately they determine the value of a nation's (or group of nations') currency.
Originally the central banks issued paper currency that was fully convertible into gold. In
those days gold would be brought to the central bank, which would issue currency in the
form of gold certificates that asserted that the gold was available on demand. The gold
supply thus set some upper limit on the amount of currency. However, central banks, like
private banks before them, could issue more currency than they had in gold, because in
normal times only a small fraction of the currency was presented for payment at any one
time. Thus even though the need to maintain convertibility under a gold standard put an
upper limit on note issue, central banks had substantial discretionary control over the
quantity of currency outstanding.
Fiat money in primitive societies?
In primitive societies, the stone money of Yap (on the tiny Micronesian island of Yap) and
seashells (in America and New Guinea) served as a medium of exchange.
Prominent economists like Keynes, Friedman and Mankiw give these as
examples of fiat money. Their reason is that the stone money of Yap
and seashells are considered to be neither useful nor convertible and to have no
legal status. But Dror Goldberger questions whether these
ever existed as fiat money in their respective societies. He explains
that the stone money of Yap and seashells should not be considered examples of fiat
money, as they were intrinsically valuable to their primitive users; considering
them fiat money would be equivalent to ignoring the fact that the money then in
circulation had aesthetic value and had a religious use too.
Almost all countries abandoned the gold standard during the period 1914-1928.
Once currencies were no longer convertible into gold, money derived its value from its
acceptability in exchange. Fiat money is widely acceptable because it is declared by
government order, or fiat, to be legal tender. Legal tender is anything that by law must be
accepted when offered either for the purchase of goods or services or to discharge a debt.
Today almost all currency is fiat money. Fiat money is valuable because it is accepted by
convention and in law in payment for the purchase of goods or services and for the discharge
of debts.
Many people are disturbed to learn that present-day paper money is neither backed by nor
convertible into anything more valuable, that it consists of nothing but pieces of paper
whose value derives from common acceptance. Many people believe that their money
should be more substantial than this. Yet money is, in fact, nothing more than pieces of
paper.
If fiat money is acceptable, it is a medium of exchange. Further, if its purchasing power
remains stable, it is a satisfactory store of value. And if both of these things are true,
it will also serve as a satisfactory unit of account.
FUNCTIONS OF MONEY
Money acts as a medium of exchange and can also serve as a store of value and a unit of
account.
1. Medium of Exchange
Goods would have to be exchanged by barter (one good being swapped directly for
another) in the absence of money. The major difficulty with barter is that each transaction
requires a double coincidence of wants, so a great deal of time is spent searching for a
suitable trading partner. Thus a thirsty economics lecturer would have to find
a brewer who wanted to learn economics before he could swap a lesson in economics for a
pint of beer.
The severity of this problem is reduced by using money as a medium of exchange.
Output can be sold for money, and the money can subsequently be used to purchase the
required commodities from others. So a monetary economy typically involves exchanges of goods
and services for money and of money for goods, but not of goods for goods.
The double coincidence of wants, which is required for barter, is unnecessary, when a
medium of exchange is used.
By facilitating transactions, money makes possible the benefits of specialization and the
division of labor, which in turn contributes to the efficiency of the economic system.
Money must possess several characteristics to be able to serve as a medium of exchange. It
must have a known value and be readily acceptable. Its value must be high in relation to its
weight. It must be divisible, because money that came only in large denominations would be
useless for transactions of small value. It must be difficult, if not impossible, to
counterfeit.
2. Unit of Account
As a unit of account, money is the basic unit for measuring economic value. In India, for
example, virtually all prices, wages, asset values, and debts are expressed in rupees. Having
a single uniform measure of value is convenient. For example, pricing all goods in India in
rupees, instead of some goods being priced in yen, some in gold and some in Microsoft
shares, simplifies comparison among different goods.
3. Store of Value
Money is the most convenient way to store purchasing power. The money received in
exchange for goods sold today may be stored until it is required. However, money must
have a relatively stable value to be a satisfactory store of value. A rise in the price level
leads to a decrease in the purchasing power of money because more money is required to buy
a typical basket of goods. When the price level is stable, the purchasing power of a given
sum of money is also stable; when the price level is highly variable, this is not so, and the
usefulness of money as a store of value is undermined.
Transactions and the Quantity Equation
Money is usually held to buy goods and services. People hold more money when they need
more money for making transactions. Thus, the number of rupees exchanged in the
transactions is related to the quantity of money in the economy.
The link between transactions and money is expressed in the Quantity equation in the
following manner:
Money x Velocity = Price x Transactions
M x V = P x T …………………………………(1)
where,
T = the number of times goods/services are exchanged for money in a year.
P = the price of a typical transaction, i.e. the number of rupees exchanged.
M = the quantity of money.
V = the velocity of money, i.e. the number of times money changes hands in a given year.
Therefore, the right side of the equation tells about the transactions and the left side of the
equation tells about the money used to make transactions.
For example, suppose that 60 slices of cheese are sold in a given year at Rs. 5 per slice
and that the quantity of money in the economy is M = Rs. 20.
Then,
T = 60 slices per year, and
P = Rs. 5 per slice.
The total number of rupees exchanged is:
P x T = Rs. 5/slice x 60 slices/year = Rs. 300/year
Therefore, by rearranging the quantity equation, we can compute velocity as:
V = (P x T)/M
= (Rs. 300/year)/(Rs. 20)
= 15 times per year.
That is, for Rs. 300 of transactions per year to take place with Rs. 20 of money, each rupee
must change hands 15 times per year.
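The cheese example can be checked with a few lines of Python. This is only an illustrative sketch: the function name velocity is our own label and is not part of the text.

```python
def velocity(price, transactions, money):
    """Transactions velocity from the quantity equation M x V = P x T."""
    if money <= 0:
        raise ValueError("money stock must be positive")
    return (price * transactions) / money

# Cheese example from the text: 60 slices at Rs. 5 each, money stock of Rs. 20.
P, T, M = 5.0, 60, 20.0
total_spending = P * T      # rupees exchanged per year
V = velocity(P, T, M)       # times each rupee changes hands per year
print(f"P x T = Rs. {total_spending:.0f} per year, V = {V:.0f} times per year")
```

Changing any one of P, T or M in the sketch changes V, which is just the quantity equation rearranged.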
The quantity equation is an identity: a change in one variable must be matched by a change
in one or more of the other variables so that the equality is maintained.
From transactions to Income
As the number of transactions is difficult to measure in an economy, economists usually
replace T by the total output of the economy, Y. The more the economy produces, the
more goods are bought and sold, so transactions and output are
related. But they are not the same. For example, a transaction is made when a used car is sold
by one person to another, even though the used car is not part of current output. Nevertheless,
the rupee value of transactions is roughly proportional to the rupee value of output.
If
Y = the amount of output, and
P = the price of one unit of output,
then the value of output = PY, where
Y = real GDP,
P = the GDP deflator, and
PY = nominal GDP.
Therefore,
The quantity equation becomes:
Money x Velocity = Price x Output
M x V = P xY ………………………………..(2)
As Y is also total income, V in this version of the quantity equation is called the
income velocity of money.
The income velocity of money tells us the number of times a rupee bill enters someone's
income in a given period of time. This is the version of the equation most commonly used.
The Money Demand Function and the Quantity Equation
Expressing the quantity of money in terms of the quantity of goods and services it can buy
helps us analyze the effect of money on the economy. The amount M/P is called
real money balances; it measures the purchasing power of the stock of money.
For example, an economy produces only cheese. If M=Rs.20, P=Rs.5 per slice, then M/P=4
slices of cheese. That is, at current prices, money stock in the economy is able to buy 4
slices.
A money demand function shows what determines the quantity of real money balances
people wish to hold. A simple money demand function is:
(M/P)d = kY
Where,
k = A constant that tells us how much money people want to hold for every rupee of
income.
This equation states that the quantity of real money balances demanded is proportional to
real income.
The money demand function is like the demand function for a particular good. Here the
good under consideration is the convenience of holding real money balances. Just as owning an
automobile makes it easier for a person to travel, holding money makes it easier
for a person to make transactions. Therefore, just as higher income leads to a
greater demand for automobiles, higher income also leads to a greater demand
for real money balances.
Adding to the money demand function the condition that the
demand for real money balances (M/P)d equals the supply (M/P),
we get
M/P = kY
Rearranging the terms, we get
M(1/k) = PY
which is the quantity equation
MV = PY,
where
V = 1/k.
This shows the link between the demand for money and the velocity of money.
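The link between k and V can be illustrated numerically. The sketch below is only an illustration; the function names and the values of k and M are hypothetical and are not taken from the text.

```python
def velocity_from_k(k):
    """Income velocity implied by the money demand function (M/P)d = kY."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / k

def nominal_gdp(money, k):
    """Nominal GDP (P x Y) consistent with money market equilibrium M/P = kY."""
    return money * velocity_from_k(k)

# Hypothetical values of k: a larger k means more money held per rupee of income,
# so each rupee changes hands less often and velocity is lower.
for k in (0.10, 0.25, 0.50):
    print(f"k = {k:.2f} -> V = {velocity_from_k(k):.1f}")

# With a hypothetical money stock of Rs. 20 and k = 0.25, nominal GDP is M x V.
print(nominal_gdp(20.0, 0.25))
```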
Cambridge Equation & Fisher Equation of Quantity Theory of Money:
Similarity & Difference
Fisher's Quantity Theory of Money establishes a relation between money and
transactions by the equation:
MV = PT ………………………………………..(i)
The Cambridge economists, in contrast, linked money to income:
Md = kPY …………………………………..(ii)
Money demand is a function of nominal income, PY. A fraction k of this nominal
income is demanded by the public to be held as cash.
Comparing the two, if Y in equation (ii), the physical quantity of
output (real income), is taken to be equal to transactions T in equation (i), then
V = 1/k or k = 1/V, i.e. one is the reciprocal of the other.
For example, if the stock of money that people wish to hold equals one-fourth of the value
of total income (transactions), then k is 0.25 and V, the reciprocal of k, is 4. If the money
supply is to be one-fourth of the value of transactions, each rupee must be used on
average four times.
When k is large, i.e. people wish to hold a lot of money for each rupee of income, V is
small, i.e. money changes hands infrequently. Conversely, when k is small, i.e. people
wish to hold only a little money, V is large, i.e. money changes hands frequently.
Therefore, we can deduce that the money demand parameter k and the velocity of money V are
negatively related to each other.
THE ASSUMPTION OF CONSTANT VELOCITY
On making the assumption of constant velocity of money, the quantity equation becomes
the quantity theory of money.
In reality, velocity does change if the money demand function changes. For instance,
when automatic teller machines were introduced, people could reduce their average money
holdings, which meant a fall in the money demand parameter k and a rise in velocity V.
Nevertheless, the assumption of constant velocity provides a good approximation in many
situations.
The assumption of constant velocity makes the quantity equation a theory of the determination
of nominal GDP. The quantity equation then says:
M x V̄ = P x Y ………………………………..(3)
where the bar over V means that velocity is fixed.
Therefore, a change in the quantity of money (M) must cause a proportionate change in
nominal GDP (PY), i.e. if V is fixed, the quantity of money determines the rupee value of the
economy's output.
MONEY, PRICES AND INFLATION
The three building blocks that help us study the determination of the overall level of prices
are as follows:
1. The level of output, Y, is determined by the factors of production and the production
function.
2. The nominal value of output, PY, is determined by the money supply. This conclusion
is deduced from the Quantity equation and the fixed velocity of money.
3. The ratio of nominal value of output, PY, to the output level, Y, gives the price level.
What happens when the money supply is changed by the Central bank is explained by this
theory. Any change in the money supply causes proportionate change in nominal GDP as
the velocity is fixed. The change in nominal GDP gets represented in the change in the price
level as the factors of production and the production function have already determined the
real GDP. Therefore, the quantity theory of money states that the price level is proportional
to the money supply.
As the percentage change in the price level is nothing but the inflation rate, therefore, this
theory of price level is also a theory of the inflation rate. Consequently, the quantity
equation in percentage form is represented as follows:
% Change in M + % Change in V = % Change in P + % Change in Y ……..(4)
% Change in M is under the control of the central bank.
% Change in V reflects shifts in money demand (the assumption of constant velocity
implies % Change in V = 0).
% Change in P is the rate of inflation.
% Change in Y depends on growth in the factors of production and on technological
progress, which we can take as given.
This analysis implies that, with velocity constant, the rate of inflation equals the growth
rate of the money supply minus the growth rate of output, which is determined exogenously
and can be treated as a constant.
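The percentage-change form of the quantity equation can be turned into a small calculation. The sketch below is illustrative only: the function name and the growth figures are hypothetical, and the percentage-change relation itself is an approximation that works best for small changes.

```python
def inflation_rate(money_growth, output_growth, velocity_growth=0.0):
    """Inflation implied by %dM + %dV = %dP + %dY, all in percent per year."""
    return money_growth + velocity_growth - output_growth

# Hypothetical figures: money grows 8% a year, real output 3%, velocity is constant.
print(inflation_rate(8.0, 3.0))     # 5.0
# Faster money growth with unchanged output growth raises inflation one for one.
print(inflation_rate(10.0, 3.0))    # 7.0
```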
The demand for money
The amount of wealth that everyone in the economy wishes to hold in the form of money
balances is called the demand for money. Because people are choosing how to divide their
given stock of wealth between money and bonds, it follows that if we know the demand for
money we also know the demand for bonds. With a given level of wealth, a rise in the
demand for money necessarily implies a fall in the demand for bonds; if people wish to hold
1 billion more of money, they must wish to hold 1 billion less of bonds. It also follows that
if households are in equilibrium with respect to their
money holdings, they are in equilibrium with respect to their bond holdings.
MOTIVES FOR HOLDING MONEY (Money Demand)
1. The transactions motive
Money is required for making most transactions. Consumers pass money to
firms to pay for the goods and services they produce, and firms pass
money to employees for the labor services they supply. Money
balances held for this reason are called transactions balances.
In an imaginary world in which the receipts and disbursements of consumers and firms were
perfectly synchronized, it would be unnecessary to hold transactions balances. If every
time a consumer spent 10 she received 10 as part payment of her wages, no transactions
balances would be needed. In the real world, however, receipts and payments are not
perfectly synchronized.
Consider the balances that are held because of wage payments. Suppose, for purposes of
illustration, that firms pay wages every Friday and that employees spend all their wages on
goods and services, with the expenditure spread evenly over the week. Thus on Friday morning
firms hold balances equal to the weekly wage bill; on Friday afternoon the employees will hold
these balances. Over the week workers' balances will be drawn down as a result of purchasing
goods and services. Over the same period the balances held by firms will build up as a result
of selling goods and services until, on the following Friday, firms will again have amassed
balances equal to the wage bill that must be met on that day.
The transactions motive arises because payments and receipts are not synchronized.
What determines the size of the transactions balances to be held? It is clear that in our example
total transactions balances vary with the value of the wage bill. If the wage bill doubles for
any reason, the transactions balances held by firms and households for this purpose will also
double. As it is with wages, so it is with all other transactions: the size of the balances
held is positively related to the value of the transactions.
It is the average value of money balances that people wish to hold over a particular period that is
relevant for economics, but we need to know how money demand is related to GDP rather
than to total transactions. In fact, the value of all transactions exceeds the value of final
output. When the miller buys wheat from the farmer and when the baker buys flour from
the miller, both are transactions against which money balances must be held, although only
the value added at each stage is part of GDP.
Generally there will be a stable, positive relationship between transactions and GDP. A rise
in GDP also leads to a rise in the total value of all transactions and hence to an associated
rise in the demand for transactions balances. This allows us to relate transactions balances
to GDP.
Transactions money demand = f (income)
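The weekly wage cycle described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions of our own: wages are paid at the start of day 0, spending is spread evenly over a seven-day cycle, balances are observed at the start of each day, and the wage bills used are hypothetical.

```python
def weekly_balances(wage_bill, days=7):
    """Start-of-day money balances of workers and firms over one pay cycle.

    Assumes wages are paid at the start of day 0 and spent evenly over the cycle.
    """
    daily_spending = wage_bill / days
    workers = [wage_bill - daily_spending * d for d in range(days)]
    firms = [daily_spending * d for d in range(days)]
    return workers, firms

def average(values):
    """Simple arithmetic mean of a list of balances."""
    return sum(values) / len(values)

for wage_bill in (700.0, 1400.0):   # hypothetical weekly wage bills
    workers, firms = weekly_balances(wage_bill)
    print(wage_bill, average(workers), average(firms))
```

On every day the balances of workers and firms add up to the wage bill, and doubling the wage bill doubles the average balances held by both groups.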
2. Precautionary Motive
Sometimes your vehicle breaks down unpredictably, or you have to make an
impromptu visit to a sick relative. At times like these, certain expenditures crop up out of the
blue. As a precaution against cash crises, when receipts are abnormally low or
disbursements are abnormally high, firms and individuals carry money balances.
Precautionary balances provide a cushion against uncertainty about the timing of
cash flows. The larger such balances are, the greater the protection
against running out of cash because of temporary fluctuations in cash flows.
The seriousness of the risk of a cash crisis depends on the penalties that are incurred for
being caught without sufficient money balances. A firm is unlikely to be pushed into
insolvency, but it may incur considerable costs if it is forced to borrow money at high
interest rates in order to meet a temporary cash crisis.
The precautionary motive arises because individuals and firms are uncertain about the
degree to which payments and receipts will be synchronized.
The protection provided by a given quantity of precautionary balances depends on the
volume of payments and receipts. A $100 precautionary balance provides a large cushion
for a person whose volume of payments per month is $800 and a small cushion for a firm
whose monthly volume is $250,000. To be able to provide the same degree of protection as
the value of transactions rises, more money is necessary.
The precautionary motive, like the transactions motive, causes the demand for money to
vary positively with the money value of GDP.
For most purposes the transactions and precautionary motives can be merged, as they both
show that desired money holdings are positively related to GDP. Indeed, they both show
money being held in relation to transactions, either planned or potential.
Precautionary money demand = f (income)
3. Speculative Motive
Money is also held because of its characteristics as an asset. Individuals and firms usually
hold some money in order to avoid the uncertainty inherent in the
fluctuating prices of other financial assets. Money held for this reason is called a
speculative balance. This motive was first analyzed by Keynes, and the classic modern
analysis was developed by Professor James Tobin, the 1981 Nobel Laureate in economics.
Any holder of money balances forgoes the extra interest income that could be earned if
bonds are held instead. However, market interest rates fluctuate, and so do the market
prices of existing bonds (their present values depend on the interest rate). Because their
prices fluctuate, bonds are a risky asset. Many individuals and firms do not like risk; they
are said to be risk-averse.
When choosing between holding money and holding bonds, wealth holders must balance the
extra interest income that could be earned by holding bonds against the risk carried by
bonds. At one extreme, if individuals hold all their wealth in the form of bonds, they earn
extra interest on their entire wealth, but they also expose their entire wealth to the risk of
changes in the price of bonds. At the other extreme, if people hold all their wealth in the
form of money, they earn less interest income, but they do not face the risk of unexpected
changes in the price of bonds. Wealth holders usually do not take either extreme position.
They hold part of their wealth as money and part of it as bonds, i.e. they diversify their
holdings. The fact that some proportion of wealth is held in money and some in bonds
suggests that, as wealth rises, so will desired money holdings.
Although one individual's wealth may rise or fall rapidly, the total wealth of a society
changes only slowly. For the analysis of short-term fluctuations in GDP, the effects of
changes in wealth are fairly small, and we will ignore them for the present. Specific
individuals may undergo large wealth changes in response to bond price changes, but with
inside wealth the total effect is negligible: when lenders gain, borrowers lose; and when
lenders lose, borrowers gain. Over the long term, however, variations in aggregate wealth
can have a major effect on the demand for money.
Wealth that is held in cash or deposits earns less interest than could be earned by holding
bonds; hence the reduction in risk involved in holding money carries an opportunity cost in
terms of forgone interest earnings. The speculative motive leads individuals and firms to
add to their money holdings until the reduction in risk obtained by the last pound added is
just balanced (in each wealth holder's view) by the cost in terms of the interest forgone on
that pound. A fall in the rate of return on bonds for the same level of risk will encourage
people to hold more of
their wealth as money and less in bonds. A rise in their rate of return for a given level of
risk will cause people to hold more bonds and less money.
The speculative motive implies that the demand for money will be negatively related to the
rate of interest.
Speculative Money demand = f (Income, Rate of interest)
THE MONEY DEMAND FUNCTION
We express the effects of the price level, real income, and interest rates on money demand
as
Md= P x L(Y,i) ……………………………(5)
where
Md = the aggregate demand for money, in nominal terms;
P=the price level;
Y= real income or output;
i= the nominal interest rate earned by alternative, nonmonetary assets;
L= a function relating money demand to real income and the nominal interest rate.
Figure 1: Quantity of money and rate of interest are inversely related.
Figure 2: Quantity of money and price level are positively related.
Figure 3: Quantity of money and GDP are positively related.
Equation (5) states that nominal money demand, Md, is proportional to the price level, P.
Hence, if the price level P doubles (given real income and the rate of interest), the
nominal money demand Md will also double, reflecting the fact that twice as much nominal
money is required to conduct the same real transactions.
Equation (5) also indicates that, for any given P, Md depends (through the function L) on
real income, Y, and the nominal interest rate on non-monetary assets, i. An increase in real
income, Y, raises the demand for liquidity and thus increases money demand. An increase in
the nominal interest rate, i, makes non-monetary assets more attractive, which reduces
money demand.
We could have included the nominal interest rate on money, im, in the above equation
because an increase in the interest rate on money makes people more willing to hold money
and thus increases money demand. Historically, however, the nominal interest rate on
money has varied much less than the nominal interest rate on nonmonetary assets (for
example, currency and a portion of checking accounts have always paid zero interest) and
thus has been ignored in many statistical studies of the equation. For simplicity, therefore,
we do not explicitly include im in the equation.
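Writing the nominal interest rate as the sum of the real interest rate, r, and the expected
rate of inflation, πe (that is, i = r + πe), the money demand function becomes
Md = P x L(Y, r + πe) ………...…………………….(6)
Equation (6) shows that, for any expected rate of inflation πe, an increase in the real interest
rate increases the nominal interest rate and reduces the demand for money. Similarly, for
any real interest rate, an increase in the expected rate of inflation increases the nominal
interest rate and reduces the demand for money.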
Nominal money demand, Md, measures the demand for money in terms of rupees. If we
divide both sides of Eq. (6) by the price level, P, we get
Md/P = L(Y, r + πe) ………………………………….(7)
The expression Md/P is called real money demand or the demand for real balances. Real
money demand, Md/P, depends on real income (or output), Y, and on the nominal interest
rate, which is the sum of the real interest rate, r, and expected inflation, πe. The
function L, which relates real money demand to output and interest rates in Eq. (7), is
called the money demand function.
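The properties of equations (6) and (7) can be illustrated with a specific functional form. The text does not specify the function L, so the linear form, its coefficients and all the numbers in the sketch below are purely hypothetical assumptions chosen for illustration.

```python
def real_money_demand(Y, r, expected_inflation,
                      income_coeff=0.5, interest_coeff=1000.0):
    """Hypothetical linear form of L(Y, r + expected inflation).

    The coefficients are illustrative assumptions, not estimates.
    """
    i = r + expected_inflation              # nominal interest rate
    return income_coeff * Y - interest_coeff * i

def nominal_money_demand(P, Y, r, expected_inflation):
    """Md = P x L(Y, r + expected inflation), as in equation (6)."""
    return P * real_money_demand(Y, r, expected_inflation)

base = nominal_money_demand(P=2.0, Y=2000.0, r=0.03, expected_inflation=0.02)
doubled_price = nominal_money_demand(P=4.0, Y=2000.0, r=0.03, expected_inflation=0.02)
higher_inflation = nominal_money_demand(P=2.0, Y=2000.0, r=0.03, expected_inflation=0.05)
print(base, doubled_price, higher_inflation)
```

In this sketch, doubling P doubles nominal money demand while real money demand is unchanged, and a higher expected rate of inflation raises the nominal interest rate and lowers money demand.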
OTHER FACTORS AFFECTING MONEY DEMAND
The additional factors influencing money demand are as follows:
1. Wealth: When wealth increases, part of the extra wealth may be held as money,
increasing total money demand. However, with income and the level of transactions
held constant, a holder of wealth has little incentive to keep extra wealth in money rather
than in higher-return alternative assets. Thus the effect of an increase in wealth on money
demand is likely to be small.
2. Risk: Holding money is not usually risky, as it pays a fixed nominal interest rate (zero in
the case of cash). However, the demand for safer assets, including money, may increase if the
risk of alternative assets such as stocks and real estate increases greatly. Therefore, money
demand in the economy increases with increased riskiness of alternative assets.
However, money does not always carry a low risk. In a period of erratic inflation, even if the
nominal return on money is fixed, the real return on money (the nominal return minus
inflation) may become quite uncertain, making money risky. Money demand will then fall as
people switch to inflation hedges (assets whose real returns are less likely to be affected by
erratic inflation) such as gold, consumer durable goods, and real estate.
3. Liquidity of Alternative Assets: The need to hold money is reduced when the
conversion of alternative assets into cash becomes quicker and easier. Alternatives to
money have become more liquid in financial markets in recent years due to the joint impact
of deregulation, competition and innovation. The introduction of individual cash
management accounts has allowed individuals to switch wealth easily between high-return
assets, such as stocks, and more liquid forms. As alternative assets become more liquid, the
demand for money declines.
4. Payment technologies: Money demand is also affected by the technologies available
for making and receiving payments. For example, the introduction of credit cards allowed
people to make transactions without money, at least until the end of the month, when a
cheque must be written to pay the credit card bill. Automatic teller machines (ATMs)
have probably reduced the demand for cash because people know that they can obtain cash
quickly whenever they need it. In the mid 2000s, check clearing became very efficient,
thanks to the Check 21 law, which allowed for electronic clearing of images of checks. This
process greatly reduced the costs of clearing cheques and allowed them to clear much more
quickly than before. In the future more innovations in payment technologies undoubtedly
will help people operate with less and less cash. Some experts even predict that ultimately
we will live in a "cashless society" in which almost all payments will be made through
immediately accessible computerized accounting systems and that the demand for cash will
be close to zero.
5. Elasticities of Money Demand: The theory of portfolio allocation helps economists
identify factors that should affect the aggregate demand for money. However, for many
purposes, such as forecasting and quantitative analyses of the economy, economists need to
know not just which factors affect money demand but also how strong the various effects
are. This information can be obtained only through statistical analysis of the data.
Over the past three decades, economists have performed hundreds of statistical studies of
the money demand function. The results of these studies are often expressed in terms of
elasticities (income and interest elasticities), which measure the percentage change in money
demand resulting from a percentage change in the factors affecting the demand for money.
(A) The income elasticity of money demand is the percentage change in money demand
resulting from a 1% increase in real income. Thus, for example, if the income elasticity of
money demand is 2/3, a 3% increase in real income will increase money demand by 2%
(2/3 x 3% = 2%).
(B) The interest elasticity of money demand is the percentage change in money demand
resulting from a 1% increase in the interest rate. Note that if the interest rate increases from
5% to 6%, this is not a 1% increase in the interest rate but a 20% increase (1/5 = 20%). This has to
be kept in mind while dealing with the interest elasticity of money demand.
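The two elasticity calculations can be sketched in Python. The income elasticity of 2/3 is the example from the text; the interest elasticity of -0.1 is a hypothetical value chosen only for illustration, as are the function names.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0

def pct_change_in_money_demand(elasticity, pct_change_in_factor):
    """Approximate % change in money demand = elasticity x % change in the factor."""
    return elasticity * pct_change_in_factor

# Income elasticity example from the text: elasticity 2/3, real income rises 3%.
income_effect = pct_change_in_money_demand(2 / 3, 3.0)
print(income_effect)        # approximately 2 (percent)

# A rise in the interest rate from 5% to 6% is a 20% increase in the rate itself.
rate_change = pct_change(5.0, 6.0)
# Hypothetical interest elasticity of -0.1 (negative: higher rates lower money demand).
interest_effect = pct_change_in_money_demand(-0.1, rate_change)
print(rate_change, interest_effect)
```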
Measures of Money
1. M1: transactions money
If we take the value of all currency (including coins) held outside of bank vaults and add to it
the value of all demand deposits, traveler's cheques, and other checkable deposits, we obtain
M1, or transactions money, i.e. the money that can be directly used
for transactions to buy things. It is also known as narrow money.
M1 = currency held outside banks + demand deposits + traveler's cheques
+ other checkable deposits ……………….(8)
A checkable deposit is any deposit account with a bank or other financial institution on
which a check can be written. Checkable deposits include demand deposits, negotiable
order of withdrawal (NOW) accounts, and automatic transfer accounts, which automatically
transfer funds from savings to checking (or vice versa) when the balance on one of those
accounts reaches a predetermined level.
M1 on June 26, 2000 was $1,103.3 billion in the US. M1 is a stock measure, which is measured at
a point in time. It is the total amount of coins and currency outside of banks and the total
amount in checking accounts on a specific day.
Definitions of Monetary Aggregates in India
A. Monetary Aggregates
Weekly compilation
M0 = currency in circulation + bankers' deposits with the RBI + other deposits with the RBI
Fortnightly compilation
M1 = currency with the public + demand deposits with the banking system +
other deposits with the RBI
= currency with the public + current deposits with the banking system +
demand liabilities portion of saving deposits with the banking system +
other deposits with the RBI
M2 = M1 + time liabilities portion of saving deposits with the banking system +
certificates of deposit issued by banks + term deposits (excluding FCNR(B)
deposits) with a contractual maturity of up to and including one year with the
banking system
= currency with the public + current deposits with the banking system +
saving deposits with the banking system + certificates of deposit issued by banks +
term deposits (excluding FCNR(B) deposits) with a contractual maturity of up to and
including one year with the banking system + other deposits with the RBI
M3 = M2 + term deposits (excluding FCNR(B) deposits) with a contractual maturity
of over one year with the banking system + call borrowing from non-depository
financial corporations by the banking system
B. Liquidity Aggregates
Monthly compilation
L1 = M3 + all deposits with the post office savings banks (excluding national
savings certificates)
L2 = L1 + term deposits with term lending institutions and refinancing institutions
(FIs) + term borrowing by FIs + certificates of deposit issued by FIs
L3 = L2 + public deposits of non-banking financial companies
Institute of Lifelong Learning, University of Delhi
2. M2: broad money
If we add near monies, close substitutes for transactions money, to M1 we get M2, called
broad money because it includes not-quite-money monies such as saving accounts,
money market accounts, and other near monies.
M2 = M1 + saving accounts + money market accounts
+ other near monies ………………(9)
On June 26, 2000, M2 was $4,778.2 billion, considerably larger than the total M1 of
$1,103.3 billion. The main advantage of looking at M2 instead of M1 is that M2 is sometimes
more stable. When banks introduced new forms of interest-bearing checking accounts in
the early 1980s, M1 shot up as people switched their funds from savings accounts to
checking accounts. However, M2 remained fairly constant because the fall in saving
account deposits and the rise in checking account balances were both part of M2, canceling
each other out.
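The point about the stability of M2 can be illustrated with a small calculation based on equations (8) and (9). All the balances below are hypothetical figures made up for illustration; they are not actual data.

```python
def m1(currency, demand_deposits, travelers_cheques, other_checkable):
    """Transactions money, following equation (8)."""
    return currency + demand_deposits + travelers_cheques + other_checkable

def m2(m1_value, savings_accounts, money_market_accounts, other_near_money):
    """Broad money, following equation (9)."""
    return m1_value + savings_accounts + money_market_accounts + other_near_money

# Hypothetical balances (in billions); not actual data.
before = dict(currency=500.0, demand=300.0, cheques=10.0, checkable=200.0,
              savings=1500.0, mm=900.0, near=600.0)
# Households move 100 from savings accounts into checkable deposits.
after = dict(before, savings=before["savings"] - 100.0,
             checkable=before["checkable"] + 100.0)

for label, b in (("before", before), ("after", after)):
    narrow = m1(b["currency"], b["demand"], b["cheques"], b["checkable"])
    broad = m2(narrow, b["savings"], b["mm"], b["near"])
    print(label, narrow, broad)
```

In this sketch M1 rises by the amount switched into checkable deposits, while M2 is unchanged because both types of account are counted in it.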
One of the very broad definitions of money includes the amount of available credit on credit
cards (your charge limit minus what you have charged but not paid) as part of the money
supply.
SUMMARY
• Money is anything that serves as a medium of exchange. Today almost all currency is
fiat money. Fiat money is neither convertible into anything nor has any intrinsic value, but
it is still valuable because it is accepted by convention and in law in payment for the
purchase of goods or services and for the discharge of debts.
• Money acts as a medium of exchange and can also serve as a store of value and a
unit of account.
• There is a direct, one-to-one relationship between the quantity of money and prices in
the economy. This is known as the Quantity Theory of Money.
• The public holds cash, or demands money, for three reasons: for making day-to-day
transactions, as a precaution against bad days, and as an alternative to holding bonds.
• Money demand is a function of real income (nominal income adjusted for prices) and the
interest rate. Other factors that determine money demand are wealth, risk, liquidity of
alternative assets, payment technologies, and the income and interest elasticities of
money demand.
• Money supply in an economy is measured at a point in time, as it is a stock variable. If we
add cash in circulation and demand deposits we get a narrow measure of money (M1), and if
near monies are also added to M1 we get a broad measure (M2).
EXERCISES
SHORT ANSWER QUESTIONS
Q1. Which of the functions of money (medium of exchange, unit of account and
store of value) are satisfied by the following items?
a. Credit Card
b. Subway token
Q2. State whether you agree or disagree with the following statement and explain why:
a. If the money supply increases by 10%, overall prices change by less than 10%.
b. If an economy is experiencing inflation, then it is better to hold money balances.
c. Higher real income means a greater demand for money.
LONG ANSWER QUESTIONS
Q1. Describe the functions of money.
Q2. What is fiat money? How is it different from the types of money used in earlier ages?
Q3. Write the Quantity equation and explain it.
Q4. Define velocity. Discuss the role of velocity in the quantity theory of money.
NUMERICALS
Q1. Consider the following money demand function for an economy:
Md = 1000 + 0.25Y - 1000i
a. Suppose that P = 100, Y = 1000 and i = 0.10. Find real money demand, nominal
money demand and velocity.
b. Suppose the price level doubles from P = 100 to P = 200. Find real money demand,
nominal money demand and velocity.
Q2. a. If the income elasticity of money demand is 5 and income changes by 3%, by
how much would money demand change?
b. If the interest elasticity of money demand is 1/5 and the interest rate changes from 5% to
6%, by how much would money demand change?
GLOSSARY
• Money: Money is an asset that is widely used and accepted as payment. Money
functions as a medium of exchange, a unit of account and a store of value.
• Fractionally backed money: When central banks or goldsmiths issue more currency
than the reserves or gold in their vaults, the money is said to be fractionally backed.
• Seigniorage: Seigniorage is the revenue that the government raises by printing
money.
• Fiat money: Fiat money is intrinsically worthless; it is accepted by convention
and in law in payment for the purchase of goods or services and for the discharge of
debts.
• Quantity Theory of Money: The Quantity Theory of Money assumes that velocity is stable
and that the production function determines real GDP, and concludes that the rate of
growth of money determines the inflation rate.
• Velocity: The income velocity of money tells us the number of times a rupee bill
enters someone’s income in a given period of time.
• Money demand: The amount of wealth that everyone in the economy wishes to hold
in the form of money balances is called the demand for money.
• Narrow money: The sum of the value of all currency (including coins) held outside
bank vaults, all demand deposits, traveller’s cheques and other checkable deposits
is known as narrow money.
• Broad money: Adding near monies to M1 gives broad money, which
includes not-quite-money assets such as savings accounts, money market accounts
and other near monies.
Credit Creation and Monetary Policy
Semester-I
Unit-III
Lesson: Credit Creation and Monetary Policy
College/Department: Rajdhani College, University of Delhi
Table of Contents
2. Introduction
3. Credit Creation
a. Central bank in an economy
b. Reserves
c. Ratios approach to credit creation process
d. Cash drain in process of credit creation
e. Money multiplier
4. Monetary policy
a. Open market operations
b. Reserve requirement
c. Repo and Reverse Repo rate
d. Bank rate
5. Summary
6. Exercises
7. Glossary
8. References
Learning outcomes
a) Describe the role of a central bank in an economy.
b) Explain the fractional reserve banking system.
c) Understand the process of credit creation in an economy.
d) Compute the money multiplier, given the data.
e) Describe the usefulness and impact of monetary policy in an economy.
f) List the various tools of monetary policy.
INTRODUCTION
Banks are the financial intermediaries in the economy whose primary task is the acceptance
of deposits and the provision of loans. The questions that usually come to mind are: how
do banks operate; who controls the banks; what quantum of accepted deposits is loaned
out; who decides all that; and how does it impact the economy?
The Reserve Bank of India (RBI) is the apex bank controlling the operations of all the
commercial banks in the economy. RBI controls money supply and credit availability in the
economy. After the recession of 2008, RBI has been consistently lowering the CRR and the
repo rate. Why does RBI take such steps? The answer is that RBI thereby injects money into
the system, but one would like to know how this materializes.
This chapter is an attempt to answer all the above questions. It focuses on credit
creation in the economy in Section 1 and on the use of different monetary policy
tools to inject money into or withdraw it from the economy in Section 2.
CREDIT CREATION
Commercial banks are different from other financial institutions as they have the ability to
create credit in the economy. They accept deposits from the public, a part of which is loaned
out while the remainder is kept as reserves. Banks are in reality capable of providing
more loans than the amount of cash held by them. The questions that need to be answered
are: what proportion of the total deposits of a bank is to be given as loans and what
proportion is to be preserved as cash by the bank; how can banks expand loans by more than
the quantity of cash they have; what mechanism is at work?
This section studies the mechanism of credit creation in an economy.
Central bank in India
The Reserve Bank of India, the central bank, controls money supply in India in two ways.
Firstly, RBI prints money and thereby directly controls money supply in the economy.
Secondly, RBI uses monetary policy as a tool to control money supply indirectly.
Besides the central bank, money supply also depends on the depository institutions (i.e.
commercial banks) and on the public, which holds money either as cash in hand or as
deposits in banks.
Should access to credit be a Right?
M. Yunus, winner of the 2006 Nobel Peace Prize, in his effort to create economic and social
development from below, proclaims that credit is directly instrumental to economic
development, poverty reduction and improved welfare of all citizens, and hence that credit
should be a human right. Yunus considers the right to credit to be a moral one, based on the
fact that without access to the opportunities that credit can provide there is little chance
that the poor will be able to improve their position.
The libertarian approach to human rights, however, opposes Yunus’ focus on the moral
consequences of financial exclusion. Libertarians stress that the individual rights of the
lender are violated. A large group of academics and experts challenge the urgent need of
credit for all, as it does not create a good or service. Marek Hudon provides an alternative
approach: a goal-right system to credit.
Behavior of RBI in macro economy
The Reserve Bank of India comprises two departments, viz. the Issue Department and the
Banking Department. The Issue Department is concerned solely with currency management.
The Banking Department deals with the rest of the banks in the country and reflects the
impact of all other functions of the Reserve Bank.
RBI’s Issue Department’s balance sheet is as follows:
Reserve Bank of India
Balance Sheet as on 30th June 2012
(Rs. Thousands)
Liabilities                                        Assets
Notes held in Banking Department          89,169   Gold coin & bullion        760,096,797
Notes in circulation              11,034,645,327   Foreign securities      10,261,966,851
                                                   (Sub-total)             11,022,063,648
                                                   Rupee coin                   2,206,548
                                                   GOI Rupee securities        10,464,300
Total notes issued                11,034,734,496
Total liabilities                 11,034,734,496   Total assets            11,034,734,496
Source: RBI’s website
On the right-hand side of the balance sheet are the Reserve Bank’s assets: what it owns.
Its assets comprise gold coin and bullion, foreign securities, rupee coins, government
securities and commercial paper. On the left-hand side are RBI’s liabilities: what it owes
to others. Currency issued by RBI, whether held by the public or in the Banking Department,
is a debt obligation of the RBI.
Likewise, in the Banking Department’s balance sheet, assets are securities purchased,
investments made and notes held by it (Rs. 89,169 thousand, as shown in the balance sheet of
the Issue Department). Liabilities comprise reserve deposits. These reserve deposits are
liabilities of the Reserve Bank and assets of commercial banks, as they are deposit accounts
held by commercial banks at the Reserve Bank.
For simplification, from here on we will assume no difference between the Banking and Issue
Departments and consider a combined balance sheet for the two. Let us
set up the following example, which could be applied to almost any currency and central bank.
Central Bank’s Balance Sheet*
Liabilities                          Rs.    Assets          Rs.
Currency held by non-bank public     700    Securities      900
Vault cash held by banks             100    Gold            100
Reserve deposits                     200
Total liabilities                   1000    Total assets   1000
*The above balance sheet contains selected items, which are required for further
analysis.
The sum of reserve deposits and currency (including both currency held by the public and
vault cash held by banks) is called the monetary base, also known as high-powered money
and denoted by H.
H = C + R .......... (1)
where
H is high-powered money,
C is currency,
R is reserve deposits.
Next, consider the balance sheets of all commercial banks in the private sector. Suppose all
banks are combined together and their consolidated balance sheet looks like the following:
Consolidated Balance Sheet of Banks
Liabilities          Rs.     Assets               Rs.
Deposits            3000     Vault cash           100
                             Reserve deposits     200
                             Loans               2700
Total liabilities   3000     Total assets        3000
Banks’ assets consist of vault cash and reserve deposits, both of which appeared as
liabilities on the central bank’s balance sheet, plus loans that banks have extended to the
public. Banks’ liabilities consist of deposits accepted from the public. The money you
deposit in your bank account is your asset but a liability of the bank.
RESERVES
Out of total deposits of Rs.3000, banks kept Rs.200 as reserve deposits and Rs.100 as vault
cash to meet the demands for withdrawals by depositors. These are known as bank reserves.
They amount to 10% of total deposits, i.e. the reserve deposit ratio is 0.10
(= 300/3000). How is this reserve deposit ratio fixed?
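The arithmetic behind the monetary base and the reserve deposit ratio can be checked with a short Python sketch. It only restates the illustrative balance-sheet figures used above; the variable names are chosen here for illustration and are not part of the lesson.

```python
# Figures from the illustrative balance sheets in this section (in Rs.).
currency_with_public = 700
vault_cash = 100
reserve_deposits = 200
deposits = 3000

# Monetary base H = C + R, where C counts all currency (public + vault cash).
currency = currency_with_public + vault_cash
high_powered_money = currency + reserve_deposits

# Bank reserves are vault cash plus reserve deposits held at the central bank.
bank_reserves = vault_cash + reserve_deposits
reserve_deposit_ratio = bank_reserves / deposits

print(high_powered_money)               # 1000
print(round(reserve_deposit_ratio, 2))  # 0.1
```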
Why don’t banks keep entire deposits as reserves? Depositors can write cheques for an
amount equivalent to their deposits or withdraw their entire deposits. If banks
keep the entire deposit money as reserves, they are said to be following 100% reserve
banking. Instead, banks anticipate the withdrawal demand of all sorts of depositors and then
decide what amount to hold as reserves. For example, suppose there are three
depositors, viz. A, B and C, and each has the same amount of deposits in his
account. A withdraws his entire deposit every month, B withdraws half of his deposit and C
withdraws none. On average half of the deposits are withdrawn, so banks would decide on
0.50 as the reserve deposit ratio so as to meet the requirements of their depositors. In
this way a generalization is made for all customers, and the rest of the money is lent out
by banks.
Bank Runs
Suppose a rumor spreads that a bank will not be able to honor the cash
requirements of its depositors. All depositors would then rush to the bank so that they do
not lose their money. This is known as a run on the bank. Since banks follow the fractional
reserve banking system, they would not be able to actually
honor all withdrawal requests. The final outcome would be panic in the economy. Bank
runs were evident at the time of the Great Depression of 1929 and the Great Recession of 2008.
As in our example, the reserve deposit ratio is 0.1, which is less than 1; this is known as
a fractional reserve banking system. Every bank follows fractional reserve banking
because keeping 100% of its deposits as reserves would mean it performs only the function of
a safe vault and earns no profit, or a very low profit if the central bank pays some
interest on such reserve deposits. In an economy, the required reserve deposit ratios are
set by the central bank.
RATIOS APPROACH TO CREDIT CREATION PROCESS
Suppose there are only private banks in an economy that follows the fractional reserve
banking system with a reserve deposit ratio of 0.10. Suppose one of the banks, bank A,
accepts a deposit of Rs.100 and, keeping reserves of 10%, loans out the rest.
Bank A’s balance sheet would then look like the following:
Bank A’s balance sheet
Liabilities   Rs.    Assets      Rs.
Deposits      100    Reserves     10
                     Loans        90
Now this Rs.90, which is loaned out to someone in the non-bank public, is deposited in the
borrower’s bank, say, bank B.
Bank B’s balance sheet, after accepting the deposit, keeping reserves and lending out the
rest to the public, would appear as below:
Bank B’s balance sheet
Liabilities   Rs.    Assets      Rs.
Deposits       90    Reserves      9
                     Loans        81
This process of credit expansion will continue, as this Rs.81 would now be deposited in the
next borrower’s bank, and so on. Let us try to figure out what the total amount of deposits
and loans will be in the end.
Bank     Deposit (D)   Reserve (rD)   Loan/Credit (D - rD)
A        100           10             90
B        90            9              81
C        81            8.1            72.9
D        72.9          7.29           65.61
:        :             :              :
Total    1000          100            900
Total of deposits = Rs.(100 + 90 + 81 + 72.9 + ...)
= Rs.(100 + 0.9(100) + 0.9(0.9 x 100) + 0.9(0.9 x 0.9 x 100) + ...)
= Rs.100/(1 - 0.9) = Rs.100/0.1 (by the formula for the sum of an infinite G.P.)
= Rs.1000
Total of loans = Rs.(90 + 81 + 72.9 + 65.61 + ...)
= Rs.(90 + 0.9(90) + 0.9(0.9 x 90) + ...)
= Rs.90/(1 - 0.9) (by the formula for the sum of an infinite G.P.)
= Rs.900
Total of reserves = Rs.(10 + 0.9(10) + 0.9(0.9 x 10) + ...)
= Rs.10/(1 - 0.9) (by the formula for the sum of an infinite G.P.)
= Rs.100
By generalizing the totals, we get:
Total Deposits = First deposit/r
So here 1/r is the multiplier (Deposit & credit multiplier).
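The round-by-round process can also be traced numerically. The following Python sketch is only an illustration: the function name `credit_creation` and the cut-off of 500 rounds are assumptions made here, and the example uses the same first deposit of Rs.100 and reserve deposit ratio of 0.10 as the text.

```python
def credit_creation(first_deposit, r, rounds):
    """Trace deposits, reserves and loans bank by bank (hypothetical helper)."""
    rows = []
    deposit = first_deposit
    for _ in range(rounds):
        reserve = r * deposit      # the bank keeps the fraction r as reserves
        loan = deposit - reserve   # and lends out the rest
        rows.append((deposit, reserve, loan))
        deposit = loan             # the loan is redeposited in the next bank
    return rows

rows = credit_creation(first_deposit=100, r=0.10, rounds=500)
total_deposits = sum(deposit for deposit, _, _ in rows)
total_reserves = sum(reserve for _, reserve, _ in rows)
total_loans = sum(loan for _, _, loan in rows)

# The totals approach first_deposit / r, first_deposit and first_deposit * (1 - r) / r.
print(round(total_deposits, 2), round(total_reserves, 2), round(total_loans, 2))
```

After enough rounds the totals settle at about Rs.1000 of deposits, Rs.100 of reserves and Rs.900 of loans, which is what the G.P. formula gives.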
One bank in a multi-bank system cannot produce a large multiple expansion of deposits
based on an original accretion of cash when other banks do not also expand their deposits.
In the banking system in this example, a multiple increase in deposit money is created
when all banks with excess reserves (i.e. money left after keeping the required reserve
ratio of 0.1) expand their deposits in step with each other.
CASH DRAIN IN PROCESS OF CREDIT CREATION
In the above setup, the implicit assumption was that depositors do not wish to hold cash
out of their deposits. But in reality, the public wishes to hold some cash, say, equal to
10 percent of the size of its bank deposits. How does this impact the process of credit
creation?
As we already know, high-powered money (H) is the sum of currency (C) and reserves (R):
H = C + R .......... (1)
which means that total cash is held either by the banks or by the public. Let the required
reserve deposit ratio be r. Then,
R = rD .......... (2)
where D is total deposits.
Let the public hold currency equal to a fraction b of its deposits (b is the cash deposit
ratio):
C = bD .......... (3)
Substituting equations (2) and (3) in (1) gives:
H = bD + rD
and solving for D yields
D = H/(b + r) .......... (4)
In equation (4) the deposit multiplier becomes 1/(b + r) (in the case of cash drain), which
is to be compared with the previous deposit multiplier, 1/r, where the cash deposit ratio
was assumed to be zero. If the cash deposit ratio in equation (4) is set to zero, the
deposit multiplier again becomes 1/r. A positive value of b lowers the increase in
deposits, as cash is drained out of the expansion process.
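As a rough numerical check of equation (4), the Python sketch below compares the deposits supported by the same monetary base with and without cash drain. The monetary base of Rs.100 and the function name `deposits_supported` are hypothetical choices made for this illustration; b = 0.10 follows the "say, 10 percent" used in the text.

```python
def deposits_supported(high_powered_money, r, b=0.0):
    """Equation (4): D = H / (b + r); b = 0 means no cash drain."""
    return high_powered_money / (b + r)

H = 100.0  # hypothetical monetary base, in Rs.
r = 0.10   # required reserve deposit ratio
b = 0.10   # cash deposit ratio

no_drain = deposits_supported(H, r)
with_drain = deposits_supported(H, r, b)

# With cash drain, the same base supports fewer deposits.
print(round(no_drain, 2), round(with_drain, 2))

# Currency and reserves together still add up to the monetary base.
currency = b * with_drain
reserves = r * with_drain
print(round(currency + reserves, 2))
```

Under these assumed numbers the deposit multiplier falls from 10 to 5 once the public holds cash equal to 10 percent of deposits.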
MONEY MULTIPLIER
The total money supply M in an economy is the sum of currency and deposits, i.e.
M = C + D .......... (5)
M = bD + D {from (3)}
M = (1 + b)D
M = [(1 + b)/(b + r)] H {from (4)}
So money supply in an economy is linked to the monetary base H by the following equation:
M = [(1 + b)/(b + r)] H
where (1 + b)/(b + r) is the money multiplier.
Money Multiplier in India
The money multiplier is calculated as the ratio of the broad measure of money, M3, to
reserve money, M0. On Dec. 30, 2011, M3 was Rs.71,986.8 billion and M0
was Rs.14,200.5 billion.
So the money multiplier, m = M3/M0 = Rs.71,986.8 billion / Rs.14,200.5 billion = 5.0693 (approx.)
The following graph shows the money multiplier for the period April 2008 to April
The money multiplier is larger, the smaller the banks’ reserve deposit ratio, r,
and the smaller the cash deposit ratio, b. Both b and r are drains on the deposit or credit
expansion process.
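The dependence of the money multiplier on b and r can be explored with a small Python sketch. The function names and the particular values of b and r tried below are illustrative assumptions; the M3 and M0 figures are the ones quoted above for Dec. 30, 2011.

```python
def money_multiplier(b, r):
    """(1 + b) / (b + r): b is the cash deposit ratio, r the reserve deposit ratio."""
    return (1 + b) / (b + r)

def money_supply(H, b, r):
    """M = multiplier x monetary base H."""
    return money_multiplier(b, r) * H

# With no cash drain (b = 0) the multiplier reduces to 1/r.
print(round(money_multiplier(b=0.0, r=0.10), 2))
# Raising either drain lowers the multiplier.
print(round(money_multiplier(b=0.10, r=0.10), 2))
print(round(money_multiplier(b=0.10, r=0.20), 2))
print(round(money_multiplier(b=0.20, r=0.10), 2))

# Observed multiplier for India from the figures in the text (Rs. billion).
m3, m0 = 71986.8, 14200.5
observed_multiplier = m3 / m0
print(round(observed_multiplier, 4))
```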
MONETARY POLICY
Monetary policy is the policy of the central bank of an economy that deals with the quantity
of money to be supplied in the economy. Monetary policy is an important tool to affect the
macro economy. Money supply has a direct one-to-one relationship with prices in the economy
(a result of the quantity theory of money), which implies that, if the central bank wishes
to contain the inflation rate in the economy, it can do so by changing the
monetary base of the economy. One of the primary objectives of monetary policy is to
contain the inflation rate.
Considering money supply and money demand as functions of the interest rate, money
demand slopes downward to the right while money supply is vertical. Money demand is
negatively related to the interest rate, as was observed in the last chapter. Money supply
is determined by the central bank’s decision on high-powered money, so it is fixed at some
given level, irrespective of the interest rate. For this consider the following figure.
If the central bank decides to increase the money supply in the economy, then the money
supply curve shifts to the right from m0 to m1 and the equilibrium interest rate falls from
r0 to r1, as shown in the above figure.
This fall in the interest rate induces investment in the economy. From the
chapter on National Income Accounting, we know that investment is a part of national income.
So as money supply in an economy expands, the interest rate falls, which induces investment
in the economy, and hence national income increases. This could be the second
objective of monetary policy.
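The fall in the equilibrium interest rate can be illustrated with a minimal Python sketch. It assumes a linear money demand curve, Md = a - c x i, which is not given in the lesson; the parameters a and c and the two money supply levels are hypothetical numbers chosen only to show the direction of the effect.

```python
def equilibrium_interest_rate(money_supply, a, c):
    """Solve a - c * i = money_supply for i (linear money demand is an assumption)."""
    return (a - money_supply) / c

a, c = 1200.0, 2000.0    # hypothetical money demand parameters
m0, m1 = 1000.0, 1100.0  # vertical money supply before and after the increase

r0 = equilibrium_interest_rate(m0, a, c)
r1 = equilibrium_interest_rate(m1, a, c)

# A rightward shift of money supply (m0 to m1) lowers the interest rate (r0 to r1).
print(round(r0, 4), round(r1, 4))
```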
As discussed in the last section of money multiplier, money supply is determined by three
factors: H (High powered Money), r (reserve deposit-ratio) and b (cash deposit ratio).
Central bank can change the monetary base of the economy or could change the
requirement for reserve deposit.
MONETARY POLICY TOOLS
Central banks control money supply in the economy through the following policy
instruments:
1. Open Market Operations
2. Reserve Requirements
3. Repo and Reverse Repo rates
4. Discount / Bank rate
1. Open market operations
If RBI purchases securities from private investors, they get currency or deposits
as a result of this transaction, which means that it increases the monetary base and
thus the money supply. This purchase of assets is known as an open market purchase.
The sale of assets to the public by the central bank is called an open market sale. It
reduces the monetary base and the money supply. Open market purchases and sales
are collectively called open market operations.
For example, if RBI purchases assets worth Rs.100 cr., then the monetary base increases by
Rs.100 cr. Assuming a money multiplier of 10, total money supply increases by Rs.1000 cr.
in the economy due to RBI’s open market purchase.
Understanding its mechanism
Panel 1:
                    Liabilities   Rs.    Assets       Rs.
Central bank        Reserves       20    Securities   100
                    Currency       80
                    Total         100    Total        100
Commercial banks    Deposits      100    Reserves      20
                                         Loans         80
                    Total         100    Total        100
Shyam (public)      Debts           0    Deposits       5
                    Net worth       5
                    Total           5    Total          5
Panel 2:
                    Liabilities   Rs.    Assets       Rs.
Central bank        Reserves       15    Securities    95
                    Currency       80
Commercial banks    Deposits       95    Reserves      15
                                         Loans         80
Shyam (public)      Debts           0    Deposits       0
                    Net worth       5    Securities     5
Panel 3:
                    Liabilities   Rs.    Assets       Rs.
Central bank        Reserves       15    Securities    95
                    Currency       80
Commercial banks    Deposits       75    Reserves      15
                                         Loans         60
Shyam (public)      Debts           0    Deposits       0
                    Net worth       5    Securities     5
Table 1: Open Market Operations
Let us look at Table 1 to understand how open market operations affect money supply in the
economy. In Panel 1, the central bank has Rs.100 of government securities. Its liabilities
consist of Rs.20 of reserve deposits and Rs.80 of currency. With a required reserve ratio
of 0.2, Rs.20 of reserves can support Rs.100 (= 20/0.2) of deposits in commercial banks.
Panel 1 also shows Shyam’s financial position.
Now imagine that the central bank decides to make an open market sale of securities worth
Rs.5 to a private investor, Shyam. Shyam writes a cheque to the central bank to complete
this transaction. The central bank’s reserve liabilities are reduced by Rs.5 (and so are the
reserves of commercial banks). These changes are shown in Panel 2.
The story does not end here. Reserves are reduced to Rs.15, which can now support
deposits of only Rs.75 (= 15/0.2), so in the final equilibrium (Panel 3) loans are reduced
to Rs.60. Banks do not call in loans; rather, loans and deposits are reduced by slowing
down the rate of new lending as old loans come due and are paid off. Deposits have
changed by Rs.25 (from Rs.100 to Rs.75). In this example, the change in deposits (Rs.25) is
equal to the deposit multiplier (5 = 1/0.2) times the change in reserves (Rs.5). Money
supply, defined as the sum of deposits and currency, decreased from Rs.180 to Rs.155.
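The effect of the open market sale on deposits, loans and money supply can be reproduced with a few lines of Python. This is only a sketch of the example in Table 1: it assumes banks hold no excess reserves in equilibrium and that currency with the public stays at Rs.80; the function name `banking_position` is made up for the illustration.

```python
REQUIRED_RESERVE_RATIO = 0.2
CURRENCY = 80  # currency held by the public, unchanged by the sale (in Rs.)

def banking_position(reserves, r=REQUIRED_RESERVE_RATIO):
    """Equilibrium deposits, loans and money supply supported by given reserves."""
    deposits = reserves / r        # maximum deposits the reserves can support
    loans = deposits - reserves    # bank assets other than reserves
    money_supply = deposits + CURRENCY
    return deposits, loans, money_supply

before = banking_position(reserves=20)      # Panel 1
after = banking_position(reserves=20 - 5)   # Panel 3, after the sale of Rs.5

print([round(x, 2) for x in before])  # deposits 100, loans 80, money supply 180
print([round(x, 2) for x in after])   # deposits 75, loans 60, money supply 155
```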
Required reserve ratio
Changes in reserves (discussed in the last section) bring changes in the money supply.
When a central bank changes the required reserve ratio in the economy, the money
multiplier changes and hence the money supply changes.
Suppose the central bank announces that the required reserve ratio is reduced from 20
percent to 12.5 percent. The changes in the money supply are shown in Table 2.
Initially, when the required reserve ratio is 20%, the balance sheets of the central bank
and commercial banks are as shown in Panel 1 of Table 2. When the required reserve ratio is
lowered to 12.5%, then out of Rs.500 of deposits only Rs.62.5 need be kept as reserves and
the extra Rs.37.5 can be lent out, which creates new deposits of Rs.37.5 times the deposit
multiplier (8 = 1/0.125), i.e. Rs.300 of additional deposits are created.
Panel 1: Required Reserve Ratio = 20%
                    Liabilities   Rs.    Assets       Rs.
Central bank        Reserves      100    Securities   200
                    Currency      100
                    Total         200    Total        200
Commercial banks    Deposits      500    Reserves     100
                                         Loans        400
                    Total         500    Total        500
Panel 2: Required Reserve Ratio = 12.5%
                    Liabilities   Rs.    Assets       Rs.
Central bank        Reserves      100    Securities   200
                    Currency      100
                    Total         200    Total        200
Commercial banks    Deposits      800    Reserves     100
                                         Loans        700
                    Total         800    Total        800
Table 2: Changes in Reserve Deposit Ratio
So the deposits that can be supported with a 12.5% required reserve ratio become Rs.800,
and reserves equal 12.5% of deposits (Rs.800), i.e. Rs.100. Money supply has increased
from Rs.600 (Rs.100 of currency and Rs.500 of deposits) to Rs.900 (Rs.100 of currency and
Rs.800 of deposits).
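The same numbers can be worked through step by step in Python. The sketch below follows the example of Table 2 (reserves of Rs.100, currency of Rs.100) and assumes that banks lend out all excess reserves; the variable names are illustrative.

```python
reserves = 100.0   # reserve deposits held by banks (in Rs.)
currency = 100.0   # currency held by the public (in Rs.)
old_ratio, new_ratio = 0.20, 0.125

old_deposits = reserves / old_ratio              # deposits supported at 20%
required_reserves = new_ratio * old_deposits     # reserves now required on them
excess_reserves = reserves - required_reserves   # freed for lending
extra_deposits = excess_reserves / new_ratio     # excess reserves x multiplier (1/0.125)
new_deposits = old_deposits + extra_deposits

old_money_supply = currency + old_deposits
new_money_supply = currency + new_deposits

print(round(excess_reserves, 2), round(extra_deposits, 2))      # 37.5 and 300
print(round(old_money_supply, 2), round(new_money_supply, 2))   # 600 and 900
```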
The Cash Reserve Ratio (CRR) is the fraction of their deposits that banks have to keep with
the RBI (the central bank of India). The Statutory Liquidity Ratio (SLR) is the fraction of
deposits that a commercial bank is required to maintain in the form of gold or govt.
securities before providing credit to customers.
Bank Rate
The bank rate, also referred to as the discount rate, is the rate of interest which a
central bank charges on the loans that it advances to commercial banks. When banks borrow
from the central bank, money supply increases. The central bank’s lending of money to banks
is called discount window lending. The higher the discount rate, the higher the cost of
borrowing and the less the banks would want to borrow. If the central bank wants to curtail
the growth of money supply, it can raise the discount rate and discourage banks from
borrowing from it, restricting the growth of reserves (and ultimately deposits).
CURRENT KEY POLICY RATES IN INDIA
CRR : 4%
SLR : 23%
Bank rate : 8.25%
Repo rate : 7.25%
Reverse repo rate : 6.25%
Repo and Reverse Repo rate
A repo, or repurchase agreement, is the sale of securities by commercial banks to the
central bank together with an agreement for the commercial banks to buy back the securities
at a later date. The repurchase price is greater than the original sale price; the
difference effectively represents interest, and the implied rate of interest is called the
repo rate. A reverse repo is the sale of securities by the central bank to commercial
banks together with an agreement for the central bank to buy back the securities at a later
date. An increase in the reverse repo rate can prompt banks to park more funds with the
central bank to earn a higher return on idle cash. It is thus a tool which can be used by
the central bank to drain excess money out of the banking system.
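How the difference between the sale price and the repurchase price translates into a rate of interest can be sketched in Python. The prices, the 7-day term and the 365-day simple-interest convention below are all hypothetical assumptions made for illustration; they are not taken from the lesson or from RBI practice.

```python
def implied_repo_rate(sale_price, repurchase_price, days, days_in_year=365):
    """Annualized simple interest rate implied by a repurchase agreement."""
    interest = repurchase_price - sale_price   # difference between the two prices
    return interest / sale_price * days_in_year / days

# Hypothetical deal: securities sold for Rs.1000 and bought back after 7 days.
rate = implied_repo_rate(sale_price=1000.0, repurchase_price=1001.39, days=7)
print(f"{rate:.2%}")
```

With these assumed prices the implied annual rate comes out close to the repo rate of 7.25% listed above.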
SUMMARY
• Banks create money by making loans. When a bank makes a loan to a customer, it
creates a deposit in that customer’s account. This deposit becomes part of the money
supply. Banks can create money only when they have excess reserves, and the credit
creation process is fully effective only when all banks loan out their excess reserves.
• Money supply in the economy is determined by the monetary base times the money
multiplier. Without cash drain the money multiplier equals 1/required reserve ratio
(1/r); with a cash deposit ratio b it equals (1 + b)/(b + r).
• The central bank pursues monetary policy and controls money supply in the economy.
It can change either the monetary base or the multiplier through its policies.
• Central banks have the following tools to control the money supply: (1) open
market operations (the buying and selling of already existing government
securities); (2) changing the required reserve ratio (reducing this ratio increases
the multiplier); (3) changing the discount rate (raising the discount rate decreases
money supply); and (4) changing the repo and reverse repo rates.
EXERCISES
Short answer questions
Q1. What happens to money supply in the following situations?
a. RBI buys bonds in the open market.
b. RBI increases the reserve requirement.
Q2. Decide whether RBI has taken the correct step in each of the following situations. What
would be the outcome?
a. During a period of rapid inflation, RBI decreases the reserve requirement.
b. During a period of rapid real growth, RBI injects money into the economy.
Long answer questions
Q1. Explain how banks create money.
Q2. What are the ways in which a central bank can influence the money supply?
Q3. What would happen to money supply if general public chose to hold (a) no cash, (b)
no bank deposits?
Q4. What is the money multiplier? What factors determine its value?
Numericals
Q1. Look at RBI’s balance sheet, which is as follows:
Liabilities   Rs.    Assets       Rs.
Currency      150    Securities   350
Reserves      200
Total         350    Total        350
Total deposits with the commercial banks are Rs.1500.
Calculate:
a. Reserve deposit ratio.
b. Value of money multiplier.
c. If the reserve deposit ratio changes to 12.5%, what will be the impact on
macroeconomic variables?
GLOSSARY
• Bank reserves: Liquid assets held by banks to meet the demands for withdrawals by
depositors are called bank reserves.
• Reserve deposit ratio: The fraction of banks’ outstanding deposits that is kept as
reserves is known as the reserve deposit ratio.
• Fractional reserve banking: If the reserve deposit ratio is less than 1, i.e. reserves
are a fraction of deposits, the banking system is known as fractional reserve
banking.
• Money multiplier: The money multiplier is the multiple by which the total supply of
money can increase for every unit increase in reserves. With no cash drain, the money
multiplier is equal to 1/required reserve ratio; with a cash deposit ratio b it is
(1 + b)/(b + r).
• Open market operations: If the central bank purchases securities from, or sells them to,
private investors in the economy, money supply increases or decreases, respectively. These
open market purchases and sales are collectively known as open market operations.
• Cash Reserve Ratio: The Cash Reserve Ratio is the fraction of their deposits that banks
have to keep with the central bank.
• Statutory Liquidity Ratio: The Statutory Liquidity Ratio is the fraction of deposits that
a commercial bank is required to maintain in the form of gold or govt. securities before
providing credit to customers.
• Repo rate: The repo (repurchase) rate is the rate at which the RBI lends short-term
money to the banks.
• Reverse repo rate: The reverse repo rate is the rate at which banks park their
short-term excess liquidity with the RBI.
• Bank rate: The bank rate is the rate of interest which a central bank charges on the
loans and advances to a commercial bank.
REFERENCES
2007.
edition, 2011.
3. www.epw.in
4. www.rbi.org
5. Marek Hudon, Should Access to Credit Be a Right?, Journal of Business Ethics, Vol. 84, No.
1 (Jan., 2009), pp. 17-28
Inflation and its social costs
Semester-I
Unit-IV
Lesson: Inflation and its Social Costs
College/Department: Rajdhani College, University of Delhi

Table of Contents
2. Introduction
3. Inflation
a. Meaning and causes
b. Measuring inflation
c. Indicators of inflation
d. Measures & trends of inflation in India
4. Social costs of inflation
a. Expected inflation
b. Unexpected inflation
5. Hyperinflation
a. Meaning and causes
b. Costs of hyperinflation
6. Summary
7. Exercises
8. Glossary
9. References
Learning outcomes
1. Define the term inflation.
2. Explain the causes of inflation.
3. List various measures of inflation.
4. Construct a price index, given the data.
5. Identify the recent trends in inflation in India.
6. List the various costs of inflation to an economy.
7. Explain the consequences of hyperinflation in an economy.
Introduction
The financial year 2010-11 started with a headline of 11% inflation in April 2010.
Inflation has been a disturbing issue in the Indian economic scene in the last few years.
Why are all economists and policymakers concerned about curbing inflation in the economy?
Even a layman understands that inflation, i.e. a rise in prices, makes him poorer. Inflation
can be seen as a devaluing of the worth of money or a fall in the purchasing power of the
consumer.
Inflation is universal because money supply needs to be raised over time and the costs of
inputs and wages increase. Is it normal? When does inflation become a cause of worry? We
discussed in the last chapter that the central bank has macroeconomic tools to contain
inflation. But when would they be required?
This chapter is an attempt to understand the phenomenon of inflation, its measurement
and recent trends in Section 1. Social costs of inflation will be discussed in Section
2 and the causes and costs of hyperinflation will be studied in Section 3.
Inflation
Inflation is a persistent increase in the general level of prices. Milton Friedman, the
Nobel Prize winning economist, said: “inflation is always and everywhere a monetary
phenomenon”. By this he meant that inflation rises whenever money supply grows faster than
the economy for a period of time. Inflation has also been
defined as “too much money chasing too few goods”. This is what monetarists think. But
inflation may also be the result of demand-pull factors and/or supply shocks. Let us study
the various other causes of inflation.
Causes of inflation
An economy can experience inflation due to the following reasons:
a. Demand-pull factors
Prices may rise due to excessive demand. An economy can experience excessive
demand due to an increase in population, a high rate of investment (which is demand
for capital goods), an increase in government expenditure and also the increasing role
of black money.
b. Cost-push factors
A price rise could occur due to an increase in particular prices or wage rates being passed
round the economy. These include a rise in wages, profit margins and the costs of
inputs. Fresh taxation also raises the price level.
Impact of crude oil price increases on global commodity prices
Crude oil price changes are a reason for concern as they affect the prices of other
commodities in many ways. The prices of the following goods change due to a crude oil
price change.
• Since crude oil is used in energy generation, prices of energy-intensive
commodities, for example metals, are affected.
• Since crude oil is used as fuel, it alters the transport cost of commodities,
especially over long distances.
• Prices of inputs used in primary products, like fertilizer and fuel, change
due to crude oil price changes, which in turn changes the prices of
primary products.
• Prices of substitutes of crude oil are also affected.
c. Structural rigidities
Permanent rigidities exist in most contemporary industrial societies. These could take the
form of localized bottlenecks (rigidities in a particular sector of production) or
specific rigidities of certain factors. For example, there could be an insufficiency of
resources in a key sector (an essential raw material, oil for example).
d. Nature of the economy
In an open economy, inflation can be imported through flows of international trade.
The imports could be final goods for direct consumption or raw materials to be
used in production.
Measuring Inflation
Inflation plays a vital role in economic policy making, as well as in individual decision
making. The Consumer Price Index (CPI) is the preferred tool for measuring inflation among
economists, policy makers and consumers. The following exercise will help us to understand
how the CPI is constructed, and it distinguishes between the level of a variable, i.e. the
CPI, and the rate of change in the variable, i.e. the inflation rate. It also helps us to
understand the biases to which the CPI is subject.
Consider seven periods, viz. a base period and then periods 1 to 6. Suppose that in the base
period you have Rs.30 to spend on three goods, viz. Good 1, Good 2 and Good 3, each
costing Rs.5. Further suppose that aggregation and averaging give the constituents of a
typical consumer’s basket, which is assumed to be as follows: 2 units of Good 1, 3 units of
Good 2 and 1 unit of Good 3. This is the basket that is consistent with the index values
tabulated below. A fourth good, Good 4, is introduced in period 5.
The following table provides the data on price change of three goods in the economy:-
Good      Base period   Period 1   Period 2   Period 3   Period 4   Period 5   Period 6
Good 1         5            5          5         10         10          5         10
Good 2         5           10          5          5          5         10         10
Good 3         5            5         10          5         10         10          5
Good 4         -            -          -          -          -         10          5


The cost of the basket is now calculated at the prices prevailing in each period. The price
index is constructed by dividing the cost of the representative basket in the current
period by the cost of the same basket in the base period and multiplying by 100. This
gives the following results:
Price index and inflation rates for the basket of three goods, computed for six periods
PERIOD (1)      PRICE INDEX (2)      INFLATION RATE % (3)
Base year            100                    NA
First year           150                    50
Second year          117                   -22
Third year           133                    14
Fourth year          150                    13
Fifth year           167                    11
Sixth year           183                    10
Inflation rates are calculated as the percentage change in the price index from the
previous period.
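The construction of the index can be sketched in a few lines of Python. This is only an illustrative sketch: the names `PRICES`, `BASKET`, `basket_costs`, `price_index` and `inflation_rates` are not from the text, and the basket of 2, 3 and 1 units of Goods 1, 2 and 3 is assumed because it is the basket that reproduces the price index column of the results table.

```python
# Prices (Rs.) in the base period and periods 1-6, copied from the price table.
PRICES = {
    "Good 1": [5, 5, 5, 10, 10, 5, 10],
    "Good 2": [5, 10, 5, 5, 5, 10, 10],
    "Good 3": [5, 5, 10, 5, 10, 10, 5],
}
# Assumed fixed basket (units of each good); Good 4 is ignored, as in the text.
BASKET = {"Good 1": 2, "Good 2": 3, "Good 3": 1}


def basket_costs(prices, basket):
    """Cost of the fixed basket in every period."""
    n_periods = len(next(iter(prices.values())))
    return [sum(basket[g] * prices[g][t] for g in basket) for t in range(n_periods)]


def price_index(costs):
    """Index = 100 * (current cost of basket) / (base-period cost of basket)."""
    return [100 * c / costs[0] for c in costs]


def inflation_rates(index):
    """Percent change in the index from the previous period (None for the base)."""
    return [None] + [
        100 * (index[t] - index[t - 1]) / index[t - 1] for t in range(1, len(index))
    ]


costs = basket_costs(PRICES, BASKET)
index = price_index(costs)
inflation = inflation_rates(index)

for period, (level, rate) in enumerate(zip(index, inflation)):
    print(period, round(level), "NA" if rate is None else round(rate, 1))
```

The level of the index and its rate of change are kept in separate lists, which mirrors the distinction between the CPI and the inflation rate drawn in the text.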
In constructing the price index for periods 5 and 6 we ignored Good 4, which was introduced
in the market in period 5. Similarly, the Consumer Price Index (CPI) ignores the
introduction of new products.
Also, a fixed basket was assumed throughout, which might not be the case in real life.
When the price of Good 2 increased in period 1, the consumer might have substituted Good 1
or Good 3 for Good 2. Because the CPI ignores such substitution, it suffers from
substitution bias and tends to overestimate inflation.
Another important lesson from this example is that prices sometimes decline between
periods, as from period 1 to period 2, which gives a negative inflation rate. A fall in
prices is referred to as “deflation”.

Indicators of inflation
The following indicators are used to calculate and study inflation in an economy.
(a) The index number of wholesale prices (WPI): it excludes services.
(b) The Consumer Price Index (CPI): it includes both consumer goods and services.
(c) The Gross Domestic Product (GDP) deflator: the deflator is obtained by dividing
GDP at current prices by GDP at constant prices. The GDP deflator indicates how
much of the growth of GDP in a particular year is due to a rise in prices
(discussed at length in the chapter on National Income Accounting).
The WPI (Wholesale Price Index) tracks the wholesale prices of products. However, what
matters to the common man is the consumer price. While prices in the wholesale market
grow at a slower pace of about 2-3 percent, consumer prices measured by the CPI grow at a
much faster pace of about 8-9 percent.
The two indices are calculated differently, both in the weights assigned to products and
in the kinds of items included in the basket of products.
Recent Trends in India
At present there are five different price indices: the Wholesale Price Index (WPI), the
Consumer Price Index for Industrial Workers (CPI-IW), the Consumer Price Index for Urban
Non-Manual Employees (CPI-UNME), the Consumer Price Index for Agricultural Laborers
(CPI-AL) and the Consumer Price Index for Rural Laborers (CPI-RL). CPI-IW is the best known
of the consumer price indicators, as it is used for wage indexation. The WPI has remained
the most prominent measure of headline inflation in the Indian economy because of its
weekly availability. It is an economy-wide index covering close to 676 commodities.
The table below shows that the WPI, the CPI-IW and the GDP deflator have risen consistently
since 1999-2000. The figures in parentheses are all positive, indicating that the Indian
economy has experienced inflation in every year of the period.

In India, inflation is due to both cost-push and demand-pull factors. Over the last decade
India experienced inflation because of the drought of 2002, bad weather conditions and
hikes in petroleum prices. According to the Economic Survey 2007-08, inflation in India is
a structural as well as a monetary phenomenon.
Price indices (base: 1999-2000 = 100)
Year        WPI*            CPI (IW)*       GDP consumption deflator*
1999-00     100.0           100.0           100.0
2000-01     107.1 (7.1)     103.8 (3.8)     103.5 (3.5)
2001-02     111.0 (3.6)     108.3 (5.1)     106.8 (3.2)
2002-03     114.8 (3.4)     112.6 (3.8)     109.8 (2.9)
2003-04     121.1 (5.5)     116.9 (3.7)     113.8 (3.0)
2004-05     128.9 (6.5)     121.4 (3.6)     117.0 (2.8)
2005-06     134.6 (4.4)     126.8 (4.7)     120.5 (3.0)
2006-07     141.9 (5.4)     135.3 (6.6)     126.7 (5.1)
Source: Economic Survey 2007-08
*Figures in parentheses are year-on-year inflation rates (percent).
The WPI has been rising since the first half of 2006. Futures trading, especially in
products such as cereals, pulses, milk, sugar and edible oils, is being blamed for this.
CPI-IW inflation remained high even in 2009-10 (in double digits from July 2009 to July
2010). The major contributors to high CPI-IW inflation were food and housing.
New WPI series comes into effect
A new series of the Wholesale Price Index with base year 2004-05 was released on September 14,
2010. The table below compares the weighting diagram and the number of commodities in the old
and new series, by group:
                         Weights                        No. of commodities
Items                    New series     Old series      New series     Old series
                         (2004-05)      (1993-94)       (2004-05)      (1993-94)
All commodities           100.00         100.00            676            435
Primary articles           20.12          22.03            102             98
  Food articles            14.34          15.40             55             58
  Non-food & minerals       5.78           6.63             47             44
Fuel & power               14.91          14.23             55             41
Manufactured items         64.97          63.75            555            318
Core Inflation
Core inflation is a measure of inflation, which excludes those items that have volatile price
movement, especially food and energy. Therefore, it is a preferred instrument for designing
long-term policy. Core inflation, which was 0.55 percent in November 2009, reached its
peak in April 2010 at 8.07 per cent.
Cost of Inflation
An economy experiencing inflation has to bear many costs. Policymakers, economists and
especially politicians are under public pressure to take steps to curb it. Inflation is
keenly watched and widely debated by all stakeholders in the economy, as it is considered
a serious economic problem. Let us study the costs that an economy faces when there is
inflation.
The costs of expected inflation
Suppose prices rise by half a percent every week. What would be the cost of such
predictable inflation?
1. Falling purchasing power

When there is inflation, it seems at first that you can command fewer goods. But is
that really true? If you pay higher prices for goods and services, the seller
receives a higher income, and so do you when you charge a higher price. If nominal
incomes keep pace with the inflation rate, the fall in purchasing power is a
fallacy. Therefore, inflation in itself does not lower the real purchasing power of
the consumer.
2. Shoe leather cost
When an economy faces inflation, the value of money is eroded. To protect against
this, the public chooses to keep money in banks. But what ensures that deposited
money does not lose its value? The answer is the interest rate offered on deposits.
The nominal interest rate is the rate at which people pay interest to, or receive
interest from, commercial banks. The real interest rate is the nominal interest rate
adjusted for the effect of inflation; it tells us the pace at which the purchasing
power of deposited money is growing, or at least whether it is being eroded.
Real interest rate = Nominal interest rate – inflation
So whenever inflation is prevalent in the economy, the nominal interest rate adjusts to
the rate of inflation so as to keep the real interest rate constant. This one-to-one
adjustment of the nominal interest rate to the inflation rate is known as the Fisher effect.
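The relation between the two rates can be illustrated with a short Python sketch. The function names and the figures used (a real rate of 3 percent and inflation rising from 4 to 7 percent) are hypothetical and chosen only for illustration; the formula is the approximate one given above.

```python
def real_interest_rate(nominal, inflation):
    """Approximate real rate (percent) = nominal rate - inflation rate."""
    return nominal - inflation


def fisher_nominal_rate(real, inflation):
    """Nominal rate (percent) that leaves the real rate unchanged (Fisher effect)."""
    return real + inflation


REAL_RATE = 3.0  # hypothetical real interest rate, in percent

for inflation in (4.0, 7.0):  # hypothetical inflation rates, in percent
    nominal = fisher_nominal_rate(REAL_RATE, inflation)
    print(inflation, nominal, real_interest_rate(nominal, inflation))
```

In this sketch the nominal rate rises point for point with inflation, while the real rate stays at its assumed value.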
Inflation imposes a cost on the public by distorting the amount of money they choose
to hold. Higher inflation leads to a higher nominal interest rate via the Fisher
effect, and therefore to lower real money balances. People then hold lower money
balances on average, which means more frequent trips to the bank to withdraw money:
they might withdraw Rs.1000 twice a week instead of Rs.2000 once a week.
The cost of wearing out one’s shoes on these frequent trips to the bank is
metaphorically called the shoe leather cost of inflation.
3. Menu costs
Menu costs arise because high inflation forces firms to change their posted prices
more often. This is costly, as it requires printing and distributing new menus or
catalogs. These costs are called menu costs because firms must revise the price
lists on their menu cards more frequently when the rate of inflation is high.
4. Inflation induced Tax distortions
Another cost of inflation arises because some provisions in the tax code do not take
the effects of inflation into account. A classic example is the tax on capital
gains. Suppose you buy a stock today for Rs.100 and sell it a year from now for
Rs.115. It seems reasonable for the government to tax your capital gain of Rs.15
(Rs.115 - Rs.100). Suppose, however, that the inflation rate over the same period is
15%. In that case you have not earned any real income from this investment. But the
tax code fails to take inflation into account, and the government levies tax on
nominal rather than real income. This is how inflation distorts taxation and an
individual’s tax liability.
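The capital gains example can be worked through numerically. The sketch below uses the figures from the text (purchase at Rs.100, sale at Rs.115, inflation of 15%); the tax rate of 20% and the function name `capital_gains` are hypothetical, introduced only to show the size of the distortion.

```python
def capital_gains(buy, sell, inflation, tax_rate):
    """Return (nominal gain, real gain, tax due, real gain after tax).

    Real amounts are expressed in purchase-date rupees by deflating
    sale-date amounts by (1 + inflation).
    """
    nominal_gain = sell - buy
    real_gain = sell / (1 + inflation) - buy
    tax = tax_rate * nominal_gain  # the tax code taxes the nominal gain
    real_gain_after_tax = (sell - tax) / (1 + inflation) - buy
    return nominal_gain, real_gain, tax, real_gain_after_tax


TAX_RATE = 0.20  # hypothetical capital gains tax rate

nominal, real, tax, after_tax = capital_gains(100, 115, 0.15, TAX_RATE)
print(nominal, round(real, 2), round(tax, 2), round(after_tax, 2))
```

Under these assumptions the investor pays tax on a nominal gain of Rs.15 although the real gain is zero, so the real return after tax is negative.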
5. Relative – price variability and misallocation of resources
This cost arises because firms that face menu costs change their prices infrequently,
so inflation causes relative prices to vary. For example, suppose McDonald’s revises
its menu prices every January. If the rate of inflation is zero, the firm’s prices
relative to the overall price level are constant over the year. But if inflation is
0.5 percent per month, then by the end of the year the firm’s relative prices have
fallen by about 6 percent. The firm’s prices are relatively high early in the year,
when sales tend to be low, and relatively low later in the year, when sales tend to
be high. Hence inflation not only brings variability in relative prices but also
leads to an inefficient allocation of resources.
6. Inconvenience
Another cost of inflation is the inconvenience of living in a world where prices, and
hence the value of the rupee, keep changing. Money is the yardstick with which we
measure economic transactions; when an economy experiences inflation, that
yardstick changes in length. Consider how a changing price level complicates
planning how much to save for the future. Suppose prices were to remain the same
thirty years from now, when an individual retires. Then a rupee saved today and
invested at a fixed nominal interest rate would yield a known real amount at
retirement. If the economy experiences inflation, the real value of the investment
changes, and the retiree’s living standard depends on that real value. The
individual is then uncertain how much to save for retirement, since inflation could
upset his or her financial plans.
The costs of unexpected inflation
The costs of unexpected inflation are more damaging than those of anticipated, steady
inflation. Unexpected inflation leads to an arbitrary redistribution of wealth in the
economy. This is best understood by examining long-term loans. Most loan agreements
specify a fixed nominal interest rate, which is the sum of the real interest rate and the
rate of inflation expected over the term of the loan. If inflation turns out to differ
from what both parties expected, the ex post real return that the debtor pays to the
creditor differs from what they expected. The debtor gains and the creditor loses if
inflation is higher than expected; conversely, the creditor gains and the debtor loses if
inflation is lower than expected. Suppose a loan of Rs.100 is made for one year at a
nominal interest rate set on the basis of an expected inflation rate of 10%. If actual
inflation turns out to be 15%, the debtor gains, as he or she repays the loan with a
smaller real amount. If, on the other hand, inflation turns out to be 5%, the
creditor gains because the repayment is worth more than expected in real terms.
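The redistribution can be sketched as follows. The expected inflation rate of 10% and the two outcomes (15% and 5%) come from the example; the real interest rate of 2% and the function names are hypothetical assumptions added so that a nominal rate can be computed.

```python
EXPECTED_INFLATION = 10.0  # percent, from the example
REAL_RATE = 2.0            # percent, hypothetical real rate agreed by both parties
NOMINAL_RATE = REAL_RATE + EXPECTED_INFLATION  # fixed in the loan contract


def ex_post_real_rate(nominal, actual_inflation):
    """Real return the creditor actually receives (approximate, in percent)."""
    return nominal - actual_inflation


def who_gains(actual_inflation, expected_inflation=EXPECTED_INFLATION):
    """Identify the party that benefits from the inflation surprise."""
    if actual_inflation > expected_inflation:
        return "debtor"
    if actual_inflation < expected_inflation:
        return "creditor"
    return "neither"


for actual in (15.0, 5.0):
    print(actual, ex_post_real_rate(NOMINAL_RATE, actual), who_gains(actual))
```

With these numbers the creditor expected a real return of 2 percent but receives -3 percent when inflation is 15% and 7 percent when inflation is 5%.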
The free silver movement, the election of 1896, and the Wizard of Oz
The redistributions of wealth caused by unexpected changes in the price level are often
a source of political turmoil, as evidenced by the free silver movement of the late
nineteenth century. From 1880 to 1896 the price level in the United States fell by 23
percent. This deflation was good for creditors, primarily the bankers of the Northeast,
but bad for debtors, primarily the farmers of the South and West. One proposed solution
to this problem was to replace the gold standard with a bimetallic standard, under which
both gold and silver could be minted into coins. The move to a bimetallic standard would
increase the money supply and stop the deflation.
The silver movement dominated the presidential election of 1896. William McKinley,
the Republican nominee, campaigned on a platform of preserving the gold standard.
Individuals with fixed pensions are also hurt by unexpected inflation. Workers and
firms agree on a fixed nominal pension to be paid when the worker retires. The worker is
in effect a creditor of the firm: as in the previous example, the worker loses when
inflation is higher than anticipated, because the fixed pension is worth less in real
terms at retirement. Like any debtor, the firm loses if inflation is lower than anticipated.
Given the impact of inflation on debtors and creditors, it is puzzling that contracts in
nominal terms are still widespread. One might expect some sort of indexation to the
changing price level. In economies where inflation is high and volatile, indexation is
indeed prevalent, and loans are made at floating rather than fixed interest rates.
Hyperinflation
When inflation exceeds 50% per month, which is a little above 1% per day, it is termed
hyperinflation. Compounded over several months, such a rate produces very large increases
in the price level: an inflation rate of 50% per month implies a more than 100-fold
increase in the price level over a year and a more than 2-million-fold increase over
three years.
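These orders of magnitude follow from compounding and can be checked with a short Python sketch. The function names are illustrative, and a 30-day month is assumed when converting the monthly rate to a daily rate.

```python
MONTHLY_RATE = 0.50  # 50% inflation per month, the hyperinflation threshold


def price_multiple(monthly_rate, months):
    """Factor by which the price level rises after compounding for `months`."""
    return (1 + monthly_rate) ** months


def daily_rate(monthly_rate, days_per_month=30):
    """Constant daily inflation rate that compounds to the monthly rate."""
    return (1 + monthly_rate) ** (1 / days_per_month) - 1


one_year = price_multiple(MONTHLY_RATE, 12)     # roughly 130-fold
three_years = price_multiple(MONTHLY_RATE, 36)  # roughly 2.2 million-fold
per_day = daily_rate(MONTHLY_RATE)              # roughly 1.4% per day

print(round(one_year, 1), round(three_years), round(100 * per_day, 2))
```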
The causes of hyperinflation
Excessive growth in the money supply causes hyperinflation. The price level rises when
the central bank prints money, and hyperinflation results when it prints money rapidly.
A reduction in the rate of money growth by the central bank can stop hyperinflation.
When the government faces a budget deficit, it seeks to borrow but may fail to do so
because lenders consider it a bad credit risk. It then resorts to deficit financing to
cover the budget deficit, which results in rapid money growth and hyperinflation.
Fiscal problems become more severe once hyperinflation sets in. Real tax revenue falls
because of delays in tax collection, which widens the deficit and pushes inflation
higher. Thus the government’s reliance on seigniorage is self-reinforcing: fast money
creation causes hyperinflation, which results in higher budget deficits, and consequently
money is created even faster.
When the scale of the trouble becomes evident, the government musters the political will
to reduce its spending and increase taxes. These fiscal reforms reduce the need for
seigniorage, which in turn permits a reduction in money growth. It can therefore be said
that even if inflation is always a monetary phenomenon, the end of hyperinflation is
often a fiscal phenomenon.
The costs of hyperinflation
It is widely accepted that hyperinflation takes a high toll on society. The costs of
hyperinflation are similar in kind to those of moderate inflation; they are simply more
severe and therefore more noticeable.
Business executives devote a great deal of time and energy to cash management when cash
loses its value quickly. This time and energy is diverted from more socially valuable
activities such as production and investment decisions, so the economy runs less
efficiently during hyperinflation. In a nutshell, the shoe leather costs associated with
reduced money holdings are very severe under hyperinflation.
Menu costs become significant in times of hyperinflation, as firms are required to
change prices frequently. Regular business practices such as printing and distributing
catalogs with fixed prices become unfeasible. For instance, during the German
hyperinflation of the 1920s, a waiter in a restaurant had to call out new prices at every
table every half hour.
Similarly, in times of hyperinflation relative prices no longer reflect true scarcity. It
becomes very difficult for customers to shop for the best price, as prices fluctuate
significantly and frequently. Consumer behavior is also distorted in a variety of ways by
extremely volatile and rapidly rising prices.
Finally, people must live with the hassle of life under hyperinflation. The monetary
system no longer does its job of facilitating exchange when carrying money to the grocery
store is as burdensome as carrying the groceries back home. The government’s ready
solution is to add more and more zeros to the paper currency, but this fails to keep pace
with the exploding price level.
In due course the costs of hyperinflation become unendurable, as money can no longer
function as a store of value, medium of exchange and unit of account. Barter replaces
money as a common medium of exchange, and more stable unofficial monies, such as
cigarettes, replace the official money.
Summary
 Inflation in an economy could be due to five reasons: an increase in the money supply,
excessive demand, a rise in the cost of production, structural rigidities and the
international flow of goods.
 There are three indicators of inflation in an economy: the Wholesale Price Index (WPI),
the Consumer Price Index (CPI) and the GDP deflator.
 India has been experiencing inflation in recent years. In 2009, India witnessed
double-digit inflation.
 The costs of expected inflation include shoe leather costs, menu costs, the cost of tax
distortions, relative price variability and the inconvenience of making inflation
corrections. In addition, unexpected inflation causes arbitrary redistributions of
wealth between debtors and creditors.
 Hyperinflations usually begin when the government resorts to deficit financing to cover
its budget deficits. Most of the costs of inflation become more severe during
hyperinflation.
Glossary
 Inflation: A persistent increase in the general level of prices.
 Demand pull inflation: Inflation in which prices rise due to excessive demand in the
economy.
 Cost push inflation: Inflation in which prices rise due to supply shocks or a rise in
the prices of inputs.
 CPI: The Consumer Price Index measures the cost of a basket of consumer goods and
services relative to a base period; its rate of increase is the consumer price
inflation rate.
 WPI: The Wholesale Price Index measures the wholesale prices of products relative to a
base period; its rate of increase is the wholesale price inflation rate.
 Core inflation: A measure of inflation that excludes items with volatile price
movements, notably food and energy.
 Fisher effect: The one-to-one adjustment of the nominal interest rate to the inflation
rate.
 Shoe leather cost: The inconvenience of reducing money holdings, so called because
walking to the bank more often causes one’s shoes to wear out more quickly.
 Hyperinflation: Very rapid inflation, in which money loses its value to the point where
alternative mediums of exchange replace it.
Exercises
Short answer questions
Q1. In a country experiencing a low rate of inflation, a newspaper reports: “Low
inflation has a downside: 45 million recipients of social security and other benefits will
see their checks go up by just 2.8 percent next year.”
a. Why does inflation lead to an increase in social security and other benefits?
b. Is this effect a cost of inflation? Why or why not?
Q2. State whether the following statements are true or false. Why or why not?
a. Inflation is a monetary phenomenon.
b. Inflation leads to a fall in purchasing power.
c. Inflation hurts borrowers and helps lenders.
Q3. If inflation rises from 6 to 8 percent, what happens to real and nominal interest rates
according to the Fisher effect?
Long answer questions
Q1. What is inflation and what are its causes? How is it measured?
Q2. List all the costs of inflation and rank them according to how important you think they
are.
Q3. How does inflation affect the ability of money to serve its functions- medium of
exchange, unit of account and store of value?
Numericals
Q1. Suppose the CPI in a country is 113 in 2010-11 and rises to 133 in 2011-12. What can
we say about the inflation rate in the economy? Suppose that over the same period the WPI
fell from 109 to 101. How would you explain such changes in the economy?
Q2. In a country, the velocity of money is constant. Real GDP grows by 5 percent per year,
the money stock grows by 14 percent per year and the nominal interest rate is 11 percent.
What is the real interest rate?

Mathematical Methods for Economics: Preliminaries-II
Semester-I
Paper II: Mathematical Methods for Economics: Preliminaries-I
Unit-I
Lesson: Preliminaries-II
Lesson Developer: Sanjeev Kumar
College/Department: Dyal Singh College, University of Delhi

“The master economist must possess a rare combination of gifts.
... He must be mathematician, historian, statesman, philosopher to some degree.”
J. M. Keynes
CONTENTS:
 2.0 Learning outcomes of the chapter
 2.1 Mathematical proof technique

 Introduction
 Direct and indirect proof
 Proof by mathematical induction
 Deductive vs. inductive reasoning
 Problem set
 2.2 Set , set operation and Venn diagrams

 Introduction
 Set and types of set
 Set operation
 Venn diagrams
 Laws relating to set theory
 Problem set and answers
 2.3 Relations and their properties

 Relation
 Directed graphs
 Inverse Relation
 Basic characteristics of binary relation
 Combining relation
 Composition
 Problem set
 2.4 Functions and its types

 Function
 Types of Functions
 Properties of functions
 References
 2.0 LEARNING OUTCOMES OF THE CHAPTER

After completing this chapter, you should be able to:
 Understand mathematical proof techniques, i.e. direct and indirect proof
 Understand sets, set operations, Venn diagrams and their implications
 Describe the uses of set theory
 Explain relations, their types and their properties
 Understand functions, their types, operations and properties
 Determine whether a set of numbers or a graph represents a function
 Describe the symmetry of a graph in terms of odd or even functions
2.1 MATHEMATICAL PROOF
 Introduction
According to R. Dedekind, "in science, what can be proved should not be believed
without proof". Theorems are the most important outcome of every branch of
mathematics. The proofs of these theorems are the heart of mathematics and distinguish
mathematics from other disciplines. Put simply, a proof is a chain of reasoning that
establishes the truth of a particular statement or proposition. The Pythagorean theorem
is an important example of a proven result.
Example: Prove that the sum and the product of any three consecutive even numbers are
always a multiple of 6 and a multiple of 8 respectively.
Proof: Each of the three consecutive even numbers is a multiple of 2, so we can write
them as 2N, 2N + 2 and 2N + 4, where N is a whole number.
First case: the sum of the three numbers is
$2N + (2N + 2) + (2N + 4)$
$= 6N + 6$
$= 6(N + 1)$
which is six times (N + 1). Hence proved.

Second case: the product of the three numbers is
$2N \times (2N + 2) \times (2N + 4)$
$= 2N \times 2(N + 1) \times 2(N + 2)$
$= 8N(N + 1)(N + 2)$
which is eight times N(N + 1)(N + 2). Hence proved.
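A quick numerical check of the claim is sketched below. It is not a proof, since it only tests finitely many cases, and the function names and the limit of 1000 cases are arbitrary choices made for illustration.

```python
def consecutive_evens(n):
    """Three consecutive even numbers 2N, 2N + 2, 2N + 4 for a whole number N."""
    return 2 * n, 2 * n + 2, 2 * n + 4


def claim_holds(limit=1000):
    """Check the sum (multiple of 6) and product (multiple of 8) for N < limit."""
    for n in range(limit):
        a, b, c = consecutive_evens(n)
        if (a + b + c) % 6 != 0 or (a * b * c) % 8 != 0:
            return False
    return True


print(claim_holds())
print(consecutive_evens(3), sum(consecutive_evens(3)))
```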
 Direct and Indirect Proof

Direct Proof:
A direct proof is a mathematical argument that uses rules of inference to derive the
conclusion from the premises.
Example: The sum of two even numbers is even.
Proof: Let x and y be two even numbers. Then there exist integers m and n such
that
x = 2m and y = 2n
Then the sum is x + y = 2m + 2n
= 2(m + n)
which is an even number. Hence proved.
Indirect Proof:
An indirect proof is a mathematical argument that proves an implication through its
contrapositive: it starts from the negation of the conclusion and uses rules of inference
to derive the negation of the premise. Sometimes it is more convenient to prove an
implication by an indirect proof, since
$P \Rightarrow Q$ is equivalent to $\neg Q \Rightarrow \neg P$
Contradiction: This is a third method of proof. It is often useful and rests on a
fundamental logical principle: assume that the statement is false and show that this
assumption leads to a contradiction.
Example: Use the above three methods to prove that
$-x^2 + 4x - 3 > 0 \Rightarrow x > 0$
Solution: (i) Direct proof:
Given: $-x^2 + 4x - 3 > 0$
Adding $x^2 + 3$ to both sides of the inequality,
$-x^2 + 4x - 3 + x^2 + 3 > x^2 + 3$
$4x > x^2 + 3$
We know that $x^2 + 3 \geq 3$ for all x,
so we have $4x > 3$.
Then $x > 3/4$ and, in particular, $x > 0$:
$x > 3/4 \Rightarrow x > 0$
(ii) Indirect proof: Let $x \leq 0$; then $4x \leq 0$.
So $-x^2 + 4x - 3$ is the sum of three non-positive terms, and hence it is $\leq 0$.
(iii) Proof by contradiction: Suppose the statement is not true. Then there exists an
x such that $-x^2 + 4x - 3 > 0$ and $x \leq 0$.
But if $x \leq 0$, then $-x^2 + 4x - 3 \leq -x^2 - 3 \leq -3$.
We have arrived at a contradiction, so the statement is true.
 Proof by Mathematical Induction
Despite its name, mathematical induction is not inductive reasoning; it is a deductive
method that starts from a single base case. We show that the base case is true and that
the truth of any one case implies the truth of the next; the infinitely many other cases
must then also be true. We can understand mathematical induction by taking the example of
natural numbers.
Let $N = \{1, 2, 3, \ldots\}$ be the set of natural numbers, and let P(n) be a
mathematical statement about natural numbers. If
 P(1) is true, i.e. P(n) is true for n = 1, and
 P(n + 1) is true whenever P(n) is true, i.e. P(n) true implies that P(n + 1) is also
true,
then P(n) is true for all natural numbers.
This process is called mathematical induction.
Theorem: Prove that the sum of the first n odd natural numbers is $n^2$,
i.e. $1 + 3 + 5 + 7 + 9 + \cdots + (2n - 1) = n^2$
Proof: By induction. The statement is true for n = 1, since $1 = 1^2$.
Now assume that it is true for n = k (a positive integer):
$1 + 3 + 5 + 7 + 9 + \cdots + (2k - 1) = k^2$
Adding the next odd number, $2(k + 1) - 1$, to both sides of this equation, we get
$1 + 3 + 5 + 7 + 9 + \cdots + (2k - 1) + \{2(k + 1) - 1\} = k^2 + \{2(k + 1) - 1\}$
Since $k^2 + 2k + 1 = (k + 1)^2$, we conclude
$1 + 3 + 5 + 7 + 9 + \cdots + (2k - 1) + \{2(k + 1) - 1\} = (k + 1)^2$
So the statement is also true for k + 1. Hence it holds for all natural numbers n.
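The two ingredients of the induction argument can be checked numerically. The sketch below is illustrative only: testing finitely many values does not replace the proof, and the function names and the range of values tested are arbitrary.

```python
def sum_first_odds(n):
    """Sum of the first n odd natural numbers: 1 + 3 + ... + (2n - 1)."""
    return sum(2 * i - 1 for i in range(1, n + 1))


def inductive_step_holds(k):
    """Algebraic identity used in the step: k^2 + (2(k + 1) - 1) = (k + 1)^2."""
    return k ** 2 + (2 * (k + 1) - 1) == (k + 1) ** 2


base_case = sum_first_odds(1) == 1 ** 2
steps_ok = all(inductive_step_holds(k) for k in range(1, 200))
formula_ok = all(sum_first_odds(n) == n ** 2 for n in range(1, 200))

print(base_case, steps_ok, formula_ok)
```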
 Deductive Vs Inductive Reasoning
The above three methods of proof are all forms of deductive reasoning. Deductive
reasoning is based on consistent rules of logic, and proof is an important part of it.
The second type of reasoning is inductive reasoning, which is used in many branches of
science and social science. Here conclusions are drawn on the basis of a few
observations. For example: if the price level has increased in each of the last 20
years, then it will also increase in the coming year. This is inductive reasoning. In
fact, there is no guarantee that the price level will increase in the coming year. For
this reason inductive reasoning is not recognized as a form of proof in mathematics.
Example: Using backward (deductive) reasoning, show that if x, y > 0 with $x \neq y$,
then $\frac{x + y}{2} > \sqrt{xy}$
Proof: The conclusion is true if $x + y > 2\sqrt{xy}$
The conclusion is true if $(x + y)^2 > 4xy$
The conclusion is true if $x^2 + y^2 - 2xy > 0$
The conclusion is true if $(x - y)^2 > 0$, which is true.
Problem Set
1. If $x^2$ is an odd number, then x is -----------------.
2. Use the direct, indirect and contradiction methods to prove that
$x^2 + 3x - 2 > 0 \Rightarrow x > 0$
3. Use mathematical induction to prove that $n < 2^n$ for all natural numbers n.
4. Prove that the sum of the squares of three consecutive numbers, minus two,
is always a multiple of 3.
5. Using mathematical induction, prove that
$1 + 2 + 3 + 4 + \cdots + n = \frac{1}{2} n(n + 1)$
2.2 SET THEORY
 Introduction

Set theory is a part of discrete mathematics, in which discrete objects are represented
by sets. A well-defined collection of objects is called a set. The objects are called
the elements or members of the set. A set is usually denoted by ‘S’ or another capital
letter.
For example: S = {1, 2, 3, ..., N}
Here S is a set of natural numbers, and 1, 2, 3, ..., N are members of S.
S = {a, b, c, d}
is the set of the first four letters of the English alphabet.
V = {a, e, i, o, u}
is the set of vowels in the English alphabet.
 Types of Sets
Finite Set: A set having a finite number of elements is called a finite set.
For example: S = {x, y, z} has only three elements.
Infinite Set: A set having an infinite number of elements is called an infinite set.
For example: S = {1, 2, 3, ...}
Null (empty) Set: A set with no elements is called a null set. It is
denoted by $\emptyset$ (phi).
For example: S = { }
Unit Set: A set that includes only one member.
The Universal Set: The set of all objects under consideration is called the universal
set. It is denoted by U.
Equal Sets: If two sets have the same elements, they are called equal sets.
For example: if A = {1, 2, 3} and B = {3, 2, 1}, then A = B; A and B are equal sets.
Subset: The set A is said to be a subset of B if every element of A is also an
element of B. This is denoted by $A \subseteq B$ and read as “A is a subset of B”.
Proper Subset: The set A is said to be a proper subset of B if $A \subseteq B$ and
$A \neq B$.
Cardinality: The cardinality of a set S is the number of its elements. It is denoted
by $|S|$.
Let S = {1, 2, 3, 4, 5}
Then $|S| = 5$ (the number of elements in the set).
Power Set: The power set of S is the set of all subsets of S. It is denoted by P(S).
Example 1: Given S = {1},
P(S) = {$\emptyset$, {1}} and $|P(S)| = 2$
Example 2: S = {a, b, c}
P(S) = {$\emptyset$, {a}, {b}, {c}, {a, b}, {b, c}, {c, a}, {a, b, c}}
$|P(S)| = 2^n = 2^3 = 8$, where n = 3 is the number of elements of S.
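The power set of a small set can be enumerated directly. The following Python sketch uses the standard `itertools` module; the function name `power_set` is illustrative.

```python
from itertools import chain, combinations


def power_set(s):
    """Return all subsets of s as a list of sets, ordered by size."""
    items = sorted(s)
    all_combinations = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [set(c) for c in all_combinations]


subsets = power_set({"a", "b", "c"})
print(len(subsets))  # number of subsets, 2 ** 3
print(subsets)
```

The count of subsets equals 2 raised to the number of elements because each element is either included in or excluded from a given subset.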
 Venn Diagrams
A Venn diagram is a diagrammatic representation of sets. It is an easy way to
understand set theory.
Complement of a set: It is defined as
$A^c = \{x : x \in U \text{ but } x \notin A\}$ (also written $A'$)
Disjoint sets: A and B are disjoint if
$A \cap B = \emptyset$
Cartesian product: It is defined as
$A \times B = \{(x, y) : x \in A \text{ and } y \in B\}$
where $\times$ is the symbol of the Cartesian product.
Example: Given A = {1, 2} and B = {a, b}, find $A \times B$.
Solution: $A \times B$ = {(1, a), (1, b), (2, a), (2, b)}
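The Cartesian product in this example can be reproduced with the standard `itertools.product` function. The helper name `cartesian_product` is illustrative.

```python
from itertools import product

A = {1, 2}
B = {"a", "b"}


def cartesian_product(first, second):
    """Set of all ordered pairs (x, y) with x in first and y in second."""
    return set(product(first, second))


AxB = cartesian_product(A, B)
print(sorted(AxB))
print(len(AxB) == len(A) * len(B))
```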
 Set Operations
$A \cup B$ (A union B): the set of those elements that belong to set A, to set B or
to both sets:
$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$
Given A = {1, 2, 3} and B = {2, 3, 4},
then $A \cup B$ = {1, 2, 3, 4}
$A \cap B$ (A intersection B): the set of elements common to set A and set B:
$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$
Given A = {1, 2, 3} and B = {2, 3, 4},
$A \cap B$ = {2, 3}
 Laws Relating to Set Theory

 Commutative Law
A ∪ B = B ∪ A and A ∩ B = B ∩ A
 Associative Law
(A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C)
 Distribution Law
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
 De Morgan’s Law
(A ∪ B)^c = A^c ∩ B^c
Or, (A ∩ B)^c = A^c ∪ B^c
Theorem: If the sets A and B (and C) have a finite number of elements, then
 n(A ∪ B) = n(A) + n(B) – n(A ∩ B)
 n(A ∪ B ∪ C) = n(A) + n(B) + n(C) – n(A ∩ B) – n(B ∩ C) – n(A ∩ C) + n(A ∩ B ∩ C)
Example 1: Write true (T) or False (F) for the followings;
(i) A ’  B = A  B’ (F)
(ii) A ⊆ B ⇒ A ∪ B = B (T)
(iii) XY = XZ Y=Z (F)
(iv) A (BC) = (AB)  (AC) (T)
Example 2: Given X = {1, 2, 3, 4, 5}, Y = {4, 5, 6, 7} and Z = {2, 3, 6}, find
(i) X ∩ (Y ∪ Z) (ii) X ∪ (Y ∩ Z)
Solutions: (i) Y ∪ Z = {2, 3, 4, 5, 6, 7}
 X ∩ (Y ∪ Z) = {2, 3, 4, 5}
(ii) Y ∩ Z = {6}
X ∪ (Y ∩ Z) = {1, 2, 3, 4, 5, 6}
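The results of Example 2, and the counting theorem stated before it, can be verified with Python's built-in set type. This is only an illustrative sketch; the variable names part_i, part_ii, lhs and rhs are ours.

```python
X = {1, 2, 3, 4, 5}
Y = {4, 5, 6, 7}
Z = {2, 3, 6}

# | is union and & is intersection for Python sets
part_i = X & (Y | Z)    # X intersection (Y union Z)
part_ii = X | (Y & Z)   # X union (Y intersection Z)

# Counting theorem for three finite sets (inclusion-exclusion)
lhs = len(X | Y | Z)
rhs = (len(X) + len(Y) + len(Z)
       - len(X & Y) - len(Y & Z) - len(X & Z)
       + len(X & Y & Z))

print(sorted(part_i), sorted(part_ii), lhs == rhs)
```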
Example 3: In a survey of the reading habits of 100 students, it was found that 50 students
used the university library, 40 students had their own library and 30 students borrowed from
friends. It was also found that 20 students used both the university library and their own library, 50
students used their own library as well as borrowed from friends, while 10 students used
the university library and also borrowed from friends. How many students used all
three sources of books?
Solution: Let A, B and C denote the sets of students using the university library, their own library and
books borrowed from friends, respectively.
By the counting theorem for three sets,
n(A ∪ B ∪ C) = n(A) + n(B) + n(C) – n(A ∩ B) – n(A ∩ C) – n(B ∩ C) + n(A ∩ B ∩ C)
Given: n(A ∪ B ∪ C) = 100, n(A) = 50, n(B) = 40, n(C) = 30, n(A ∩ B) = 20,
n(B ∩ C) = 50 and n(A ∩ C) = 10
n(A ∩ B ∩ C) = n(A ∪ B ∪ C) – n(A) – n(B) – n(C) + n(A ∩ B) + n(A ∩ C) + n(B ∩ C)
= 100 – 50 – 40 – 30 + 20 + 50 + 10
= 60 students
(The figures as stated are not mutually consistent, since n(B ∩ C) = 50 exceeds n(C) = 30; the calculation illustrates only the use of the formula.)
Example 4: A consumer's consumption set is given by
C = {(x1, x2): x1 > 0, x2 > 0}
and the budget set is given by
B = {(x1, x2): p1x1 + p2x2 ≤ M}
where x1 and x2 are the quantities of the goods, p1 > 0 and p2 > 0 are their respective prices,
and M > 0 is the income of the consumer. Illustrate in a diagram the sets
(a) B ∪ C (b) B ∩ C
Are they convex? Bounded? How can you interpret them?
Solution: Let set C be represented by a rectangular diagram
and set B by a triangular diagram.

Now, B ∪ C = C
and B ∩ C = B, because B is a subset of C.
Yes, both are convex sets. B ∩ C is bounded
and B ∪ C is unbounded.

Example 5: Show graphically the region represented by the following set;
A = {(x, y): x>0 for all y and y>0 for all x.}
Is the given set Convex?
Solution: Yes. The given set
A = {(x, y): x > 0 for all y and y > 0 for all x}
is the open positive quadrant, which is a convex set.
PROBLEM SET
(1) Given A = {1, 2, 3}, B = {2, 5, 6}, C = {5, 6, 3}
Find (i) A ∪ B (ii) A ∪ (B ∩ C) (iii) A – (B ∪ C) (iv) A ∪ B ∪ C
Answer:
(i) A ∪ B = {1, 2, 3, 5, 6}
(ii) A ∪ (B ∩ C) = {1, 2, 3, 5, 6}
(iii) A – (B ∪ C) = {1}
(iv) A ∪ B ∪ C = {1, 2, 3, 5, 6}
(2) Draw Venn diagram for the justify of given formulae’s

(i) AB (ii) A= A (iii) A(BC)
Ans: (i) (ii) (iii)
(3) If X and Y are finite sets, prove the following:
(I) n(X ∪ Y) = n(X) + n(Y) – n(X ∩ Y)
(II) n(X \ Y) = n(X) – n(X ∩ Y)
(4) In a class of 100 students, 60 students take economics, 50 students take
mathematics and 20 students take both. Find the number of students taking neither
of the two subjects.
Answer: 10 students
(5) Given A = {1, 2, 3} and B= {, , }, then
Find Cartesian product of the given set.
Answer: A B = {(1, ), (1, ), (1, ), (2, ),(2, ),(2, ),(3, ),(3, ),(3, )}
(6) Suppose A = {a, b, c}, B = {a, b, c, d} and C = {a, b, c, d, e}. Prove that A ⊆ B
and B ⊆ C implies A ⊆ C.
(7) Asked, if you will vote for ‘x’ party the following responses are recorded
Yes No Don’t Know
Male: 20 40 10
Female: 40 15 15
Youth (Just as 18 years) 20 10 10
Where A = Set of adult male S  Set of Yes answer

C = Set of adult women N  Set of No answer
Y = Set of youth
Find; (i) n(A) (ii) n(AS) (iii) n (YN)’ (iv) n[A (YN)]
Answer: (i) 20 (ii) 20 (iii) 30 (iv) 20
(8) (i) Given, A= {(x, y): x-y0}, B= {(x, y): |x| y0}, C= {(x, y): x y1}
Show that ABC and ABC are closed and convex.
(ii) Are the following sets convex/bounded/closed?
(a)Set A= {(x, y): y IxI} (b) Set B = {(x, y): y 1/IxI} (c) Set AB

2.3 RELATIONS AND ITS PROPERTIES
Relation: A relation can be defined on set A and set B. It is a subset of the
Cartesian product A × B, i.e. R ⊆ A × B.
If (a, b) ∈ R, it means that 'a' is related to 'b' by R.
Here, set A is the domain of R
and set B is the co-domain of R (the range is the part of B actually used).
If both sets are equal, i.e. A = B, then R is called a binary relation on the set A.
Notation:
 aRb means (a, b) ∈ R
 a ¬R b means (a, b) ∉ R
Example:
Suppose A = {α, β, γ} and B = {2, 3, 4, 5}
Let there be three relations R1, R2 and R3, given below:
R1= {(, 2), (, 3), (, 5)}
R2 = {(β, 2), (β, 3), (, 4)}
R3= {(, 2), (β, 3), (, 4)}
The above relation can be discussed by graph,
 Inverse Relation: Let R be a relation from A to B. Then inverse relation R-1 can
be defined as:
R-1 = {(b, a) : (a, b) ∈ R}. Here R-1 is a relation from B to A.
It is called the inverse of the relation R.
Example: If A = {a, b, c}, B = {1, 2, 3} and R = A × B, find R-1.
Solution: R = A × B
= {(a,1), (a,2), (a,3), (b,1), (b,2), (b,3), (c,1), (c,2), (c,3)}
Then R-1 = {(1,a), (2,a), (3,a), (1,b), (2,b), (3,b), (1,c), (2,c), (3,c)}
 Important Properties of Binary Relations

 R is reflexive if aRa
i.e. ∀a {(a ∈ A) ⇒ ((a, a) ∈ R)}
 R is symmetric if aRb = bRa
i.e. ∀a ∀b [{(a, b) ∈ R} ⇒ {(b, a) ∈ R}]
 R is transitive
if aRb and bRc imply aRc,
i.e. ∀a, b, c {(a, b) ∈ R and (b, c) ∈ R ⇒ (a, c) ∈ R}
 Identity Relation: (x,y) xRy x=y, Here R is the Identity relation.
Example 1:Suppose A {, β, } and R is the relation {(, )}.
 The above relation aRa is called reflexive.
Example 2: Given A = {1, 2, 3}
B = {1, 2, 3}, i.e. A = B
Find the relation R = {(x, y) : xRy} if R is given by (i) y < x (ii) y = x (iii) x = 2y
Solution: The Cartesian product is given by
A × B = {(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)}
(i) If y < x
 R = {(x, y) : x ∈ A and y ∈ A, y < x}
= {(2, 1), (3, 1), (3, 2)}
Here the domain is {2, 3} and the range is {1, 2}.
(ii) If y = x
 R = {(x, y) : x, y ∈ A, y = x}
= {(1, 1), (2, 2), (3, 3)}
Here the domain and the range are both {1, 2, 3}.
(iii) If x = 2y
 R = {(x, y) : x, y ∈ A, x = 2y}
= {(2, 1)}
Here the domain and range are {2} and {1} respectively.
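Example 3: Let A be the set of real numbers and let the relation R be defined as
R = {(x, y) : x > y > 0, x, y ∈ A}
Represent it graphically.
Solution: The relation is the region of the first quadrant lying strictly between the x-axis and the line y = x.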
Example 4: If R1 and R2 are transitive relations on a set A, is R1 ∪ R2 transitive?
Answer by taking an example.
Solution: Suppose A = {2, 3}
and R1 = {(2, 3)}, R2 = {(3, 2)}
Therefore R1 ∪ R2 = {(2, 3), (3, 2)}
Here R1 and R2 are both transitive, but R1 ∪ R2 is not transitive: it contains (2, 3) and (3, 2) but not (2, 2).
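The three properties of binary relations can be tested mechanically when a relation is stored as a set of ordered pairs. The sketch below is one possible way to do this; the function names are our own, and the data are those of Example 4.

```python
def is_reflexive(R, A):
    """Every element of A is related to itself."""
    return all((a, a) in R for a in A)

def is_symmetric(R):
    """Whenever (a, b) is in R, (b, a) is in R as well."""
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    """Whenever (a, b) and (b, d) are in R, (a, d) is in R as well."""
    return all((a, d) in R
               for (a, b) in R
               for (c, d) in R
               if b == c)

A = {2, 3}
R1 = {(2, 3)}
R2 = {(3, 2)}
union = R1 | R2

print(is_transitive(R1), is_transitive(R2), is_transitive(union))
```

R1 and R2 pass the transitivity test vacuously, since neither contains a chain (a, b), (b, c); their union fails because the chain (2, 3), (3, 2) would require (2, 2).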
Composition: Suppose R1 is a relation from A to B and R2 is a relation from B to C.
Then the composition of R1 with R2 is defined by
 (x, z) ∈ R2∘R1 ⇔ there exists y ∈ B such that (x, y) ∈ R1 and (y, z) ∈ R2

Here R2∘R1 is the symbol for the composition.
 Let R be a binary relation on A; then
Basis: R^1 = R
Recurrence: R^(n+1) = R^n∘R, for n ≥ 1
Example 5: Suppose A = {α, β, γ}, B = {1, 2, 3, 4} and C = {a, b, c, d}; find R2∘R1.
Solution: Let R1 = {(α, 4), (β, 1)}
R2 = {(1, b), (1, d), (2, a)}
Then R2∘R1 = {(β, b), (β, d)}
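The composition in Example 5 can be reproduced with a short set comprehension. This is a sketch only: the function name compose is ours, and because the first element of the pair with second entry 4 is not legible in the source, the label "alpha" used for it below is an assumption. It does not affect the result, since 4 has no image under R2.

```python
def compose(R2, R1):
    """Return R2 o R1: all (x, z) with (x, y) in R1 and (y, z) in R2."""
    return {(x, z)
            for (x, y) in R1
            for (y2, z) in R2
            if y == y2}

R1 = {("alpha", 4), ("beta", 1)}   # "alpha" is an assumed label
R2 = {(1, "b"), (1, "d"), (2, "a")}

result = compose(R2, R1)
print(sorted(result))
```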
PROBLEM SET
1. Let xQy be a relation on the set of integers defined by 2x - y = 1. Prove
that the relation is not reflexive.
2. Let xRy be the relation on the set of real numbers such that x/y = 2. Describe the
relation R^2. Is the relation reflexive?
3. If the relation R from A to B is given by
R = {(x, y) : (x, y) ∈ A × B, x = 2y + 1}, graph the relation.
4. If x and y are the set of all real numbers then explain why the statement
y= |x| - 1 and y = x2 -1 give the same relation?
{Hint: x is the absolute value and it taken always positive}

Functions and its types
Functions: A function from a set A to a set B is a rule that assigns a unique element of B to
each element of A.
A function is a special case of a relation in which, for each x, there is
only one corresponding y. A given relation may therefore
(i) be a function, or
(ii) not be a function.
The functional relationship between the variables x and y is called a function. If y is a
function of x, it is represented by
xRy or y = f(x)
where x is the independent variable, whose values form the domain of the function,
and y is the dependent variable, whose values form the range of the function.
Example: Let A = {1, 2, 3, 4, 5} and B = {-1, -2, 0, 1, 2}, with
x ∈ A and y ∈ B,
and the conditions (i) y = x^2, (ii) y = x.
Do these relations define functions?
Solution:
(i) The subset of A × B which satisfies the condition y = x^2 is
x       : 1        Domain = {1}
y = x^2 : 1        Range = {1}
Each x in the domain has exactly one y, so this relation defines a function.
(ii) The subset of A × B which satisfies the condition y = x is
x     : 1  2       Domain = {1, 2}
y = x : 1  2       Range = {1, 2}
Each x in the domain has exactly one y, so this relation defines a function.
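The subsets of A × B in this example can be generated directly from the rules. The sketch below uses hypothetical helper names (relation, is_function_on) and makes explicit on which domain each relation assigns exactly one y to every x.

```python
def relation(A, B, rule):
    """Pairs (x, rule(x)) with x in A whose image lies in B."""
    return {(x, rule(x)) for x in A if rule(x) in B}

def is_function_on(R, domain):
    """True if every element of domain has exactly one image under R."""
    return all(sum(1 for (x, _) in R if x == a) == 1 for a in domain)

A = {1, 2, 3, 4, 5}
B = {-1, -2, 0, 1, 2}

R_square = relation(A, B, lambda x: x ** 2)
R_identity = relation(A, B, lambda x: x)

print(sorted(R_square), sorted(R_identity))
print(is_function_on(R_square, {1}), is_function_on(R_square, A))
```

The second check shows that y = x^2 is a function on the domain {1} but does not assign an image in B to every element of A.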
 Types of function
One-to-one function:
A function is said to be one-to-one if
f(a) = f(b) implies a = b for all a and b in its domain.
One-to-many relation:
A relation is said to be one-to-many if an element of set A has more than one
image in set B. Such a relation is not a function.
Increasing and decreasing function:
These functions can be defined as follows:
 Increasing function: if a < b then f(a) ≤ f(b)
 Strictly increasing function: if a < b then f(a) < f(b)
 Decreasing function: if a < b then f(a) ≥ f(b)
 Strictly decreasing function: if a < b then f(a) > f(b)
Rational function: It is the ratio of two polynomial functions.
Examples of rational functions are
$f(x) = \dfrac{1}{x}$ and $g(x) = \dfrac{1}{x^2}$
Absolute value function: It is defined as
y = f(x) = |x|
Here, Domain (D) = (-∞, ∞)
Range (R) = [0, ∞)
Even function: A function is even if its graph is symmetric about the y-axis, i.e.
f(-x) = f(x)
For example, y = f(x) = x^2
Odd function: A function is odd if
f(-x) = -f(x)
For example, y = f(x) = x^3
Exponential Function: It is defined as f(x) = a^x, where a > 0
Logarithmic Function: It is defined as
y = f(x) = log x
 Some properties of Function

 If the function is continuous then we draw the graph of function.
 Function must be defined on real number
 Function must be based on relation
 Function must also satisfy the following conditions;
 There is no negative number inside a square root.
 There is no zero in denominator
 There is no zero and no negative inside a logarithm function.
Example: Is the rule that assigns to each of the 50 students in a class his marks out of a
maximum of 100 marks a function? If yes, is the function one-to-one?
Solution: Yes, it is a function, because each student is assigned exactly one mark.
The function is one-to-one only if no two students obtain the same
marks.
Note: A detailed discussion of functions is given in the next chapter.
PROBLEM SET
1. Which of the following relations are functional relations? Explain

(i) Y is son of x
(ii) Y = (x-4)*3
(iii) C= f(y), consumption is proportional to Income
2. Prove that if y is a function of x and x is a function of z, then y is the function of z.
3. If X = {2, 3} and Y = {2, 3, 4, 5, 6} and the conditions are
(i) y = x (ii) y = x^2, do these relations define functions?
4. Graph the following function “f”.

(i) A = {a, b, c, d}
B = {1, 2, 3}
f = {(a, 2), (b, 1), (c, 2), (d, 3)}
(ii) A = {a, b, c, d}
B = {1, 2}
f = {(a, 1), (b, 2), (c, 2), (d, 2)}
ANSWERS
1. (i) Yes (ii) Yes (iii) Yes
3. (i) Yes (ii) Yes
REFERENCES
 Allen, R.G.D., Mathematical Analysis for Economists, London: Macmillan and Co. Ltd.
 Chiang, Alpha C., Fundamental Methods of Mathematical Economics, New York: McGraw Hill.
 Carl P. Simon and Lawrence Blume, Mathematics for Economists, London: W. W. Norton & Co.
 Knut Sydsaeter and Peter J. Hammond, Mathematics for Economic Analysis, Prentice Hall.
 Michael Hoy, John Livernois, Chris McKenna, Ray Rees, Thanasis Stengos, Mathematics for
 Economists, Addison-Wesley Publishers Ltd.
Functions, Sequence and Series
Semester-I
Unit-II
Lesson: Functions, Sequence and Series
Lesson Developer: S. K. Taneja
College/Department: Ramlal Anand College (Eve.), University of Delhi
Contents:
1. Learning Outcome
2. Graphs and Functions
2.1 Linear functions
2.2 Point-point formula
2.3 Quadratic Function
2.4 Polynomial Functions
2.5 Rational Functions
2.6 Graphing Rational functions
3. Sequence
3.1 Bounded sequence
3.2 Finite sequence and Infinite sequence
3.3 Limit of a sequence
3.4 Convergent sequence
3.5 Divergent Sequence
3.6 Oscillatory Sequence
4. Series
4.1 Convergence and Divergence of Series
4.2 Arithmetic Series
4.3 Geometric Series
5. Exercises
6. References
1. Learning Outcome
After reading this lesson you will be familiar with the
various types of functions, i.e. linear, quadratic, polynomial and
rational functions, and their graphs. Sequences and
series are also covered in this lesson. Various types of
sequences, i.e. bounded, finite and infinite sequences,
the limit of a sequence, and convergent, divergent and
oscillatory sequences, are
discussed in detail. Similarly, different types of series, i.e.
arithmetic and geometric series, are also explained
below.
2. Graphs and Functions
The Cartesian coordinate system is composed of a horizontal line and a
vertical line perpendicular to each other. These lines are called the coordinate
axes. The point where they intersect is called the origin (0).
The x-coordinate (abscissa) gives the distance of a point from the
vertical axis (y-axis), and the y-coordinate (ordinate) gives the distance of a point
from the horizontal axis (x-axis). To the right of the y-axis, x-coordinates are positive;
to the left they are negative. Above the x-axis, y-coordinates are positive,
and below the x-axis they are negative.
Figure 1
The signs of the coordinates in each quadrant are shown in the figure.
Quadrants are numbered anticlockwise.
Each point in the coordinate system is associated with an ordered pair of
numbers known as coordinates, showing the location of the point in relation to the
origin. The point (2, 1) is 2 units to the right of the y-axis and 1 unit above the x-axis.
Several functions are used in economics. Some of them are:
Linear function: $f(x) = mx + c$
Quadratic function: $f(x) = ax^2 + bx + c \quad (a \neq 0)$
Polynomial function:
$f(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_0$, where n is a nonnegative integer
and $a_n \neq 0$
Rational function:
$f(x) = \dfrac{g(x)}{h(x)}$
where g(x) and h(x) are polynomials and $h(x) \neq 0$.
Power function:
$f(x) = a x^n$ (where n is any real number)
The domain of linear, quadratic and polynomial functions is the set of
all real numbers; the domain of rational and power functions is the set of all values of x
for which the function is defined. We shall now discuss these functions
one by one.
2.1 Linear functions:
Given any equation in x and y, we can depict the set of points in the
coordinate system which satisfy this equation. This set of points is called
the graph of the equation. The graph of a linear equation is a straight line.
The general equation of a straight line is
Ax + By + C = 0
For $B \neq 0$ this can be written as
$y = -\dfrac{A}{B}x - \dfrac{C}{B}$
where the slope of the line is $-\dfrac{A}{B}$ and the y-intercept is $-\dfrac{C}{B}$.
The equation has been written in the slope-intercept form. We can write it as
y = mx + c
where slope = m and intercept = c.
In the figure we have drawn two
straight lines. For one line the
slope is m1 and the intercept is b. For
the other line the slope is m2
and the intercept is c.
Figure 2
The slope of a straight line conveys the steepness and direction of the
line.
See the figures given below.
Figure 3(a) Figure 3(b)
In figure (a) the negative sign of the slope conveys that the straight line is
negatively sloping. The magnitude of the slope (m) conveys the steepness of the
line.
The graph of a linear function can be drawn easily by substituting different values
of x in the equation and finding the corresponding values of y. This gives ordered
pairs (x, y) which can be plotted in the figure.
Another way of plotting the graph is to use the y-intercept and
one point (x, y). Plotting these two in the figure, we obtain the straight line
by joining the two points.
2.2 Point-point formula:
If two points $(x_1, y_1)$ and $(x_2, y_2)$ on a straight line are given, then we can find the slope
and the equation of the straight line:
$\text{slope} = m = \dfrac{y_1 - y_2}{x_1 - x_2}$
Since both points satisfy the equation of the line, we can write
$y_1 = m x_1 + c \quad \dots (1)$
$y_2 = m x_2 + c \quad \dots (2)$
Subtracting equation (2) from (1) we get
$y_1 - y_2 = m(x_1 - x_2)$
$m = \dfrac{y_1 - y_2}{x_1 - x_2}$
The point-slope formula for a straight line is
$y - y_1 = m(x - x_1)$
For any point (x, y) to be on a straight line that passes through the point
$(x_1, y_1)$ and has slope m, it must be true that
$m = \dfrac{y - y_1}{x - x_1}$
Rearranging this,
$y - y_1 = m(x - x_1)$
This is the equation of the straight line and can be written as
$y = m(x - x_1) + y_1$
$\;\; = mx + y_1 - m x_1$
or $y = mx + c$
where $c = y_1 - m x_1$ is a constant.
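The point-point formula translates directly into a small routine. The following Python sketch uses exact fractions to avoid rounding; the function name line_through and the sample points (2, 5) and (4, 11) are our own illustrative choices.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (m, c) for the line y = m*x + c through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    m = Fraction(y1 - y2, x1 - x2)   # slope from the point-point formula
    c = y1 - m * x1                  # intercept c = y1 - m*x1
    return m, c

m, c = line_through((2, 5), (4, 11))
print(m, c)
```

A vertical line (x1 = x2) has no slope-intercept form, which is why the sketch raises an error in that case.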
Line parallel to y = mx + c
Find the equation of the line parallel to
y = 3x + 2 and passing through the point (2, 5).
Solution: Parallel lines have equal slopes, so the slope of the required line is m = 3. Its
equation is given by
$y - y_1 = m(x - x_1)$
With m = 3 and $(x_1, y_1) = (2, 5)$:
$y - 5 = 3(x - 2)$
$y = 3x - 6 + 5 = 3x - 1$
Line passing through the point (8, 3) and perpendicular to the line
y = 2x + 10. Perpendicular lines have slopes that are negative reciprocals of
each other.
So m = -1/2
The equation is $y - 3 = -\tfrac{1}{2}(x - 8) \;\Rightarrow\; y = -\tfrac{1}{2}x + 7$
2.3 Quadratic Function:
An equation of the form $ax^2 + bx + c = 0$, where a, b and c are
constants and $a \neq 0$, is called a quadratic equation.
The corresponding function $y = ax^2 + bx + c$ is an example of a nonlinear function. Its graph
is not a straight line.
We take an example
$y = x^2$
In this function a = 1, b = 0 and c = 0.
To graph this function, simply pick some representative values of x and solve for
f(x), which is usually referred to as y in graphing. Plot the resulting ordered
pairs [x, f(x)] and connect them with a smooth curve. The procedure is shown
below for
$y = x^2$
x     f(x) = y   [x, f(x)]
-3    9          (-3, 9)
-2    4          (-2, 4)
1     1          (1, 1)
2     4          (2, 4)
3     9          (3, 9)
Figure 4
The graph of a quadratic function $y = ax^2 + bx + c$, where $a \neq 0$, is a
parabola. In the figure the case a = 1, b = 0 and c = 0 is graphed. The vertex
of this parabola is (0, 0). For the general equation of a parabola the vertex
is $\left(-\dfrac{b}{2a},\; c - \dfrac{b^2}{4a}\right)$.
A quadratic equation can be solved by factoring, completing the square,
or using the quadratic formula.
The quadratic formula is
$x = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
An expression of the form
$x^2 + bx$ (where a = 1 and c = 0)
can be converted into a perfect square by taking one half of the coefficient of
x, which is $\tfrac{b}{2}$, squaring it to get $\left(\tfrac{b}{2}\right)^2$, and adding this to the original expression to obtain
$x^2 + bx + \dfrac{b^2}{4} = \left(x + \dfrac{b}{2}\right)^2$
Example:
$x^2 + 12x + 35 = 0$
i). Move the constant to the right hand side
$x^2 + 12x = -35$
ii). Take half of the coefficient of x (12), which is 12/2 = 6, and
square it: $6^2 = 36$
iii). Add $6^2$ to both sides
$x^2 + 12x + 6^2 = -35 + 6^2$
$(x + 6)^2 = 1$
Take the square root of both sides and then solve for x
$x + 6 = \pm\sqrt{1} = \pm 1$
$x = -7$ and $x = -5$
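The steps of completing the square can be carried out numerically. The sketch below is illustrative only (the function name is ours); it rewrites ax^2 + bx + c as a(x - h)^2 + k and then solves (x - h)^2 = -k/a, applied to the example x^2 + 12x + 35 = 0.

```python
import math

def solve_by_completing_square(a, b, c):
    """Return the vertex (h, k) and the real roots of a*x**2 + b*x + c = 0."""
    h = -b / (2 * a)             # x-coordinate of the vertex
    k = c - b * b / (4 * a)      # y-coordinate of the vertex
    rhs = -k / a                 # (x - h)**2 = -k/a
    if rhs < 0:
        return (h, k), []        # no real roots
    root = math.sqrt(rhs)
    return (h, k), sorted({h - root, h + root})

vertex, roots = solve_by_completing_square(1, 12, 35)
print(vertex, roots)
```

When -k/a is negative the square cannot equal it for any real x, so the sketch returns an empty list of roots.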
For an equation of the form
$ax^2 + bx + c = 0$
write
$a\left(x^2 + \dfrac{b}{a}x + \dfrac{c}{a}\right) = 0$
Now take half of $\dfrac{b}{a}$, which is $\dfrac{b}{2a}$, and square it to get $\left(\dfrac{b}{2a}\right)^2$. Add and
subtract it inside the bracket:
$a\left[x^2 + \dfrac{b}{a}x + \left(\dfrac{b}{2a}\right)^2 + \dfrac{c}{a} - \left(\dfrac{b}{2a}\right)^2\right] = 0$
$a\left(x + \dfrac{b}{2a}\right)^2 + \dfrac{4ac - b^2}{4a} = 0$
or $a\left(x + \dfrac{b}{2a}\right)^2 - \dfrac{b^2 - 4ac}{4a} = 0$
The vertex of this parabola is
$\left(-\dfrac{b}{2a},\; \dfrac{4ac - b^2}{4a}\right)$
From the above exercise, a quadratic function can be expressed in the form
$y = a(x - h)^2 + k$
where the axis of symmetry is x - h = 0, i.e. x = h, and the vertex is (h, k). The term h
shifts the function horizontally by h units from the origin. Whether the function shifts to the
right or to the left depends on the sign of h. For example, if
$y = (x - 3)^2 + 16$
then the function is shifted to the right of the y-axis.
The term k shifts the function up or down depending upon the sign of k. In
our example
$y = (x - 3)^2 + 16$
the graph has been shifted 3 units to the right of the origin and 16 units above
the x-axis.
If a > 0, the parabola opens up and the vertex is the lowest point of the
function.
If a < 0, the parabola opens down and the vertex is the highest point.
If |a| > 1 the parabola is narrower than if |a| = 1. If 0 < |a| < 1 it is wider
than if |a| = 1.
Figure 5
2.4 Polynomial Functions:
The function
$f(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$
(where the a's are constants and $a_n \neq 0$)
is called a polynomial of degree n. Linear, quadratic and cubic functions are
examples of polynomials.
The polynomial equation f(x) = 0 has at most n real solutions or roots, but it need
not have any.
Cubic functions and polynomials of higher degree are
complicated because the shape of the graph changes substantially as the
coefficients $a_0, a_1, a_2, a_3, \dots$ change.
Zeros of a polynomial equation
If r is a root of the equation
f(x) = 0, i.e. if f(r) = 0,
then (x - r) is a factor of f(x). Conversely, if (x - r) is a factor of f(x) then r is a
root of f(x) = 0,
i.e. f(r) = 0.
If $\dfrac{b}{c}$, a rational number in its lowest terms, is a root of the equation
$a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 = 0$ with integral coefficients, then b
is a factor of $a_0$ and c is a factor of $a_n$.
Example:
$3x^3 + 5x^2 - 3x - 2 = 0$
The values of b are limited to the factors of 2, which are ±1, ±2, and the values of c are
limited to the factors of 3, which are ±1, ±3. Hence the only possible rational roots
are ±1, ±2, ±1/3 and ±2/3.
Integral root theorem
It follows that if an equation f(x) = 0 has integral coefficients and the leading
coefficient is 1 (i.e. $a_n = 1$),
$x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x + a_0 = 0$
then any rational root is an integer and a factor of $a_0$.
Example:
$3x^3 + 5x^2 - 3x - 2 = 0$
The possible integral roots are ±1 and ±2.
Whether these possible roots are actually roots of the equation or not can
be found by putting these values in the equation in place of x and then
checking whether f(r) = 0 or not. If the equation is satisfied, then r is a
root of the equation.
Another method of finding out whether r is a root of the equation or
not is synthetic division. Synthetic division is a simplified method of dividing
the polynomial f(x) by (x - r), where r is any assigned number.
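The list of candidate roots and the check by substitution can be automated. The sketch below is only an illustration: the helper names are ours, and it assumes that the cubic in the example above reads 3x^3 + 5x^2 - 3x - 2 = 0 (the signs of the last two terms are not legible in the source).

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_root_candidates(coeffs):
    """Candidates b/c with b | a0 and c | an; coeffs run from an down to a0."""
    lead, const = coeffs[0], coeffs[-1]
    return sorted({Fraction(sign * b, c)
                   for b in divisors(const)
                   for c in divisors(lead)
                   for sign in (1, -1)})

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value

coeffs = [3, 5, -3, -2]          # assumed signs, see the note above
candidates = rational_root_candidates(coeffs)
roots = [r for r in candidates if evaluate(coeffs, r) == 0]
print(candidates, roots)
```

Under this reading of the signs, substitution shows that only one of the eight candidates actually satisfies the equation.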
Example:
$x^3 + 2x^2 - 23x - 60 = 0$
Find the factors of 60: ±1, ±2, ±3, ±4, ±5 and so on. Let us try 5.
So we want to divide $x^3 + 2x^2 - 23x - 60$ by (x - 5) using the synthetic
division steps.
1.) Write the terms of the dividend in descending powers of the variable and fill
in missing terms using zero for the coefficient (in our example there is no
missing term).
$(x^3 + 2x^2 - 23x - 60) \div (x - 5)$
Write the constant term of the divisor (here 5) on the left of a | and write the
coefficients of the dividend to the right of the symbol.
5 | 1    2   -23   -60
Bring down the first coefficient of the dividend to the third row.
5 | 1    2   -23   -60
  |
    1
Multiply the term in the quotient row (third row) by the divisor and write the
product in the second row below the second term in the first row, add
the numbers in the column formed and write the sum as the second term in
the quotient row.
5 | 1    2   -23   -60
  |      5
    1    7
Multiply the last term in the quotient row by the divisor, write the product under the next term in
the top row, add the column and write the sum in the quotient row. Continue
this process until all of the terms in the top row have a number under them.
5 | 1    2   -23   -60
  |      5    35    60
    1    7    12     0
The third row is the quotient row, with the last term being the remainder.
The degree of the quotient polynomial is one less than the degree of the
dividend because we have divided by a linear factor. The terms in the
quotient row are the coefficients of the quotient polynomial. The degree of
the quotient polynomial is 2.
$(x^3 + 2x^2 - 23x - 60) \div (x - 5) = x^2 + 7x + 12 + \dfrac{0}{x - 5}$
or $x^2 + 7x + 12$
The existence of a zero remainder proves that 5 is a root of the equation.
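The synthetic division steps above amount to a short loop: bring down the first coefficient, then repeatedly multiply by r and add the next coefficient. A minimal Python sketch (the function name is ours):

```python
def synthetic_division(coeffs, r):
    """Divide a polynomial by (x - r).

    coeffs lists the coefficients from the highest power down to the
    constant term. Returns (quotient_coefficients, remainder).
    """
    row = [coeffs[0]]                  # bring down the first coefficient
    for a in coeffs[1:]:
        row.append(a + r * row[-1])    # multiply by r, add the next coefficient
    return row[:-1], row[-1]

quotient, remainder = synthetic_division([1, 2, -23, -60], 5)
print(quotient, remainder)
```

A zero remainder confirms that r is a root; a nonzero remainder equals f(r).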
2.5 Rational Functions:
$f(x) = \dfrac{g(x)}{h(x)}, \quad h(x) \neq 0$
where g(x) and h(x) are polynomials.
The graph of a rational function can be drawn by taking
representative values of x and finding f(x) or y. Then the ordered pairs can
be plotted.
Drawing the graph of a rational function is made easier by finding the
asymptotes.
Vertical asymptote: A vertical asymptote is the line x = k, where k
is found by first cancelling all common factors in the numerator and
denominator and then setting the denominator
equal to zero and solving.
Example:
$f(x) = \dfrac{2x - 3}{x^2 - 4}$
$x^2 - 4 = 0$ gives x = 2 and x = -2,
which are the vertical asymptotes.
Horizontal asymptote: A horizontal asymptote is the line y = h, where h is
found by comparing the degrees of g(x) and h(x).
i) If the degree of g(x) is less than the degree of h(x), then the rational
function has a horizontal asymptote of y = 0.
ii) If the degree of g(x) is equal to the degree of h(x), then f(x) has a
horizontal asymptote of $y = \dfrac{a_n}{b_n}$, where $a_n$ is the coefficient of the
highest degree term of g(x) (the numerator) and $b_n$ is the coefficient of
the highest degree term of h(x) (the denominator).
iii) If the degree of g(x) is greater than the degree of h(x), then f(x) does
not have a horizontal asymptote.
The graph of f(x) may cross the horizontal asymptote in the interior of
its domain. This is due to the fact that, in determining the asymptote, we are concerned only with how f(x)
behaves as $x \to \infty$ or $x \to -\infty$.
2.6 Graphing Rational functions:
i) For $f(x) = \dfrac{g(x)}{h(x)}$ we first determine the holes: values of x for which
both g(x) and h(x) are zero. After any holes are located, we reduce
f(x) to lowest terms.
ii) Once f(x) is in lowest terms we find the asymptotes, symmetry, zeros
and y-intercept if they exist.
iii) Graph the asymptotes as dashed lines.
iv) Plot the zeros and y-intercept and plot other points to determine how
the graph approaches the asymptotes.
v) Sketch the graph through the plotted points and approaching the
asymptotes.
Example:
$y = \dfrac{x^3 + 2x^2 - 3x}{x}$
i) The graph has a hole at x = 0, where
$y = \dfrac{0}{0}$.
ii) Reduce it to lowest terms:
$\dfrac{x^3 + 2x^2 - 3x}{x} = x^2 + 2x - 3$ when $x \neq 0$.
There is a hole at (0, -3).
iii) There is no asymptote for the graph.
There is no y-intercept, but there are zeros at (-3, 0) and (1, 0). We plot
the zeros and place an open circle around the point (0, -3) to indicate the
hole in the graph. Now select other points and plot them.
Figure 6
Example:
$f(x) = \dfrac{5}{x - 2}$
i) Set the denominator equal to zero:
x - 2 = 0
x = 2 is the vertical asymptote.
ii) Since the degree of g(x) is less than the degree of h(x), the horizontal
asymptote is
y = 0.
iii) When x = 0, $y = -\dfrac{5}{2}$.
There are no holes, nor are there zeros (the graph does not cross the x-axis).
Plotting different points satisfying the
equation, the graph would look as
shown in figure 7.
Figure 7
3. Sequence:
A sequence is a function whose domain is the set (or a subset) of the
natural numbers N.
For example
$f(n) = \dfrac{1}{n}$
$1, \dfrac{1}{2}, \dfrac{1}{3}, \dfrac{1}{4}, \dots, \dfrac{1}{k}, \dots$
The dots indicate that the sequence continues.
The numbers in the list are called the terms of the sequence.
Usually we write the terms of the sequence as a1, a2, a3, a4 and so on:
$a_1 = 1, \quad a_2 = \dfrac{1}{2}, \quad a_3 = \dfrac{1}{3}$
The nth term of the sequence is
$a_n = \dfrac{1}{n}$
We write the sequence by placing braces around the formula for the nth term:
$\{a_n\} = \left\{\dfrac{1}{n}\right\}, \quad n \in N$
Example:
(1) $1, \dfrac{1}{2}, \dfrac{1}{4}, \dfrac{1}{8}, \dots$ with $a_n = \dfrac{1}{2^{n-1}}$
(2) $1, -1, 1, -1, \dots$ with $a_n = (-1)^{n-1}$
(3) $1, \dfrac{1}{3}, \dfrac{1}{5}, \dfrac{1}{7}, \dots$ with $a_n = \dfrac{1}{2n-1}$
(4) $\dfrac{2}{4}, \dfrac{4}{5}, \dfrac{6}{6}, \dfrac{8}{7}, \dots$ with $a_n = \dfrac{2n}{n+3}$
Given the nth term of a sequence one can find the different terms of the
sequence.
A sequence for which
a1 > a2
a2 > a3
and in general $a_n > a_{n+1}$
for every n ∈ N is said to be a decreasing sequence.
A sequence for which
a1 < a2
a2 < a3
and in general $a_n < a_{n+1}$
for every n ∈ N is said to be an increasing sequence.
3.1 Bounded sequence:
A sequence $\{a_n\}$ is bounded above if and only if there exists a number
M such that
$a_n \le M$ for all n ∈ N
The number M is called an upper bound of the sequence.
The sequence $\{a_n\}$ is bounded below if and only if there exists a number m
such that
$a_n \ge m$ for all n ∈ N
A sequence is said to be bounded if it is bounded above and below.
3.2 Finite sequence and Infinite sequence:
A finite sequence has a terminal value.
a1, a2, a3, ..., an
The values a1, a2, and so on are the terms of the sequence. The terminal
value is an, so it is a finite sequence.
Take the sequence
$\{a_k\} = \left\{\dfrac{1}{k}\right\}$ where k ∈ N
The terms are
$1, \dfrac{1}{2}, \dfrac{1}{3}, \dots, \dfrac{1}{n}, \dots$
There is no terminal value of the sequence. The sequence is an infinite
sequence.
3.3 Limit of a sequence:
Let $\{a_n\}$ be a sequence of real numbers. L is the limit of this sequence if
for any arbitrarily chosen small number ε > 0 there exists a positive integer N
such that for all n ≥ N we have
$|a_n - L| < \varepsilon$
This can be written as
$\lim_{n \to \infty} a_n = L$
In simple words, it means that the nth term gets closer and closer to L as n tends to
infinity. (Note that L is a finite number.)
3.4 Convergent sequence:
If $\lim_{n \to \infty} a_n = L$, where L is a finite number, the sequence is a convergent sequence.
3.5 Divergent Sequence:
If $\lim_{n \to \infty} a_n = \pm\infty$
then the sequence is a divergent sequence.
3.6 Oscillatory Sequence:
When the terms $a_n$ jump back and forth along the number line, the sequence is an oscillatory
sequence. Determine whether each of the given sequences converges, diverges or
oscillates. If a sequence converges, find the limit of the sequence.
1
(i)
2n1
1 1 1
(ii) 1, , ,
2 3 4
1
1   1 
n 1
(iii) an 
2  
n
(iv) an 
n 1
1 1 1 1
(v) 1, , , , ,.....
4 9 16 25
Determine whether the following sequence are increasing, decreasing or

neither
(i) an 
 5n  2 
 4n  1
n
(ii) an  3
1  3 
n
(iii) an  n !
2n
Show that the sequence diverge or not if diverges then to  or  
(i)  n2 
(ii)  2n 
(iii)  2n 
(iv)   1
n
Oscillates or not
(i)   1
n
(ii)   1 
n
1 1
(iii)  1, 2, , 3, ...... 
2 3
4. Series:
If $a_i$, i = 1, 2, 3, 4, ... is a
sequence, then the sum of its terms is called a series:
Sn = a1 + a2 + a3 + ... + an
The series Sn is a finite series since it is the sum of a finite sequence. We can use the
symbol Σ (sigma) for the summation:
$S_n = \sum_{i=1}^{n} a_i$
The partial sums S1, S2, S3, ... themselves form a sequence, so any result derived for sequences also
applies to series.
4.1 Convergence and Divergence of Series:
If the sequence of partial sums of a series is monotonic and bounded, then it has a limit and the series
is convergent. We can use the ratio test to determine whether the series associated
with a sequence $a_i$ is convergent or not.
If $S_n = \sum_{i=1}^{n} a_i$ is the series associated with a sequence $a_i$ and
$\lim_{n \to \infty} \left|\dfrac{a_{n+1}}{a_n}\right| = L$
then
(i) if L < 1 the series converges,
(ii) if L > 1 the series diverges,
(iii) if L = 1 the series may converge or diverge (the test is inconclusive).
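The behaviour of the ratio for large n can be explored numerically before taking the limit by hand. The sketch below is illustrative: the function name ratio and the three sample sequences (1/n!, 2^n and 1/n) are our own choices, and a numerical value at one n only suggests the limit, it does not prove it.

```python
from fractions import Fraction
from math import factorial

def ratio(a, n):
    """Return |a(n+1) / a(n)| as an exact fraction."""
    return abs(Fraction(a(n + 1)) / Fraction(a(n)))

def inv_factorial(n):
    return Fraction(1, factorial(n))    # ratio 1/(n+1), tends to 0 < 1

def power_of_two(n):
    return 2 ** n                       # ratio is always 2 > 1

def harmonic(n):
    return Fraction(1, n)               # ratio n/(n+1), tends to 1

for name, a in [("1/n!", inv_factorial), ("2^n", power_of_two), ("1/n", harmonic)]:
    print(name, float(ratio(a, 1000)))
```

The third sequence illustrates case (iii): the ratio tends to 1, so the test gives no conclusion about the series.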
4.2 Arithmetic Series:
A series in which each term is obtained by adding a constant quantity
to its preceding term is known as an arithmetic series.
The constant quantity is the common difference.
Look at the sequence
a, a + d, a + 2d, a + 3d, ..., a + (n-1)d
where n denotes the number of terms and a + (n-1)d is the nth term or last term of
the arithmetic series.
The sum of the arithmetic series is
$S_n = \dfrac{n}{2}\left[2a + (n-1)d\right]$
4.3 Geometric Series:
A series in which each successive term is obtained by multiplying the
previous term by a constant quantity is called a geometric series. The
constant quantity is called the common ratio. The general form of a geometric
series with first term a and common ratio r is
$a, ar, ar^2, ar^3, \dots, ar^{n-1}$
The sum of n terms of a geometric series ($r \neq 1$) is
$S_n = \dfrac{a(1 - r^n)}{1 - r}$
The sum of an infinite geometric series where $|r| < 1$ is $S = \dfrac{a}{1 - r}$
When $|r| < 1$ the series is convergent and when $|r| \ge 1$ the series diverges.
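Both sum formulas can be checked against term-by-term addition. The following Python sketch uses function names and sample values of our own choosing (first term 2 with common difference 3, and first term 3 with common ratio 0.5).

```python
def arithmetic_sum(a, d, n):
    """Sum of n terms a, a+d, ..., a+(n-1)d using S_n = n/2 * [2a + (n-1)d]."""
    return n * (2 * a + (n - 1) * d) / 2

def geometric_sum(a, r, n):
    """Sum of n terms a, ar, ..., ar^(n-1) using S_n = a(1 - r^n)/(1 - r)."""
    if r == 1:
        return a * n                  # the formula is undefined for r = 1
    return a * (1 - r ** n) / (1 - r)

# Compare each formula with direct addition of the terms
direct_arithmetic = sum(2 + 3 * k for k in range(4))      # 2 + 5 + 8 + 11
direct_geometric = sum(3 * 0.5 ** k for k in range(4))    # 3 + 1.5 + 0.75 + 0.375

print(arithmetic_sum(2, 3, 4), direct_arithmetic)
print(geometric_sum(3, 0.5, 4), direct_geometric)
print(3 / (1 - 0.5))   # limit of the geometric sum as n grows, since |r| < 1
```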
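Example:
Test the convergence of $\sum_{n=1}^{\infty} \dfrac{n^n}{n!}$
Using the ratio test, we evaluate $\lim_{n \to \infty} \dfrac{a_{n+1}}{a_n}$.
The first terms are $a_1 = \dfrac{1}{1!}, \; a_2 = \dfrac{2^2}{2!}, \; a_3 = \dfrac{3^3}{3!}$, and in general
$a_n = \dfrac{n^n}{n!}, \quad a_{n+1} = \dfrac{(n+1)^{n+1}}{(n+1)!}$
$\dfrac{a_{n+1}}{a_n} = \dfrac{(n+1)^{n+1}}{(n+1)!} \cdot \dfrac{n!}{n^n} = \dfrac{(n+1)^{n+1}}{n^n (n+1)} = \dfrac{(n+1)^n}{n^n} = \left(1 + \dfrac{1}{n}\right)^n$
$\lim_{n \to \infty} \left(1 + \dfrac{1}{n}\right)^n = e > 1$, where e ≈ 2.71828, so the series is divergent.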
5. Exercises:
(1) How many terms of the arithmetic sequence 24, 22, 20, ... are needed to
give a sum of 150?
(2) How long will it take to pay off a debt of Rs. 880 if Rs. 25 is paid in the
first month, Rs. 27 in the second, Rs. 29 in the third month, and so on?
(3) The second term of a geometric sequence is 3 and the fifth term is
81/8. Find the eighth term.
(4) The first term of a geometric series is 375 and the fourth term is 192.
Find the common ratio and the sum of the first four terms.
(5) A man agrees to work at the rate of Rs. 1 for the first day, Rs. 2 for
the second day, Rs. 4 for the third day, Rs. 8 for the fourth day, etc.
How much would he receive at the end of 15 days?
(6) The population of a certain town will increase 3% each year for four
years. What is the percentage increase in population after four years?
6. References
K. Sydsaeter and P. Hammond, Mathematics for Economic Analysis,
Pearson Educational Asia, Delhi, 2002.
Limit and Continuity
Semester-I
Unit-II
Lesson: Limit and Continuity
Lesson Developer: S. K. Taneja
College/Department: Ramlal Anand College (Eve.) , University of
Delhi

Content:
1. Learning Outcome
2. Limit
3. Limit of a Rational function
4. Asymptote
5. Continuity
6. Intermediate Value Theorem
7. Reference
1. Learning Outcome:
After reading this chapter you will be able to understand the concept
of a limit, the limit of a rational function, and asymptotes. In addition to
limits, the concepts of continuity and the intermediate value theorem
are explained in detail.
2. Limit:
Observe the function given below:
$$f(x) = \frac{x^3 - 1}{x - 1}$$
The function is not defined for x = 1, since the result is 0/0, which makes
no sense. However, we can see what happens to f(x) when x is slightly
below or above 1. Take a calculator and find the values of f(x) when
x takes values slightly more than 1 and slightly less than 1.
Some of the values are given below in Table 1.

x      0.5    0.6    0.9    0.99     0.9999   1            1.0009   1.009    1.09
f(x)   1.75   1.96   2.71   2.9701   2.9997   (undefined)  3.0027   3.0271   3.2781

As x approaches 1, f(x) takes values which are closer and closer to 3. So we
can say that f(x) tends to 3 as x tends to 1. This is written as
$$\lim_{x \to 1} \frac{x^3 - 1}{x - 1} = 3$$
Given the above example, the idea of a limit should be clear intuitively.
What we are looking at is what happens to the value of the function when
the independent variable x approaches a particular value.
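The calculator experiment can also be carried out with a few lines of code. This is just a sketch; the sample points are our own choice and any values close to 1 would do.

```python
def f(x):
    """f(x) = (x^3 - 1) / (x - 1); not defined at x = 1."""
    return (x**3 - 1) / (x - 1)

# Evaluate f on both sides of x = 1 (illustrative sample points).
for x in (0.5, 0.9, 0.99, 0.9999, 1.0001, 1.01, 1.1):
    print(f"{x:<8} {f(x):.6f}")
```

The printed values get closer to 3 as x gets closer to 1 from either side, although f(1) itself cannot be computed.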
We can make a formal statement like this.
Suppose y = f(x) is defined on the interval (a, b) and
$$\lim_{x \to v} f(x) = L$$
Now x can approach v either from the right-hand side (i.e. x takes
values which are greater than v) or from the left-hand side (i.e. x tends to v
taking values which are less than v). When x approaches v from the left-hand
side we say L is the left-hand limit of f(x):
$$\lim_{x \to v^-} f(x) = L$$
Here $x \to v^-$ means x approaches v through values which are smaller than v.
Similarly,
$$\lim_{x \to v^+} f(x) = L$$
is the right-hand limit of f(x).
The limit of a function exists if the right-hand limit equals the left-hand limit.
Look at the function
$$f(x) = \frac{|x|}{x}$$
x      f(x)
1      1
5      1
2.5    1
25     1
If we make a table as in the previous example, what do we
find? As x approaches 0 from the left-hand side, the value of the function
gets closer and closer to -1 (in fact f(x) = -1 for every x < 0). On the other
hand, when x approaches 0 from the right-hand side, f(x) approaches 1. In this
case the right-hand limit and the left-hand limit are not equal. The limit of a
function exists if and only if both the left-hand limit and the right-hand limit
exist and are equal. So we can say that the limit of f(x) as x tends to 0 does
not exist. This can be verified from the figure and table.
$$\lim_{x \to 0} f(x) = \lim_{x \to 0} \frac{|x|}{x} \quad \text{does not exist}$$
Take another example:
$$f(x) = \frac{1}{x}$$
What is the limit of the function when x tends to 0?
In this case, when x tends to 0 from the right-hand side the value of
the function increases, and when x is very close to zero the value of the
function approaches +∞. On the other hand, when x approaches 0 from the left-
hand side the value of the function gets closer and closer to -∞.
This can be observed by glancing at the figure.
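The one-sided behaviour of the two functions |x|/x and 1/x near 0 can be tabulated as follows. This is a rough sketch; the helper names and step sizes are hypothetical, and a finite table only suggests the limiting behaviour.

```python
def sign_ratio(x):
    """|x| / x, not defined at x = 0."""
    return abs(x) / x

def reciprocal(x):
    """1 / x, not defined at x = 0."""
    return 1 / x

steps = [10**-k for k in range(1, 6)]      # 0.1, 0.01, ..., 0.00001

left = [sign_ratio(-h) for h in steps]     # approach 0 from the left
right = [sign_ratio(h) for h in steps]     # approach 0 from the right
print(left, right)                          # left-hand values -1, right-hand values 1

# 1/x grows without bound in opposite directions on the two sides.
print([reciprocal(-h) for h in steps])
print([reciprocal(h) for h in steps])
```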
From the above discussion we can make a statement about the
concept of limit.
Suppose f is defined on an interval (a, b) except possibly at a point c ∈ (a, b).
Then f(x) → L as x → c if and only if f(x) → L as x → c^- and f(x) → L as x → c^+.
Now we are in a position to give a formal definition of the limit of a function.
Definition: Let f be a function defined over an interval containing a, except
possibly at a, and let L be a number.
The limit of f(x) as x approaches a is L, written as
$$\lim_{x \to a} f(x) = L$$
if for any ε > 0, however small, there exists a δ > 0 such that
$$|f(x) - L| < \varepsilon \quad \text{whenever} \quad 0 < |x - a| < \delta$$
What does this formal definition of limit mean?
It means that, as x tends to a, the limit of f(x) is L if for every
neighborhood of L, however small it is chosen, there can be
found a corresponding neighborhood of a (excluding the point x = a) in the
domain of the function such that, for every value of x in the a-neighborhood,
its image lies in the chosen L-neighborhood.
This can be explained with the following example.
Let $f(x) = x^2$. Note that
$$\lim_{x \to 2} x^2 = 4$$
Select a small neighborhood of L (here L = 4), say (4 - ε, 4 + ε). Now we
construct a neighborhood of 2, say (2 - δ, 2 + δ), such that
the two neighborhoods define a rectangle (see the diagram) with two of its
corners lying on the curve. It can be seen that for every value of x lying in
the neighborhood of 2, the corresponding value of f(x) lies in the
neighborhood of 4. Thus 4 fulfills the definition of limit.
Example:
$$f(x) = x^2, \qquad \lim_{x \to 2} f(x) = 4$$
We can make x closer and closer to 2 from both sides (left-hand side
and right-hand side):
x → 2^- : x = 1.8, 1.9, 1.92, 1.98, 1.99, ...
x → 2^+ : x = 2.5, 2.25, 2.10, 2.05, 2.005, ...
As we put these values in the function, the value of the function gets closer
and closer to 4.
This would happen even if the function were not defined at x = 2.
In order to prove that the limit of the function is 4, we use the formal
definition of limit.
We must show that given any ε > 0 we can find δ > 0 such that
$$|x^2 - 4| < \varepsilon \quad \text{when} \quad 0 < |x - 2| < \delta$$
Choose δ ≤ 1, so that 0 < |x - 2| < 1. Then
-1 < x - 2 < 1
1 < x < 3, so 3 < x + 2 < 5 and |x + 2| < 5.
Thus
$$|x^2 - 4| = |x + 2|\,|x - 2| < 5\,|x - 2|$$
Take δ as 1 or ε/5, whichever is smaller. Then we have |x^2 - 4| < 5δ ≤ ε whenever
0 < |x - 2| < δ, and the required result L = 4 is proved.
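The choice of δ as the smaller of 1 and ε/5 can be spot-checked numerically. The sketch below samples points in the punctured δ-neighborhood of 2 and confirms that x² stays within ε of 4. The function names and the number of sample points are our own, and a finite sample illustrates the proof rather than replacing it.

```python
def delta_for(eps):
    """The delta used in the proof: the smaller of 1 and eps / 5."""
    return min(1.0, eps / 5)

def check(eps, samples=1000):
    """Test |x^2 - 4| < eps at sample points with 0 < |x - 2| < delta."""
    d = delta_for(eps)
    for i in range(1, samples):
        h = d * i / samples                # 0 < h < delta
        for x in (2 - h, 2 + h):           # one point on each side of 2
            if not abs(x * x - 4) < eps:
                return False
    return True

for eps in (10, 0.1, 0.001):
    print(eps, delta_for(eps), check(eps))
```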
Theorems of limit.
If $\lim_{x \to a} f(x) = h_1$ and $\lim_{x \to a} g(x) = h_2$, then
(i) $\lim_{x \to a} [f(x) + g(x)] = h_1 + h_2$
(ii) $\lim_{x \to a} [f(x) - g(x)] = h_1 - h_2$
(iii) $\lim_{x \to a} [f(x)\, g(x)] = h_1 h_2$
(iv) $\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{h_1}{h_2}$, provided $h_2$ is not 0 ($h_2 \neq 0$)
The limit of a constant is the constant itself.
Suppose f(x) = a, where a is a constant. Then
$$\lim_{x \to b} f(x) = a$$
Theorem:
For any polynomial
$$f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$$
and for any real number a,
$$\lim_{x \to a} f(x) = a_0 + a_1 a + a_2 a^2 + \dots + a_n a^n = f(a)$$
This means that the limit of a polynomial f(x) at x = a is the same as the
value of the polynomial at x = a. In the case of a polynomial, to find the
limit at x = a we are just required to evaluate the polynomial at x = a.
3. Limit of a Rational function:
A rational function is the ratio of two polynomials. The above-mentioned
theorem can be used for computing the limits of rational functions.
Example:
$$f(x) = \frac{x^4 + 4}{x - 3}$$
$$\lim_{x \to 2} f(x) = \lim_{x \to 2} \frac{x^4 + 4}{x - 3} = \frac{\lim_{x \to 2} (x^4 + 4)}{\lim_{x \to 2} (x - 3)} = \frac{20}{-1} = -20$$
We can say that if
$$f(x) = \frac{h(x)}{g(x)}$$
where h(x) and g(x) are polynomials, then for any real number a:
(i) if $g(a) \neq 0$, then $\lim_{x \to a} f(x) = f(a)$;
(ii) if $g(a) = 0$ but $h(a) \neq 0$, then $\lim_{x \to a} f(x)$ does not exist.
There is a useful principle for polynomials which, in simple words, states that
'the end behavior of a polynomial matches the end behavior of its highest
degree term':
$$\lim_{x \to +\infty} \left(a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n\right) = \lim_{x \to +\infty} a_n x^n$$
$$\lim_{x \to -\infty} \left(a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n\right) = \lim_{x \to -\infty} a_n x^n$$
This can be seen by looking at the following:
$$f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$$
Factoring out the highest power of x from the polynomial,
$$f(x) = x^n \left( \frac{a_0}{x^n} + \frac{a_1}{x^{n-1}} + \frac{a_2}{x^{n-2}} + \dots + a_n \right)$$
Now as x approaches +∞ or -∞, all the terms with positive powers of x in
the denominator tend to 0. So the above-mentioned principle is valid.
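The end-behavior principle can be seen numerically by dividing a polynomial by its highest degree term: the quotient should approach 1 as |x| grows. The polynomial used here is a hypothetical example chosen for illustration.

```python
def poly(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n for coeffs = [a0, a1, ..., an]."""
    return sum(a * x**k for k, a in enumerate(coeffs))

def leading_term(coeffs, x):
    """The highest degree term an * x^n."""
    return coeffs[-1] * x ** (len(coeffs) - 1)

coeffs = [7, -3, 0, 2]                     # hypothetical: 7 - 3x + 2x^3

for x in (10.0, 1e3, 1e6, -1e6):
    # The ratio tends to 1, so f(x) behaves like 2x^3 for large |x|.
    print(x, poly(coeffs, x) / leading_term(coeffs, x))
```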
4. Asymptote
Definition: A line x = a is called a vertical asymptote of the graph of the
function f if f(x) → +∞ or f(x) → -∞ as x approaches a from the left or from the right.
(a) $\lim_{x \to a^+} \frac{1}{x - a} = +\infty$      (b) $\lim_{x \to a^-} \frac{1}{x - a} = -\infty$
In both of the above cases the vertical asymptote is the
line x = a.

Look at the function
$$y = \frac{3x + 1}{x}, \quad \text{that is,} \quad f(x) = 3 + \frac{1}{x}$$
$$\lim_{x \to +\infty} f(x) = 3, \qquad \lim_{x \to -\infty} f(x) = 3$$
As x tends to +∞ the graph of the function y = f(x) gets closer and
closer to the line y = 3. The same thing happens when x tends to -∞. In either
case we call the line y = L (here y = 3) a horizontal asymptote of the graph of the
function f.
We can define:
A line y = L is called a horizontal asymptote of the graph of the
function f if
$$\lim_{x \to +\infty} f(x) = L \quad \text{or} \quad \lim_{x \to -\infty} f(x) = L$$
Very often, when the limit of the function does not exist, we may be
interested in finding how the function f(x) behaves when x tends to ±∞, when x
tends to 0 (zero), or when x tends to some value N.
Vertical asymptote: It is a vertical line x = c to which the graph of the
function gets closer and closer as x approaches c from the right or from the
left. For a rational function we find the candidate vertical asymptotes by
setting the denominator of the function equal to zero and checking that the
numerator is not zero there.

Horizontal asymptote: It is a line y = d to which the graph gets closer
and closer as x → +∞ or x → -∞.
Example:
$$f(x) = \frac{4x + 5}{3x - 2}$$
Vertical asymptote:
$$3x - 2 = 0 \;\Rightarrow\; x = \frac{2}{3}$$
The only vertical asymptote is x = 2/3. As x approaches 2/3 from the left or
right, the graph of f(x) approaches the vertical line x = 2/3.
Horizontal asymptote:
$$\lim_{x \to \pm\infty} \frac{4x + 5}{3x - 2} = \lim_{x \to \pm\infty} \frac{4 + \frac{5}{x}}{3 - \frac{2}{x}} = \frac{4}{3}$$
y = 4/3 is the horizontal asymptote.
Example:
$$f(x) = \frac{2x - 3}{\sqrt{x^2 - 2x - 3}}$$
Vertical asymptotes:
x^2 - 2x - 3 = 0
(x - 3)(x + 1) = 0
Hence the denominator is zero when x = 3 or x = -1,
so these lines are vertical asymptotes. The numerator is not zero
when x = 3 or x = -1.
Horizontal asymptotes:
$$\lim_{x \to +\infty} \frac{2x - 3}{\sqrt{x^2 - 2x - 3}} = \lim_{x \to +\infty} \frac{2 - \frac{3}{x}}{\sqrt{1 - \frac{2}{x} - \frac{3}{x^2}}} = \frac{2}{1} = 2$$
Note that $x = -\sqrt{x^2}$ when x < 0. Therefore
$$\lim_{x \to -\infty} \frac{2x - 3}{\sqrt{x^2 - 2x - 3}} = \lim_{x \to -\infty} \frac{2 - \frac{3}{x}}{-\sqrt{1 - \frac{2}{x} - \frac{3}{x^2}}} = \frac{2 - 0}{-\sqrt{1 - 0 - 0}} = -2$$
Hence y = 2 and y = -2 are horizontal asymptotes.
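A function can have two different horizontal asymptotes, as a quick evaluation at large positive and negative x shows. The sketch below assumes the function (2x - 3)/sqrt(x^2 - 2x - 3); the constant term in the numerator is an assumption on our part, and it does not affect the asymptotes.

```python
from math import sqrt

def f(x):
    """Assumed example: (2x - 3) / sqrt(x^2 - 2x - 3), defined for x < -1 or x > 3."""
    return (2 * x - 3) / sqrt(x * x - 2 * x - 3)

# Far to the right the values approach 2, far to the left they approach -2.
for x in (1e2, 1e4, 1e6):
    print(x, f(x), f(-x))
```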
5. Continuity
A function can be regarded as continuous if its graph can be drawn
without lifting the pencil from the paper, that is, if the graph is unbroken.

In the first figure, when x takes a value slightly greater than a, the
value of f(x) jumps up from y1 to y2, so the function is not continuous at
x = a. If the function is continuous at a point x, there will be only small
changes in the value of f(x) for small changes in the value of x.
Definition:
A function f is continuous at x = a provided the following conditions
are satisfied:
(i) f(a) is defined;
(ii) $\lim_{x \to a} f(x)$ exists;
(iii) $\lim_{x \to a} f(x) = f(a)$.
If one or more of these conditions fail to hold, then the function is
discontinuous at x = a.
The function drawn in Fig. (1a) is discontinuous at x = a because the limit does
not exist.
In Fig. 2a the function is not defined at x = a. In Fig. 2b, f(a) = d but
$\lim_{x \to a} f(x) \neq f(a)$. In this figure the function is defined at x = a and
the limit of the function at x = a exists, but the limit is not equal to the
value of the function at x = a.
Actually the third condition implies the first two conditions, since the
statement $\lim_{x \to a} f(x) = f(a)$ only makes sense if the limit exists and the
function is defined at x = a.
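Example:
$$f(x) = \frac{x^2 - 4}{x - 2} \quad \text{at } x = 2$$
f(2) is not defined, although $\lim_{x \to 2} f(x)$ exists:
$$\lim_{x \to 2} f(x) = \lim_{x \to 2} (x + 2) = 4$$
So f(x) is not continuous at x = 2.
Now consider
$$f(x) = \begin{cases} \dfrac{x^2 - 4}{x - 2}, & x \neq 2 \\ 3, & x = 2 \end{cases}$$
The function is discontinuous at x = 2,
since $\lim_{x \to 2} f(x) = 4 \neq f(2) = 3$.
Finally consider
$$f(x) = \begin{cases} \dfrac{x^2 - 4}{x - 2}, & x \neq 2 \\ 4, & x = 2 \end{cases}$$
Here f(2) = 4, which is the same as $\lim_{x \to 2} f(x) = 4$, so the function is
continuous at x = 2.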
Notes:
(1) If a function is continuous at each number in an open interval (a, b),
then we say f is continuous on (a, b).
If the function is continuous on (-∞, ∞), we say that f is continuous
everywhere.

The general method of showing that a function is continuous
everywhere is to show that it is continuous at an arbitrary real
number.
While discussing the limit of a polynomial we saw that
$$\lim_{x \to a} f(x) = f(a)$$
Thus we can make the following statement:
polynomials are continuous everywhere.
All rational functions are continuous on any interval not containing a
zero of the denominator. In other words, a rational function is
continuous on every interval on which it is defined.
Now we shall state briefly certain theorems on continuous functions
which will be useful in finding out whether a function is
continuous or not.
Theorem: If the functions f and g are continuous at c, then
(i) f + g is continuous at c
(ii) f - g is continuous at c
(iii) fg is continuous at c
(iv) f/g is continuous at c if g(c) ≠ 0;
it has a discontinuity at c if g(c) = 0.
Example:
$$f(x) = \frac{x^2 - 9}{x^2 - 5x + 6}$$
The function is a rational function. The denominator
g(x) = x^2 - 5x + 6 = (x - 2)(x - 3) becomes 0 at x = 2 and at x = 3.
It follows that g(2) = 0 and g(3) = 0, so the function is
discontinuous at x = 2 and x = 3.

Theorem: If $\lim_{x \to c} g(x) = L$ and if the function f is continuous at L, then
$$\lim_{x \to c} f(g(x)) = f(L)$$
That is,
$$\lim_{x \to c} f(g(x)) = f\left(\lim_{x \to c} g(x)\right)$$
Example: f(x) = 5 - x^2 is continuous everywhere. In particular it is continuous at 3:
$$\lim_{x \to 3} (5 - x^2) = 5 - 9 = -4, \qquad f(3) = -4, \qquad \text{so} \quad \lim_{x \to 3} f(x) = f(3)$$
Now $|f(x)| = |5 - x^2|$ is also continuous at 3:
$$\lim_{x \to 3} |5 - x^2| = \left| \lim_{x \to 3} (5 - x^2) \right| = |-4| = 4$$
The absolute value of a continuous function is continuous.
Properties of a function defined over a closed interval [a, b]. A function
f(x) is said to be continuous on a closed interval [a, b] if the following
conditions are satisfied:
(a) f(x) is continuous on (a, b);
(b) f(x) is continuous from the right at a, that is, $\lim_{x \to a^+} f(x) = f(a)$;
(c) f(x) is continuous from the left at b, that is, $\lim_{x \to b^-} f(x) = f(b)$.
Example: $f(x) = \sqrt{4 - x^2}$

The natural domain is the closed interval [-2, 2]. We have to check
the continuity of f on the open interval (-2, 2) and at the two end points -2 and 2.
Take an arbitrary point c in (-2, 2):
$$\lim_{x \to c} f(x) = \sqrt{\lim_{x \to c} (4 - x^2)} = \sqrt{4 - c^2} = f(c)$$
which proves that f(x) is continuous on (-2, 2).
The function f(x) is also continuous at the end points:
$$\lim_{x \to 2^-} f(x) = \sqrt{\lim_{x \to 2^-} (4 - x^2)} = \sqrt{4 - 4} = 0 = f(2)$$
$$\lim_{x \to -2^+} f(x) = \sqrt{\lim_{x \to -2^+} (4 - x^2)} = 0 = f(-2)$$
Thus f(x) is continuous on the closed interval [-2, 2].
6. Intermediate Value Theorem:
If f(x) is continuous on the closed interval [a, b] and k is any number
between f(a) and f(b) (inclusive of end points), then there is at least one
number x in the interval [a, b] such that
f(x) = k.

The theorem becomes obvious if we draw the graph of the function.
From the above we can draw another theorem.
If f is continuous on [a, b] and if f(a) and f(b) are non-zero and have
opposite signs, then there is at least one solution of the equation
f(x) = 0
in the interval (a, b).
Example:
f(x) = x^3 - x - 1
When x = 1, f(1) = -1
When x = 2, f(2) = 5
Since f is continuous and changes sign on [1, 2], the graph of the function
intersects the x-axis at least once in (1, 2).
We can make the approximation of the root better by reducing the size of the interval [1, 2].
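Repeatedly halving the interval while keeping the sign change is known as the bisection method. The sketch below applies it to f(x) = x^3 - x - 1 on [1, 2]; the function name `bisect` and the tolerance are our own choices, and the method is offered as one possible way of shrinking the interval, not as the only one.

```python
def f(x):
    return x**3 - x - 1

def bisect(func, lo, hi, tol=1e-8):
    """Shrink [lo, hi] around a root of a continuous func with a sign change."""
    if func(lo) * func(hi) >= 0:
        raise ValueError("func(lo) and func(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid          # the sign change (or the root) lies in [lo, mid]
        else:
            lo = mid          # otherwise it lies in [mid, hi]
    return (lo + hi) / 2

root = bisect(f, 1.0, 2.0)
print(root, f(root))
```

Each pass halves the interval, so the intermediate value theorem guarantees that a solution stays trapped between lo and hi throughout.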
Example: Use the theorem to show that each equation has at least one solution in the given interval.
(a) x^3 - 4x + 1 = 0; [1, 2]
(b) x^3 + x^2 - 2x = 1; [-1, 1]
A function f(x) is said to have a removable discontinuity at x = c if
$\lim_{x \to c} f(x)$ exists
but f(x) is not continuous at x = c because either
(i) f(x) is not defined at c, or
(ii) f(c) differs from $\lim_{x \to c} f(x)$.
Show that the following functions have removable discontinuities at x = 1:
(i) $f(x) = \dfrac{x^2 - 1}{x - 1}$
(ii) $g(x) = \begin{cases} 1, & x < 1 \\ 0, & x = 1 \\ 1, & x > 1 \end{cases}$
Solution:
(i) $\lim_{x \to 1} f(x) = \lim_{x \to 1} \dfrac{(x - 1)(x + 1)}{x - 1} = 2$
But f(1) is not defined,
so the function is discontinuous at x = 1.
Now if we redefine the function as
$$f(x) = \begin{cases} \dfrac{x^2 - 1}{x - 1} & \text{for } x \neq 1 \\ 2 & \text{if } x = 1 \end{cases}$$
then the function becomes continuous.
(ii) Similarly, in the case of g(x) the limit as x → 1 is 1.
So if we set g(1) = 1,
the function becomes continuous.
Note: If the limit does not exist at c, then the discontinuity is
irremovable.
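Removing a discontinuity amounts to filling in the missing value with the limit. A small sketch for f(x) = (x^2 - 1)/(x - 1) follows; the name `f_patched` is hypothetical.

```python
def f(x):
    """(x^2 - 1) / (x - 1); raises ZeroDivisionError at x = 1."""
    return (x * x - 1) / (x - 1)

def f_patched(x):
    """Same as f, but with the value at x = 1 set to the limit 2."""
    return 2.0 if x == 1 else f(x)

# Near x = 1 the original values are close to 2, so the patch fits.
for x in (0.999, 0.999999, 1, 1.000001, 1.001):
    print(x, f_patched(x))
```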
7. References:
K. Sydsaeter and P. Hammond, Mathematics for Economic Analysis, Pearson

Educational Asia, Delhi, 2002.

Single Variable Optimization
Semester-I
Unit-III
Lesson: Single Variable Optimization
Lesson Developer: Himanshu Singh
College/Department: Satyawati College, University of Delhi
Institute of Lifelong Learning, University of Delhi
Content:
1. Learning Outcome
2. Introduction of Geometric Properties of Functions
2.1 Linearity
2.2 Differentiability
3. Optimization
3.1 First Derivative Test for Maxima and Minima
4. Geometric Interpretation of Rolle's Theorem and the Mean Value Theorem.
4.1 The mean value Theorem for Derivatives
5. Existence, Global and Uniqueness of solution.
5.1 Weierstrass Theorem (Existence)
5.2 Global Maximum

5.3 Uniqueness of Solution
5.4 Related examples and their solutions
6. Complete criteria for Maximum, Minimum and inflectional values
7. End Chapter Exercise
8. References
1. Learning Outcome
After reading this lesson you should be able to learn about the geometric properties of
functions (linearity and differentiability); optimization of function (maxima and
minima); geometric interpretation of Rolle’s theorem and mean value theorem;
existence, global and uniqueness of solution.

2. Introduction of Geometric properties of functions
If an economic variable lives in one set, and changes in that variable help to explain
changes in another economic variable, the two variables are related. There is a
correspondence between the two sets of variables. If the first set is denoted by S1 and
the second set by S2, the correspondence (which is denoted by f) is written f : S1 →
S2 to suggest that f associates elements in S1 with elements in S2. Here f "sends" or
"transforms" or "maps" x in S1 into y and z in S2 (as shown in the following
diagram).
If each element in S1 gets sent to exactly one element of S2 (see fig. 3.1.b), then f is
called a function. Notice that more than one element in S1 may go to a single
element of S2.
The main problem is to find ways of combining functions and describing their
properties. Thus a diagram like that in fig. 3.2.a does not represent a continuous
function. Formally, continuity requires that for any distance ε > 0 around f(x), there is
a distance δ > 0 around x, so that points within distance δ of x get sent to points
within distance ε of f(x), as in fig. 3.2.b.
Geometrically speaking, a continuous function preserves some of the structure of the
domain set as this domain set gets transformed into the range. The structure that is
preserved is the "togetherness" of points. If a function preserves the
"nearbyness" of points, it is reasonable that the property of "being a closed set" is
preserved by a continuous function, because a closed set contains its limit points, and
limit points are characterised by the "nearness" of an infinite number of points of a
sequence. If the property of "closedness" of a set is preserved by a continuous function,
all structures such as open sets or compact sets that are defined in terms of it are
likewise preserved under the action of a continuous function. In brief, a continuous
function from R to R preserves the topological structure of the set it transforms.
Simply stated, a function transforms a set of real numbers into another set of real
numbers. But a set of real numbers may have a certain coherence or structure. This
set, for example, may be open, or compact, or convex, etc. Because that set is
transformed by a rule or function representing an explanation or theory, the structure
of set S1 ought to have an analogue in the structure of the set S2.
Thus the important question to ask is whether or not a function preserves the relevant
structure of the set being transformed. Functions are appropriately classified by the
kinds of structures they preserve under the transformations they represent. For
example, continuous functions transform "nearby" points into "nearby" points. A
function is continuous at a point in the domain, called x, if points close to x get sent to
points close to the image or transformation of x, called f(x).

2.1 Linearity
A function f : R → R is linear if, for all a, b ∈ R and all λ ∈ R,
(i) f(a + b) = f(a) + f(b), and
(ii) f(λa) = λ f(a).
The first condition connotes that a linear function preserves the additive
structure of the real numbers, while the second condition implies that a stretching of an
arrow from 0 to a by a factor λ is preserved under the action of a linear function. All
linear functions from R to R are continuous. In fact, linear functions were created to
preserve arithmetic properties, while continuous functions were created to preserve
topological (or geometric closedness) properties of the real numbers.
2.2 Differentiability: The differential calculus, when all is said and done, is the
study of linear approximation to nonlinear functions. If any nonlinear function has an
associated linear function that approximates it closely, then analysis of non-linear
functions is rather easy. However, it is not possible to associate a linear function with
an arbitrary function from R to R. It is sometimes possible, though, to carve up the
domain of the function to perform a local approximation. Gaining the simplicity of
linearity requires forsaking the global picture. If it is feasible to do a local
approximation in all the chunks of the carved up domain, it may be possible to patch
together the local approximations into some coherent picture. Differential calculus is
after all a local analysis.

In general, given an arbitrary function f : R → R, suppose there exists a number a such
that
$$\lim_{h \to 0} \frac{f(x + h) - f(x) - ah}{h} = 0$$
This is read as "a is the derivative of f at x". If a
function is differentiable (i.e. possesses a derivative) at some point in its domain, then
it is continuous at that point.
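The defining condition says that the error of the linear approximation f(x) + ah shrinks faster than h. The sketch below evaluates the quotient for a hypothetical example, g(t) = t^2 at x = 3, once with the correct slope a = 6 and once with a wrong slope a = 5.

```python
def remainder_ratio(f, x, a, h):
    """(f(x + h) - f(x) - a*h) / h, which tends to 0 iff a is the derivative of f at x."""
    return (f(x + h) - f(x) - a * h) / h

def g(t):
    return t**2            # hypothetical example; its derivative at 3 is 6

for h in (1e-1, 1e-2, 1e-3):
    # With a = 6 the ratio equals h and shrinks to 0; with a = 5 it stays near 1.
    print(h, remainder_ratio(g, 3.0, 6.0, h), remainder_ratio(g, 3.0, 5.0, h))
```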
Linearity, continuity and differentiability are useful characteristics of functions.

However, an arbitrary function does not have to be linear, continuous or
differentiable. Most economic models require the associated functions to be
differentiable. Their "almost" linear form, or locally linear structure, permits
visualisation almost like linear functions, which are continuous. Discontinuity suggests
that small changes in the variables x1, ..., xn induce large changes in the variable y.
The use of continuous functions to model economic relationships is an expression of
faith by economists.
Before elaborating on convex functions and concave functions, we will define a
convex set. A convex set in R is a set of points in R such that a line segment drawn
between any two of its points lies wholly within the set. That is, S ⊂ R is convex if
x, y ∈ S and z = λx + (1 – λ)y for 0 ≤ λ ≤ 1 imply z ∈ S. If a function is defined
over some convex set, then the function has some useful properties. In the previous
paragraphs we defined a linear function f : R → R by two conditions: additivity
(f(x + y) = f(x) + f(y) for all x, y ∈ R) and homogeneity (f(λx) = λ f(x) for all λ ∈ R,
x ∈ R). These properties, taken individually, define classes of functions that include
linear functions. What functions satisfy additivity alone or a generalisation of it?
What functions satisfy the homogeneity property alone or its generalisation?
The "concavity" property generalises the additivity property of linear functions.
Intuitively, a strictly concave function has the property that a line joining two
points on the graph lies below the graph between those two points, as can be seen
from the diagram given below.

Now we shall examine a special class of functions that have significant economic
applications, looking at concavity and convexity of a function more formally. Suppose
f : S ⊂ R → R, so that f is defined on an open set S, and suppose that S is a convex set.
Then f is a concave function if, given any x and x̂ in S, and for all 0 ≤ λ ≤ 1,
(i) f(λx + (1 – λ)x̂) ≥ λ f(x) + (1 – λ) f(x̂)
Because λx + (1 – λ)x̂ = z ∈ S, inequality (i) means that the value of f at z, some
point between x and x̂, is greater than or equal to the value
represented by the corresponding point on the line connecting f(x) and f(x̂).
A function f : S → R, defined on an open convex set S, is 'strictly' concave if the
inequality (i) is a strict inequality for x ≠ x̂ and 0 < λ < 1, i.e.
f(λx + (1 – λ)x̂) > λ f(x) + (1 – λ) f(x̂)
Convex function:
A function f : S → R, defined on an open convex set S ⊂ R, is convex if, for any x
and x̂ ∈ S, and any λ ∈ [0, 1],
(2) f(λx + (1 – λ)x̂) ≤ λ f(x) + (1 – λ) f(x̂).
If the inequality sign in (2) is a strict inequality, f(x) is strictly convex.
A convex function has the property that a line drawn between two points on the graph
lies on or above the graph between those two points. Thus concave functions look like
parabolas opening downward, and convex functions look like parabolas opening
upward. Linear functions are both concave and convex, but neither strictly
concave nor strictly convex.
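The concavity inequality can be tested on a grid of points. A check of this kind can only refute concavity on the sample or fail to refute it; it is not a proof. The function name, the grid and the small tolerance for rounding error are our own assumptions.

```python
from math import sqrt

def is_concave_on_samples(f, points, lambdas, tol=1e-12):
    """Check f(l*x + (1-l)*y) >= l*f(x) + (1-l)*f(y) on all sample combinations."""
    for x in points:
        for y in points:
            for lam in lambdas:
                z = lam * x + (1 - lam) * y
                if f(z) < lam * f(x) + (1 - lam) * f(y) - tol:
                    return False           # the chord rises above the graph
    return True

points = [0.5 * k for k in range(9)]       # 0, 0.5, ..., 4
lambdas = [i / 10 for i in range(11)]      # 0, 0.1, ..., 1

print(is_concave_on_samples(sqrt, points, lambdas))               # square root
print(is_concave_on_samples(lambda t: t * t, points, lambdas))    # parabola opening upward
print(is_concave_on_samples(lambda t: 3 * t + 1, points, lambdas))  # linear
```

Applying the same check to -f tests convexity, which is how a linear function passes both tests.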
Test for concavity and convexity of a function: the f'' test. Intuitively, if the graph of f
lies above all of its tangents on an interval I, then f is called concave upward, or
convex, on I. If the graph of f lies below all of its tangents on I, it is called
concave downward, or simply concave, on I.
Let us now see how the second derivative of the given function helps determine the
intervals of concavity. For a concave function on some interval I, the slope of the tangent
line falls continuously over I, and for a convex function the slope of the tangent line
rises.
This means that for a concave function the derivative f' is decreasing, and therefore f''
is negative. For a convex function, the derivative f' is increasing throughout the
interval I, and hence f'' is positive. This reasoning can be reversed and suggests that
the following theorem is true (a proof can be given with the help of the Mean Value
Theorem).
Test:
If f''(x) > 0 for all x in I, then the graph of f is convex (concave upward) on I.
If f''(x) < 0 for all x in I, then the graph of f is concave (concave downward) on I.
Definition:
A point P on a curve y = f(x) is called an inflection point if f is continuous there and
the curve changes from convex to concave or from concave to convex at P.
In view of the concavity test, there is a point of inflection at any point where the
second derivative changes sign.
Example: Sketch a possible graph of a function f that satisfies the following
conditions:
(i) f(0) = 0, f(2) = 3, f(4) = 6, f'(0) = f'(4) = 0
(ii) f'(x) > 0 for 0 < x < 4, f'(x) < 0 for x < 0 and for x > 4
(iii) f''(x) > 0 for x < 2, f''(x) < 0 for x > 2.
Sketch of the solution:
Condition (i) tells us that the graph has horizontal tangents at the points (0, 0) and (4,
6). Condition (ii) says that f is increasing (as f' > 0) on the interval (0, 4) and
decreasing on the intervals (–∞, 0) and (4, ∞). Condition (iii) says that the graph is
convex (concave upward) on the interval (–∞, 2) and concave on (2, ∞). Because the
curve changes from convex to concave when x = 2, the point (2, 3) is an inflection
point. We use this information to sketch the graph in fig. 3.5. Notice that we made the
curve bend upward when x < 2 and bend downward when x > 2.

Another related application of the second derivative is the following test for maximum
and minimum values. It is a consequence of the concavity test.
The Second Derivative Test: Suppose f'' is continuous near c.
(a) If f'(c) = 0 and f''(c) > 0, then f has a local minimum at c.
(b) If f'(c) = 0 and f''(c) < 0, then f has a local maximum at c.
For instance, part (a) is true because f''(x) > 0 near c and so f is convex near c.
This means that the graph of f lies above its horizontal tangent at c, and so f has a local
minimum at c. This may be seen from the diagram below:
Example 2:
Discuss the curve y = x^4 – 4x^3 with respect to concavity, points of
inflection and local maxima and minima. Use this information to sketch the curve.
Solution:
f(x) = x^4 – 4x^3, then
f'(x) = 4x^3 – 12x^2 = 4x^2 (x – 3)
f''(x) = 12x^2 – 24x = 12x (x – 2)
To find the critical numbers we set f'(x) = 0 and obtain x = 0 and x = 3. To use the
second derivative test we evaluate f'' at these critical points:
f''(0) = 0, f''(3) = 36 > 0.
Since f'(3) = 0 and f''(3) > 0, f(3) = –27 is a local minimum. Since f''(0) = 0, the
second derivative test gives no information about the critical number 0. But since f'(x)
< 0 for x < 0 and also for 0 < x < 3, the first derivative test tells us that f does not have a
local maximum or minimum at 0.
Since f''(x) = 0 when x = 0 or x = 2, we divide the real line into intervals with these
numbers as end points and complete the following schedule.
Interval    f''(x) = 12x (x – 2)    Concavity
(–∞, 0)     +                       convex
(0, 2)      –                       concave
(2, ∞)      +                       convex
The point (0, 0) is an inflection point since the curve changes from convex to concave
there. Also, (2, –16) is an inflection point since the curve changes from concave to
convex there.
Using the local minimum, the intervals of concavity and convexity and the points of
inflection, we plot the curve in the diagram given below:
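The sign schedule for y = x^4 – 4x^3 can be reproduced by evaluating the derivatives at one test point in each interval. This is a sketch only; the test points are our own choice, and any point inside each interval would serve.

```python
def f(x):
    return x**4 - 4 * x**3

def f1(x):                     # first derivative
    return 4 * x**3 - 12 * x**2

def f2(x):                     # second derivative
    return 12 * x**2 - 24 * x

def sign(v):
    return "+" if v > 0 else "-" if v < 0 else "0"

# One test point per interval between the zeros of f'' (x = 0 and x = 2).
for label, x in (("(-inf, 0)", -1), ("(0, 2)", 1), ("(2, inf)", 3)):
    print(label, "f'' sign:", sign(f2(x)))

# Critical numbers and the values quoted in the solution.
print(f1(0), f1(3), f2(0), f2(3), f(3), f(2))
```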

Note: The second derivative test is inconclusive when f''(c) = 0. At such a point
there might be a maximum, a minimum or neither. The test also fails when f''(c) does not
exist. In all such cases the first derivative test must be used. In fact, even when both
tests apply, the first derivative test is often the easier one to use.
Example:
Discuss the curve with respect to maxima, concavity and points of inflection. Also
sketch the graph of the function given below:
$$f(x) = x^{2/3} (6 - x)^{1/3}$$
Solution:
$$f'(x) = \frac{4 - x}{x^{1/3} (6 - x)^{2/3}}, \qquad f''(x) = \frac{-8}{x^{4/3} (6 - x)^{5/3}}$$
Since f'(x) = 0 when x = 4 and f'(x) does not exist when x = 0 or x = 6, the critical
numbers are 0, 4 and 6.

Interval     4 – x   x^(1/3)   (6 – x)^(2/3)   f'(x)   f
x < 0        +       –         +               –       decreasing on (–∞, 0)
0 < x < 4    +       +         +               +       increasing on (0, 4)
4 < x < 6    –       +         +               –       decreasing on (4, 6)
x > 6        –       +         +               –       decreasing on (6, ∞)

We now apply the first derivative test to find local extreme values (maxima or minima).
Since f' changes sign from negative to positive at x = 0, f(0) = 0 is a local minimum.
Since f' changes sign from positive to negative at x = 4, f(4) = 2^(5/3) is a local maximum.
The sign of f' does not change at 6, so there is no maximum or minimum there.
(The second derivative test could be used at x = 4, but not at x = 0 or x = 6, since
f''(0) and f''(6) do not exist.)
Looking at the expression for f''(x) and noting that x^(4/3) ≥ 0 for all x, we have f''(x)
< 0 for x < 0 and for 0 < x < 6, and f''(x) > 0 for x > 6. So f is concave on (–∞, 0)
and (0, 6) and convex on (6, ∞), and the only point of inflection is (6, 0). The graph
is sketched below. Note that the curve has vertical tangents at (0, 0) and (6, 0) because
|f'(x)| → ∞ as x → 0 and as x → 6.
Strategy for Graphing y = f(x)

(1) Identify the domain of f and any symmetries the curve may have.
(2) Find f1 and f11.
(3) Find the critical points of f, and identify the function's behaviour at each
one.
(4) Find intervals where the curve is increasing and where it is decreasing.
(5) Find the points of inflection, if any occur, and determine the concavity of
the curve.
(6) Identify any asymptotes.
(7) Plot the key points, such as the intercepts and the points found in steps
3,4,5 and sketch the curve.
Illustration : Using the Graphing Strategy:
( x  1) 2
Sketch the graph of f (x) =
1  x2
1. The domain of f is (– , ) and there are no symmetries about either axis or the
origin.
2. Find f1 and f11 x – intercept at x = – 1
y – intercept at y = 1 at x = o
(1  x 2 ) 2( x  1) – ( x  1) 2 2 x 2 (1 – x 2 )
f1  
(1  x 2 ) 2 (1  x 2 ) 2
Critical points: x = – 1, x = 1
(1  x 2 ) 2(–2 x) – 2(1 – x 2 ) [2(1  x 2 ) 2 x]

f ( x) 
11
(1  x 2 ) 4
4 x ( x 2 – 3)
= 
(1  x 2 ) 3
3. Behaviour at critical points: Since $f'$ exists everywhere over the domain of $f$, the critical points occur only at $x = \pm 1$, where $f'(x) = 0$. At $x = -1$, $f''(-1) = 1 > 0$, yielding a local minimum by the second derivative test.
At $x = 1$, $f''(1) = -1 < 0$, yielding a local maximum by the second derivative test.
4. Increasing and decreasing: On the interval $(-\infty, -1)$ the derivative $f'(x) < 0$ and the curve is decreasing. On the interval $(-1, 1)$, $f'(x) > 0$ and the curve is increasing. It is decreasing on $(1, \infty)$, where $f'(x) < 0$ again.
5. Inflection points: Note that the denominator of the second derivative (step 2) is always positive. The second derivative $f''$ is zero when $x = -\sqrt{3}$, $0$ or $\sqrt{3}$. $f''$ changes sign at each of these points: it is negative on $(-\infty, -\sqrt{3})$, positive on $(-\sqrt{3}, 0)$, negative on $(0, \sqrt{3})$ and positive again on $(\sqrt{3}, \infty)$. Thus, each point is a point of inflection. The curve is concave on the interval $(-\infty, -\sqrt{3})$, convex on $(-\sqrt{3}, 0)$, concave on $(0, \sqrt{3})$, and again convex on $(\sqrt{3}, \infty)$.
6. Asymptotes: Expanding the numerator of $f(x)$ and then dividing both numerator and denominator by $x^2$ yields:
$f(x) = \dfrac{(x + 1)^2}{1 + x^2} = \dfrac{x^2 + 2x + 1}{1 + x^2}$ (expanding the numerator)
$= \dfrac{1 + \frac{2}{x} + \frac{1}{x^2}}{\frac{1}{x^2} + 1}$ (dividing by $x^2$)
We see that $f(x) \to 1^+$ as $x \to \infty$ and that $f(x) \to 1^-$ as $x \to -\infty$. Thus, the line $y = 1$ is a horizontal asymptote.
Since $f$ decreases on $(-\infty, -1)$ and then increases on $(-1, 1)$, we know that $f(-1) = 0$ is a local minimum. Although $f$ decreases on $(1, \infty)$, it never crosses the horizontal asymptote $y = 1$ on that interval (it approaches the asymptote from above). So the graph never becomes negative, and $f(-1) = 0$ is an absolute minimum as well. Likewise, $f(1) = 2$ is an absolute maximum because the graph never crosses the asymptote $y = 1$ on the interval $(-\infty, -1)$, approaching it from below. There are no vertical asymptotes, since the denominator $1 + x^2$ is never zero (the range of $f$ is $0 \leq y \leq 2$).
The graph of $f$ is sketched in fig. 3.9.
The graph is concave as it approaches the horizontal asymptote $y = 1$ as $x \to -\infty$, and convex in its approach to $y = 1$ as $x \to \infty$.
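The derivatives used in this illustration can be checked numerically. The following is a minimal Python sketch, not part of the original lesson: the helper names (`d1`, `d2`, `central_diff`) and the sample points are illustrative assumptions. It compares the closed-form expressions for the first and second derivative with central finite differences.

```python
def f(x):
    # The function sketched in the illustration
    return (x + 1) ** 2 / (1 + x ** 2)

def d1(x):
    # Closed form of the first derivative: 2(1 - x^2) / (1 + x^2)^2
    return 2 * (1 - x ** 2) / (1 + x ** 2) ** 2

def d2(x):
    # Closed form of the second derivative: 4x(x^2 - 3) / (1 + x^2)^3
    return 4 * x * (x ** 2 - 3) / (1 + x ** 2) ** 3

def central_diff(g, x, h=1e-5):
    # Symmetric finite-difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

sample_points = [-3.0, -1.0, -0.5, 0.0, 0.7, 1.0, 2.5]  # arbitrary test points
max_err_d1 = max(abs(central_diff(f, x) - d1(x)) for x in sample_points)
max_err_d2 = max(abs(central_diff(d1, x) - d2(x)) for x in sample_points)

print("largest error in f' :", max_err_d1)
print("largest error in f'':", max_err_d2)
print("f(-1) =", f(-1.0), " f(1) =", f(1.0))
```

Both errors should be tiny, and the values of f at the critical points are the absolute minimum and maximum discussed above.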
3. OPTIMIZATION
The concept of rationality, inter alia, means that a decision maker (consumer, firm, government etc.) tries to find the best alternative out of those available to him. That is, he tries to optimize.
An optimization problem consists of choice variables, an objective function, and a feasible set. The problem is to choose the most preferred alternative in the feasible set of alternatives. Our theory, in general, allows us to represent this as the problem of finding the maximum or minimum of the objective function, subject to some constraints. For this reason, the problem of optimization is synonymous with finding a constrained maximum or minimum of the given function.
Solution of an optimization problem:
It is a value of the choice variables which is in the feasible set (the set of alternatives available) and which yields a maximum or minimum value of the objective function over the feasible set.
Suppose $y = f(x)$ is the objective function and the problem is to maximize $f$. The feasible set is $S$. Then a solution to the problem is a value $x^*$ of the choice variable, with $x^* \in S$, having the property that:
$f(x^*) \geq f(x)$ for all $x \in S$.
It also frequently happens that we are concerned with the greatest or least value over a certain neighborhood in the domain of the function rather than with the absolutely greatest or least value over the entire domain. The next definition makes this idea more precise.
Definition: A function $f$ is said to have a relative (local) maximum at a point $x_0$ of the domain of $f$ if there is a neighborhood $N(x_0)$ such that $f(x_0) \geq f(x)$ for all $x \in N(x_0)$.
A relative minimum is defined in a similar manner. The greatest value (if there is one) of a function on its entire domain, or global maximum, is sometimes called the absolute maximum. The maxima and minima of a function are called the extremes of the function. Note that the existence of a relative maximum (or minimum) at $x_0$ implies that the function is defined in some neighborhood $N(x_0)$ of $x_0$. If $x_0$ is an end point of the domain of $f$, then the neighborhood is a left or right neighborhood, and the extreme is sometimes called an end point (boundary point) extreme. The following figure illustrates some of the ways in which extremes can occur.
The graph is that of a function with a relative maximum at each of the points $x = x_1$, $x_3$ and $x_4$, and with a relative (local) minimum at each of $x = x_2$ and $x_5$. The extreme at $x_1$ is an end point maximum. The extreme at $x_2$ is an absolute (global) minimum. There is no absolute maximum, as the curve is indicated to be rising indefinitely for $x > x_5$.
The figure also suggests that if $f(x_0)$ is an extreme and $f'(x_0)$ exists, then $f'(x_0) = 0$. This seems to be so at $x = x_3$ and $x = x_5$. At $x_1$, $x_2$ and $x_4$, the derivative apparently does not exist (only one-sided derivatives can exist at such points).
The following theorem covers these conjectures.
Theorem: If a function $f$ has an extreme $f(x_0)$, then either
(i) $f'(x_0) = 0$, or else
(ii) $f'(x_0)$ does not exist.
The converse of the theorem is not true. For example, the function defined by $f(x) = x^3$ has a zero derivative at $x = 0$, since $f'(x) = 3x^2$. Yet $f$ has no extreme at $x = 0$, as $f'(x) > 0$ for $x \neq 0$, so that $f$ is an increasing function over its entire domain, the set of all real numbers. One can easily verify that the curve $y = x^3$ has a point of inflection with a horizontal tangent at $(0, 0)$.
An instance where the non-existence of the derivative does not imply an extreme is the function $f(x) = (x - 2)^{1/3}$, which is discussed after the illustration of Rolle's Theorem below.
A direct consequence of the preceding discussion is stated in the following theorems.
Theorem: If the function $f$ is continuous on a closed interval $a \leq x \leq b$ and if $f(a) = f(b)$, then there exists at least one critical point $x_c$ in the open interval $a < x < b$.
If, in addition to the continuity of $f$ on the closed interval $a \leq x \leq b$, it is known that $f'$ exists on the open interval, then for the $x_c$ of the theorem we have $f'(x_c) = 0$. We state this result in the theorem below.
Theorem: Rolle's Theorem.
If the function $f$ is continuous on $a \leq x \leq b$ with $f(a) = f(b)$, and if $f'(x)$ exists everywhere on the open interval $a < x < b$, then there is at least one number $x_c$, $a < x_c < b$, such that $f'(x_c) = 0$.
Illustration: Suppose we have a function of the form
$f(x) = |1 - x^2|$, $-2 \leq x \leq 2$.
Since
$f(x) = \begin{cases} 1 - x^2 & \text{for } |x| \leq 1 \\ -(1 - x^2) = x^2 - 1 & \text{for } 1 < |x| \leq 2 \end{cases}$
it follows that $f$ is continuous on $-2 \leq x \leq 2$. Moreover,
$f'(x) = \begin{cases} -2x & \text{for } |x| < 1 \\ 2x & \text{for } 1 < |x| < 2 \end{cases}$
Let the function be $f(x) = (x - 2)^{1/3}$. We first find the derivatives:
$f'(x) = \dfrac{1}{3}(x - 2)^{-2/3}$
and
$f''(x) = -\dfrac{2}{9}(x - 2)^{-5/3}$
In this case, there is no value of $x$ such that $f''(x) = 0$; however, $f''(x)$ fails to exist for $x = 2$, and $f'(2)$ does not exist either. The point $(2, 0)$ is on the curve (because $f(x) = 0$ when $x = 2$), and it is evident from the expression for $f''(x)$ that the second derivative function is continuous except at $x = 2$. Furthermore, $f''(x) > 0$ for $x < 2$ and $f''(x) < 0$ for $x > 2$.
Accordingly, the curve is convex for $x < 2$ and concave for $x > 2$, so that the point $(2, 0)$ is a point of inflection. We also see that $f'(x) \to \infty$ as $x \to 2$, so that the inflectional tangent is vertical (see figure below).
The main point to note here is that $f'(2)$ does not exist, although $(2, 0)$ does not correspond to an extreme point.
Returning to the illustration $f(x) = |1 - x^2|$: $f'(x) = 0$ for $x = 0$, and $f'(x)$ does not exist at $x = -1$ or at $x = 1$. Thus, this is an example in which both types of critical points occur.
Furthermore, since $f(1) = f(-1)$ and $f'(x)$ exists for $-1 < x < 1$, we may regard the function on the interval $-1 \leq x \leq 1$ as an illustration of Rolle's Theorem (see figure below):
3.1 First Derivative Test for Maxima and Minima
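Theorem: Let $f$ be a continuous function over a neighborhood $N(x_c, h)$, where $x_c$ is a critical point of $f$, and let $f'(x)$ exist for $x$ in the deleted neighborhood $N^*(x_c, h)$. Write $N^*(x_c^-, h)$ for the part of the deleted neighborhood to the left of $x_c$ and $N^*(x_c^+, h)$ for the part to the right. Then
(i) $f'(x) > 0$ for $x \in N^*(x_c^-, h)$ and $f'(x) < 0$ for $x \in N^*(x_c^+, h)$ $\Rightarrow$ $f(x_c)$ is a maximum.
(ii) $f'(x) < 0$ for $x \in N^*(x_c^-, h)$ and $f'(x) > 0$ for $x \in N^*(x_c^+, h)$ $\Rightarrow$ $f(x_c)$ is a minimum.
(iii) $f'(x)$ of constant sign (negative or positive) for $x \in N^*(x_c, h)$ $\Rightarrow$ $f(x_c)$ is not an extreme.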
Example: Find the extremes of the function defined by $f(x) = 2x^3 + 3x^2 - 12x$.
Solution: $f'(x) = 6x^2 + 6x - 12 = 6(x^2 + x - 2) = 6(x + 2)(x - 1)$
If $f'(x) = 0$, then $x = -2$ or $x = 1$. Since $f'$ is continuous for all $x$, the only possible extremes are at these critical values of $x$. We also find that:
$f'(x) > 0$ for $x < -2$ and for $x > 1$,
$f'(x) < 0$ for $-2 < x < 1$.
We see that $f'(-2) = 0$, and $f'(x)$ changes sign from plus to minus as $x$ increases through the value $-2$. The point $(-2, 20)$ is a maximum point on the graph of $f$. Also, $f'(1) = 0$ and $f'(x)$ changes sign from minus to plus as $x$ increases through the value 1. The point $(1, -7)$ is a minimum point on the graph.
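The first derivative test lends itself to a small numerical check. The sketch below is an illustration only, assuming a hypothetical helper `classify_critical_point` and a step size `h` chosen small enough that no other critical point lies within `h` of the one being tested. It inspects the sign of the derivative just to the left and right of each critical point of the example above.

```python
def classify_critical_point(df, xc, h=1e-3):
    """Apply the first derivative test at xc using the sign of df on either side."""
    left, right = df(xc - h), df(xc + h)
    if left > 0 and right < 0:
        return "maximum"
    if left < 0 and right > 0:
        return "minimum"
    return "no extreme"  # derivative keeps the same sign on both sides

def f(x):
    return 2 * x ** 3 + 3 * x ** 2 - 12 * x

def df(x):
    return 6 * x ** 2 + 6 * x - 12

# Critical points found in the example: x = -2 and x = 1
results = {xc: (classify_critical_point(df, xc), f(xc)) for xc in (-2, 1)}
for xc, (kind, value) in results.items():
    print(f"x = {xc}: {kind}, f(x) = {value}")

# The cubic x^3 has a zero derivative at 0 but no sign change there
cubic_case = classify_critical_point(lambda x: 3 * x ** 2, 0)
```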
Example: Find the extremes of the function defined by $f(x) = x^{2/3}$.
Solution: $f'(x) = \dfrac{2}{3}x^{-1/3}$, $x \neq 0$.
The domain of $f$ is all $x \in \mathbb{R}$, yet there is no value of $x$ for which $f'(x) = 0$. The only possible critical value of $x$ is $x = 0$, for which $f'(x)$ does not exist. Since $f'(x) < 0$ for $x < 0$ and $f'(x) > 0$ for $x > 0$, $f(0) = 0$ is a relative (local) minimum of the function.
Fig.
The second derivative test for maxima and minima has already been mentioned in the discussion of the concavity of a function.
4. Geometric Interpretation of Rolle's Theorem and the Mean Value Theorem
To explain Rolle's Theorem geometrically, let $y = f(x)$ with $f$ continuous on $a \leq x \leq b$ and differentiable on $a < x < b$. If $f(a) = f(b)$, then there is at least one point $(c, f(c))$ with $a < c < b$ where the tangent to the curve is horizontal (see figure below).
Clearly, if the curve of the function in panel a were turned so that the line AB was no longer parallel to the x-axis, the geometric content of Rolle's Theorem would still be true; the tangent line through point P (in panel b) would still be parallel to the secant line AB. This apparently evident result is analytically formalised in the Mean Value Theorem.

4.1 The Mean Value Theorem for Derivatives
Let $f$ be a continuous function on the closed interval $a \leq x \leq b$, and let $f'$ exist everywhere on the open interval $a < x < b$. Then there exists at least one point $c \in (a, b)$ such that
$f'(c) = \dfrac{f(b) - f(a)}{b - a}$
This equation is often written in the form obtained by solving for $f(b)$: $f(b) = f(a) + f'(c)(b - a)$, $c \in (a, b)$.
Example: Find the value(s) of $c$ which the Mean Value Theorem predicts if $f(x) = x^3 - x$, $a = 0$, $b = 3$.
Solution: $f(a) = 0$, $f(b) = 24$, $f'(x) = 3x^2 - 1$.
Hence, for the required value of $c$, we must have
$f'(c) = \dfrac{f(b) - f(a)}{b - a}$ (Mean Value Theorem, henceforth MVT)
$3c^2 - 1 = \dfrac{24 - 0}{3 - 0} = 8$, or $c^2 = 3$, or $c = \sqrt{3}$ or $-\sqrt{3}$.
Since $c \in (0, 3)$, i.e. $0 < c < 3$, the positive root $\sqrt{3}$ is the desired value of $c$.
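The value of c can also be located numerically. The following sketch is an illustration under stated assumptions: the bisection helper `bisect` is hypothetical (not from the lesson) and requires the function passed to it to change sign on the interval. It solves f'(c) = (f(b) - f(a)) / (b - a) on (0, 3) and compares the result with the square root of 3.

```python
import math

def bisect(g, lo, hi, tol=1e-12):
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid   # root lies in the right half
        else:
            hi = mid                # root lies in the left half
    return (lo + hi) / 2

def f(x):
    return x ** 3 - x

def df(x):
    return 3 * x ** 2 - 1

a, b = 0.0, 3.0
slope = (f(b) - f(a)) / (b - a)            # slope of the secant line
c = bisect(lambda x: df(x) - slope, a, b)  # point where the tangent has that slope

print("secant slope:", slope)
print("c =", c, " sqrt(3) =", math.sqrt(3))
```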
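Example: Estimate the value of $\sqrt{110}$ by using the MVT.
Solution: Let $f(x) = \sqrt{x}$ and use the MVT in the form $f(x + h) = f(x) + h f'(x + \theta h)$, $0 < \theta < 1$. Since $f'(x) = \dfrac{1}{2\sqrt{x}}$ and $f'(x + \theta h) = \dfrac{1}{2\sqrt{x + \theta h}}$, we have
$\sqrt{x + h} = \sqrt{x} + \dfrac{h}{2\sqrt{x + \theta h}}$
With $x = 100$ and $h = 10$, this formula gives
$\sqrt{110} = 10 + \dfrac{5}{\sqrt{100 + 10\theta}}$
Since $f'(x) > 0$, $f$ is increasing and we have
$\sqrt{100} < \sqrt{100 + 10\theta} < \sqrt{110}$
From the preceding two relations we have
$\sqrt{110} < 10 + \dfrac{5}{\sqrt{100}} = 10.5$
Similarly, $\sqrt{110} > 10 + \dfrac{5}{\sqrt{110}} = 10 + \dfrac{\sqrt{110}}{22}$,
or $\dfrac{21}{22}\sqrt{110} > 10$, so that
$\sqrt{110} > \dfrac{220}{21} \approx 10.476$. Finally, we have
$10.476 < \sqrt{110} < 10.5$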
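The two bounds can be reproduced in a few lines. In the sketch below the function name `sqrt_bounds_by_mvt` is an assumption made for illustration. The upper bound is the square root of x plus h divided by twice that root. The lower bound comes from rearranging the inequality sqrt(x + h) > sqrt(x) + h / (2 sqrt(x + h)) in the same way as in the example, which gives sqrt(x + h) > sqrt(x) * 2(x + h) / (2x + h).

```python
import math

def sqrt_bounds_by_mvt(x, h):
    """Bounds on sqrt(x + h) from the Mean Value Theorem, for x > 0 and h > 0."""
    root_x = math.sqrt(x)
    upper = root_x + h / (2 * root_x)             # uses sqrt(x + theta*h) > sqrt(x)
    lower = root_x * 2 * (x + h) / (2 * x + h)    # uses sqrt(x + theta*h) < sqrt(x + h)
    return lower, upper

lower, upper = sqrt_bounds_by_mvt(100, 10)
actual = math.sqrt(110)
print(f"{lower:.3f} < {actual:.3f} < {upper:.3f}")
```

The bracket is narrow because h is small relative to x; for larger h the same bounds hold but are looser.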
Example*: Discuss and sketch the curve:
$x^3 + xy^2 + ay^2 - 3ax^2 = 0$, $a > 0$
Solution:
1. We see that if $x = 0$, then $y = 0$, and if $y = 0$, then $x^3 - 3ax^2 = 0$, so that $x = 0$ or $x = 3a$. Hence the curve crosses the axes at $(0, 0)$ and $(3a, 0)$.
2. Since $y$ appears to even powers only, the curve must be symmetric about the x-axis. This is the only simple symmetry the curve has.
3. We solve for $y$ to obtain $y = \pm x\sqrt{\dfrac{3a - x}{x + a}}$, which shows that there is a vertical asymptote at $x = -a$, provided that $x = -a$ lies within or on the boundary of the extent of the curve. As we shall see in step 4, the extent of the curve is $-a < x \leq 3a$, so the curve lies to the right of, and is asymptotic to, the line $x = -a$.
4. To determine the extent of the curve (the set of values of $x$ for which $y$ is real), we solve the inequality $\dfrac{3a - x}{x + a} \geq 0$. The required solution is $-a < x \leq 3a$.

Note: Solving for $x$ in terms of $y$ involves solving a general cubic equation. Since a cubic equation with real coefficients always has either one or three real roots (counting multiplicity), it is clear that for each real value of $y$ there is always at least one corresponding real value of $x$, and there may be three such values. Consequently, there is no restriction on the extent of the curve in the y-direction.
5. Maximum and minimum points on the curve: Let us consider the two separate equations:
$y_1 = x\sqrt{\dfrac{3a - x}{x + a}}$, $y_2 = -x\sqrt{\dfrac{3a - x}{x + a}}$
These two formulas represent two branches of the curve. The two equations can now be properly associated with functions rather than relations. Because of the symmetry of the curve, one branch is symmetric to the other with respect to the x-axis, so that we may restrict the discussion to $y_1$. Since
$y_1' = \dfrac{3a^2 - x^2}{\sqrt{(x + a)^3 (3a - x)}}$,
we see that $x = \sqrt{3}a$ is a critical value of $x$.
The nature of the point at $x = \sqrt{3}a$ can be investigated easily by determining the intervals over which the curve is rising or falling. It is easy to see that:
$y_1' > 0$ for $-a < x < \sqrt{3}a$, and
$y_1' < 0$ for $\sqrt{3}a < x < 3a$,
so that the curve is rising and falling over the respective intervals. This shows that $(\sqrt{3}a, 1.2a)$, approximately, is a maximum point of $y_1$.
We have already seen that $x = -a$ is a vertical asymptote. The fact that $|y_1'| \to \infty$ as $x \to -a^+$ is consistent with this result. However, as $x \to 3a^-$, $y_1' \to -\infty$. This indicates that there is a vertical tangent at $(3a, 0)$, that is, the curve approaches the x-axis vertically as $x \to 3a$.
We see that $y_1'' = -\dfrac{12a^3}{(3a - x)^{3/2}(x + a)^{5/2}}$. It is clear here that the branch $y_1$ is concave everywhere and hence there are no points of inflection.
Graph of $x^3 + xy^2 + ay^2 - 3ax^2 = 0$, $a > 0$.
In the above figure, the concave portion of the curve corresponds to $y_1$ and the convex portion, which is obtained by symmetry, corresponds to $y_2$.
Note: The curve is known as a tri-sectrix because of the property that the angle  is
one-third of angle  if x > 0, y> 0 for p(x, y).
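Example: Find the extremes of the function
$f(x) = |x| + |x^2 - 1|$ for all $x \in \mathbb{R}$.
Solution:
$f'(x) = -1 + 2x$ for $x \in (-\infty, -1)$
$= -1 - 2x$ for $x \in (-1, 0)$
$= 1 - 2x$ for $x \in (0, 1)$
$= 1 + 2x$ for $x \in (1, \infty)$
Note that, here, $f'$ does not exist at $x = -1, 0, 1$, and $f' = 0$ at $x = -\frac{1}{2}$ and $x = \frac{1}{2}$. Also, $f'' = -2 < 0$ at $x = -\frac{1}{2}, \frac{1}{2}$. Hence the function has the local maximum value $\frac{5}{4}$ at $x = -\frac{1}{2}, \frac{1}{2}$.
To investigate the existence of extreme values of $f$ at $-1, 0, 1$, take $0 < \varepsilon < \frac{1}{4}$ and note that
$f(-1) = 1$, $f(-1 - \varepsilon) = 1 + \varepsilon + (1 + \varepsilon)^2 - 1 = 1 + 3\varepsilon + \varepsilon^2$,
$f(-1 + \varepsilon) = 1 - \varepsilon + 1 - (1 - \varepsilon)^2 = 1 + \varepsilon - \varepsilon^2$,
$f(0) = 1$, $f(0 - \varepsilon) = \varepsilon + 1 - \varepsilon^2$, $f(0 + \varepsilon) = \varepsilon + 1 - \varepsilon^2$,
$f(1) = 1$, $f(1 - \varepsilon) = 1 - \varepsilon + 1 - (1 - \varepsilon)^2 = 1 + \varepsilon - \varepsilon^2$,
$f(1 + \varepsilon) = 1 + \varepsilon + (1 + \varepsilon)^2 - 1 = 1 + 3\varepsilon + \varepsilon^2$.
These show that the function $f$ has the minimum value 1 at $x = -1, 0, 1$.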
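A quick numerical comparison supports this classification. The sketch below is illustrative only: the offset `eps = 0.1` is an arbitrary choice satisfying the restriction on epsilon used above, and the flag names are assumptions. It compares f at each candidate point with its values a small distance to either side.

```python
def f(x):
    return abs(x) + abs(x * x - 1)

eps = 0.1  # small offset, below the bound 1/4 used in the text

# Points where f' does not exist: expected local minima with value 1
local_min_ok = all(
    f(p) < f(p - eps) and f(p) < f(p + eps) for p in (-1, 0, 1)
)

# Points where f' = 0: expected local maxima with value 5/4
local_max_ok = all(
    f(p) > f(p - eps) and f(p) > f(p + eps) for p in (-0.5, 0.5)
)

print("minima at -1, 0, 1:", local_min_ok, [f(p) for p in (-1, 0, 1)])
print("maxima at -1/2, 1/2:", local_max_ok, [f(p) for p in (-0.5, 0.5)])
```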
5. Existence, Global Optimality and Uniqueness of Solutions
5.1 Weierstrass Theorem (Existence)
An optimization problem always has a solution if:
(a) the objective function is continuous, and
(b) the feasible set is non-empty,
(c) closed, and
(d) bounded.
It should be noted that continuity of the objective function and closedness and boundedness of the feasible set are sufficient but not necessary conditions for the existence of a solution. In other words, a solution may exist even if they are not satisfied, but then it is also possible that no solution exists. Satisfaction of the conditions, however, rules out all possible cases of non-existence. Non-emptiness of the feasible set is a necessary condition for the existence of a solution: any problem in which no point is feasible cannot have a solution.

5.2 Global Maximum

A local maximum is always a global maximum if
(a) the objective function is concave, and
(b) the feasible set is convex.
Example: Show that the curve $y = ax^3 + bx^2 + cx$ can have only one point of inflection. If $a$ is positive, show that the curvature changes from concave to convex from below as we pass through the point of inflection from left to right. Deduce that the point of inflection is also a stationary point if $b^2 = 3ac$.
Solution: $f(x) = y = ax^3 + bx^2 + cx$
$f' = 3ax^2 + 2bx + c$
$f'' = 6ax + 2b$
Necessary condition for a point of inflection: $f'' = 0$
$6ax + 2b = 0 \Rightarrow x = -\dfrac{b}{3a}$
Sufficient condition for a point of inflection: $f''' \neq 0$.
$f''' = 6a \neq 0$ (if $a \neq 0$).
Therefore, we have a single point of inflection, at $x = -\dfrac{b}{3a}$.
$a > 0 \Rightarrow f''' > 0$. Hence, the curvature changes from concave to convex.
[Note: if $f''' < 0$, the curvature changes from convex to concave.]
The point of inflection is also a stationary point if $f'$ is also zero at that point:
$f'\left(-\dfrac{b}{3a}\right) = 3a\left(-\dfrac{b}{3a}\right)^2 + 2b\left(-\dfrac{b}{3a}\right) + c = 0$
Or, $\dfrac{b^2}{3a} - \dfrac{2b^2}{3a} + c = 0$
Or, $-\dfrac{b^2}{3a} + c = 0$
Or, $b^2 = 3ac$.
Hence, the point of inflection at $x = -\dfrac{b}{3a}$ is also a stationary point if $b^2 = 3ac$.
5.3 Uniqueness of Solution
Given an optimization problem in which the feasible set is convex and the objective function is non-constant and quasi-concave, a solution is unique if:
(a) the feasible set is strictly convex, or
(b) the objective function is strictly quasi-concave, or
(c) both.
5.4 Related examples and their solutions
Example: Show that the maximum value of the average product of $a$ is the constant $\sqrt{\dfrac{H^2 - AB}{B}}$, which is independent of the fixed amount of $b$ used, for the production function $x = \sqrt{2Hab - Aa^2 - Bb^2}$.
Solution: $AP_a = \dfrac{x}{a} = \dfrac{1}{a}\sqrt{2Hab - Aa^2 - Bb^2}$
Writing $f(a) = AP_a$, $AP_a$ will be a maximum if $f' = 0$ and $f'' < 0$.
$f' = \dfrac{1}{a^2}\left[\dfrac{a(2Hb - 2Aa)}{2\sqrt{2Hab - Aa^2 - Bb^2}} - \sqrt{2Hab - Aa^2 - Bb^2}\right] = 0$
Or, $\dfrac{1}{a^2}\left[\dfrac{-2Hab + 2Bb^2}{2\sqrt{2Hab - Aa^2 - Bb^2}}\right] = 0$
Or, $-2Hab + 2Bb^2 = 0$, or $a = \dfrac{B}{H}b$, or $b = \dfrac{aH}{B}$.
It is easy to check that at $a = \dfrac{B}{H}b$, $f'' < 0$.
Thus $AP_a$ will be a maximum at $a = \dfrac{B}{H}b$.
To get $AP_{max}$, we substitute $b = \dfrac{aH}{B}$ and simplify:
$AP_{max} = \sqrt{\dfrac{2Hab}{a^2} - A - \dfrac{Bb^2}{a^2}}$
$= \sqrt{\dfrac{2Ha}{a^2} \cdot \dfrac{aH}{B} - A - \dfrac{B}{a^2} \cdot \dfrac{a^2H^2}{B^2}}$
$= \sqrt{\dfrac{2H^2 - AB - H^2}{B}}$
$AP_{max} = \sqrt{\dfrac{H^2 - AB}{B}}$. Hence proved.
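The claim that the maximum average product does not depend on b can be spot-checked numerically. The sketch below is an illustration, not part of the lesson: it assumes the production function with the square root, x = sqrt(2Hab - Aa^2 - Bb^2), and the parameter values H = 3, A = 1, B = 2 are hypothetical, chosen only so that H^2 > AB.

```python
import math

def average_product(a, b, H, A, B):
    """Average product of factor a for x = sqrt(2Hab - A a^2 - B b^2)."""
    return math.sqrt(2 * H * a * b - A * a * a - B * b * b) / a

H, A, B = 3.0, 1.0, 2.0                          # hypothetical parameter values
claimed_max = math.sqrt((H * H - A * B) / B)     # the constant derived in the example

values_at_optimum = []
is_local_max = []
for b in (1.0, 3.0, 10.0):                       # several fixed amounts of factor b
    a_star = B * b / H                           # stationary point a = (B/H) b
    ap = average_product(a_star, b, H, A, B)
    values_at_optimum.append(ap)
    is_local_max.append(
        ap > average_product(0.95 * a_star, b, H, A, B)
        and ap > average_product(1.05 * a_star, b, H, A, B)
    )

print("claimed maximum:", claimed_max)
print("AP at a = (B/H) b:", values_at_optimum)
print("larger than nearby values:", is_local_max)
```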
Example: Determine the constant $k$ so that the function $f(x) = x^2 + \dfrac{k}{x}$ may have (i) a minimum at $x = 2$, (ii) a minimum at $x = -3$. Show that the function cannot have a local maximum for any value of $k$.
Solution: $f'(x) = 2x - \dfrac{k}{x^2} = 0 \Rightarrow x^3 = \dfrac{k}{2}$
$f''(x) = 2 + \dfrac{2k}{x^3} > 0$ (for a minimum)
At the stationary point, substituting $x^3 = \dfrac{k}{2}$ gives $f''(x) = 2 + 4 = 6 > 0$.
Therefore, $x = \left(\dfrac{k}{2}\right)^{1/3}$ is a point of minimum.
(i) $x = 2 \Rightarrow \dfrac{k}{2} = 8 \Rightarrow k = 16$
(ii) $x = -3 \Rightarrow \dfrac{k}{2} = -27 \Rightarrow k = -54$
In general, $f''(x) = 2\left(1 + \dfrac{k}{x^3}\right)$, so
$f''\left(\left(\dfrac{k}{2}\right)^{1/3}\right) = 2(1 + 2) = 6 > 0$, which is always positive.
Therefore, the function cannot have a local maximum for any value of $k$.
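Example: A wire of length $L$ is cut into two pieces, one being bent to form a square and the other to form an equilateral triangle. How should the wire be cut if the sum of the two areas is to be a minimum?
Solution: Let the wire be cut at length $\ell$, with the piece of length $\ell$ used to form the triangle. Side $= \ell/3$.
Altitude $= \dfrac{\sqrt{3}}{2} \cdot \text{Side} = \dfrac{\sqrt{3}}{2} \cdot \dfrac{\ell}{3} = \dfrac{\ell}{2\sqrt{3}}$
$A_1$ = Area of the (equilateral) triangle $= \dfrac{\sqrt{3}}{4}(\text{Side})^2 = \dfrac{\sqrt{3}}{4}\left(\dfrac{\ell}{3}\right)^2 = \dfrac{\ell^2}{12\sqrt{3}}$
$A_2$ = Area of the square $= \left(\dfrac{L - \ell}{4}\right)^2$
$A$ = Total area $= A_1 + A_2 = \dfrac{\ell^2}{12\sqrt{3}} + \dfrac{(L - \ell)^2}{16}$
$\dfrac{dA}{d\ell} = \dfrac{2\ell}{12\sqrt{3}} - \dfrac{2(L - \ell)}{16} = 0$
Or, $\dfrac{\ell}{6\sqrt{3}} - \dfrac{L - \ell}{8} = 0$
Or, $\dfrac{4\ell - 3\sqrt{3}L + 3\sqrt{3}\ell}{24\sqrt{3}} = 0$
Or, $\dfrac{(4 + 3\sqrt{3})\ell - 3\sqrt{3}L}{24\sqrt{3}} = 0$
Or, $(4 + 3\sqrt{3})\ell = 3\sqrt{3}L$
Or, $\ell = \left(\dfrac{3\sqrt{3}}{4 + 3\sqrt{3}}\right)L$
$\dfrac{d^2A}{d\ell^2} = \dfrac{4 + 3\sqrt{3}}{24\sqrt{3}} > 0$
Hence, using the fraction $\dfrac{3\sqrt{3}}{4 + 3\sqrt{3}}$ of the total length $L$ for the triangle and the rest for the square gives the minimum total area.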
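The closed-form cut point can be compared with a brute-force search. The sketch below is illustrative: the wire length L = 1 and the grid of 1001 candidate cut points are assumptions made for the check, and `total_area` is a hypothetical helper name.

```python
import math

def total_area(l, L):
    """Area of the triangle made from length l plus the square made from L - l."""
    triangle = l ** 2 / (12 * math.sqrt(3))
    square = ((L - l) / 4) ** 2
    return triangle + square

L = 1.0                                                      # hypothetical wire length
l_star = 3 * math.sqrt(3) * L / (4 + 3 * math.sqrt(3))       # formula from the example

grid = [i * L / 1000 for i in range(1001)]                   # candidate cut points
l_grid = min(grid, key=lambda l: total_area(l, L))           # best point on the grid

print("formula:", l_star, " grid search:", l_grid)
print("area at optimum:", total_area(l_star, L))
print("all triangle:", total_area(L, L), " all square:", total_area(0.0, L))
```

Roughly 56.5 percent of the wire goes to the triangle, and the combined area at that cut is smaller than using the whole wire for either shape alone.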
6. Complete Criteria for Maximum, Minimum and Inflectional Values
(1) If $f(x)$ has an extreme value at $x = a$, then $f'(a) = 0$.
(2) If $f'(a) = f''(a) = \dots = f^{(n-1)}(a) = 0$ and $f^{(n)}(a) \neq 0$ (with $n \geq 2$), then $f(x)$ has a stationary value at $x = a$ which is an inflectional value if $n$ is odd, a maximum value if $n$ is even and $f^{(n)}(a) < 0$, and a minimum value if $n$ is even and $f^{(n)}(a) > 0$.
This criterion is complete and so both necessary and sufficient, subject to the
condition that the derivatives involved are finite and continuous.
There is no case of failure; unless the function is a constant (and hence without
maxima and minima) there must always be some derivative which is not zero.
Illustration: Let $y = (x - 1)^5$. We investigate this function with respect to maxima, minima or points of inflection.
$f(x) = y = (x - 1)^5$
$f' = 5(x - 1)^4 = 0 \Rightarrow x = 1$
$f'' = 20(x - 1)^3 = 0$ at $x = 1$
$f''' = 60(x - 1)^2 = 0$ at $x = 1$
$f^{(4)} = 120(x - 1) = 0$ at $x = 1$
$f^{(5)} = 120 \neq 0$ at $x = 1$
Now, we apply the nth derivative test.
Here, the first non-zero derivative at $x = 1$ is the 5th-order derivative (an odd-ordered derivative). Hence at $x = 1$, the function has a point of inflection.
At $x = 1$, $y = (1 - 1)^5 = 0$.
Thus, $(1, 0)$ is an inflection point.
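For polynomials the nth derivative test can be carried out mechanically. The following is a minimal sketch under stated assumptions: a polynomial is represented by a list of integer coefficients (index i holds the coefficient of x to the power i), and the function names are hypothetical. It differentiates repeatedly until the first non-zero derivative at the point is found.

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] (c_i multiplies x**i)."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def nth_derivative_test(coeffs, a):
    """Classify x = a using the first non-vanishing derivative at a."""
    d = poly_derivative(coeffs)
    n = 1
    while d:
        value = poly_eval(d, a)
        if value != 0:
            if n == 1:
                return "not stationary"
            if n % 2 == 1:
                return "inflection"
            return "maximum" if value < 0 else "minimum"
        d = poly_derivative(d)
        n += 1
    return "constant"  # every derivative vanished identically

# (x - 1)^5 = x^5 - 5x^4 + 10x^3 - 10x^2 + 5x - 1
quintic = [-1, 5, -10, 10, -5, 1]
print(nth_derivative_test(quintic, 1))          # the illustration above
print(nth_derivative_test([0, 0, 0, 0, 1], 0))  # x^4 at 0
print(nth_derivative_test([0, 0, -1], 0))       # -x^2 at 0
```

Exact integer arithmetic is used so that the comparison with zero is reliable; with floating-point coefficients a tolerance would be needed instead.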
Example (Curious Case): Find the maxima and minima of the function
$f(x) = y = \dfrac{2 - x}{x^2 + x - 2}$
$f' = \dfrac{x(x - 4)}{(x^2 + x - 2)^2}$
There are stationary values $y = -1$ at $x = 0$ and $y = -\dfrac{1}{9}$ at $x = 4$.
To check the sign of the derivative near the stationary points $x = 0, 4$, we get the following:
$f' = \dfrac{h(h - 4)}{(h^2 + h - 2)^2}$ at $x = 0 + h$
$f' = \dfrac{h(h + 4)}{(h^2 + 9h + 18)^2}$ at $x = 4 + h$
As is easy to see, the first expression changes from positive to negative as $h$ is given small values changing from negative to positive. The second expression changes in the opposite sense as $h$ is varied similarly. The function thus has a maximum value $-1$ at $x = 0$ and a minimum value $-\dfrac{1}{9}$ at $x = 4$.
The curious feature of this case is that the maximum value of the function is smaller than the minimum value. This apparently paradoxical result is due to the fact that the function has infinite values at $x = 1$ and at $x = -2$ (at each of these values the denominator of $y$ is zero). The graph of the function illustrates how the presence of infinities influences the maximum and minimum values.
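Example: If a monopolist has a total cost function $C = ax^2 + bx + c$ and if the demand law is $p = \beta - \alpha x^2$, then show that the output for maximum net revenue is
$x = \dfrac{\sqrt{a^2 + 3\alpha(\beta - b)} - a}{3\alpha}$
Solution:
Net revenue $= \pi = p \cdot x - C(x)$
$= (\beta - \alpha x^2)x - (ax^2 + bx + c)$
$= \beta x - \alpha x^3 - ax^2 - bx - c$
Necessary condition for a maximum: $\dfrac{d\pi}{dx} = 0$
Or, $\beta - 3\alpha x^2 - 2ax - b = 0$
Or, $3\alpha x^2 + 2ax + (b - \beta) = 0$
Or, $x = \dfrac{-2a \pm \sqrt{4a^2 - 12\alpha(b - \beta)}}{6\alpha}$
Or, $x = \dfrac{-2a \pm \sqrt{4[a^2 + 3\alpha(\beta - b)]}}{6\alpha}$
A negative $x$ is ruled out.
Hence: $x = \dfrac{-a + \sqrt{a^2 + 3\alpha(\beta - b)}}{3\alpha} = \dfrac{\sqrt{a^2 + 3\alpha(\beta - b)} - a}{3\alpha}$
Sufficient condition for a maximum: $\dfrac{d^2\pi}{dx^2} < 0$
$\dfrac{d^2\pi}{dx^2} = -6\alpha x - 2a < 0$ (for positive $\alpha$, $a$ and $x$).
Hence the output for maximum net revenue is
$x = \dfrac{\sqrt{a^2 + 3\alpha(\beta - b)} - a}{3\alpha}$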
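The output formula can be sanity-checked with numbers. In the sketch below the demand law is taken as p = beta - alpha x^2 and all parameter values (a = 1, b = 2, c = 5, alpha = 1, beta = 10) are hypothetical, chosen only for illustration; the function names are likewise assumptions.

```python
import math

def optimal_output(a, b, alpha, beta):
    """Output that maximizes net revenue, from the formula derived in the example."""
    return (math.sqrt(a * a + 3 * alpha * (beta - b)) - a) / (3 * alpha)

def net_revenue(x, a, b, c, alpha, beta):
    """Revenue p*x with p = beta - alpha*x^2, minus cost a*x^2 + b*x + c."""
    return (beta - alpha * x * x) * x - (a * x * x + b * x + c)

a, b, c, alpha, beta = 1.0, 2.0, 5.0, 1.0, 10.0   # hypothetical parameter values
x_star = optimal_output(a, b, alpha, beta)

# First-order condition: beta - 3*alpha*x^2 - 2*a*x - b should vanish at x_star
foc = beta - 3 * alpha * x_star ** 2 - 2 * a * x_star - b

best = net_revenue(x_star, a, b, c, alpha, beta)
nearby = [net_revenue(x_star + d, a, b, c, alpha, beta) for d in (-0.01, 0.01)]

print("x* =", x_star, " first-order condition:", foc)
print("net revenue at x*:", best, " nearby:", nearby)
```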
7. End Chapter Exercise:
1. Without appealing to graphical ideas, find the location and nature of the
extreme of the following two functions and determine if they are differentiable
at these points :
1 3
(a) f ( x)  x  2 x 2  3x  1
2
(b) f ( x)  (2 x – 5) x 2 / 3

2. Show that the curve $y = 2x - 3 + \dfrac{1}{x}$ is convex from below for all positive values of $x$. Is the same true for $y = ax + b + \dfrac{c}{x}$?
3. Show that the demand curve $p = \dfrac{a}{x + b} - c$ is downward sloping and convex from below. Do the same properties hold for the marginal revenue curve?
4. If the supply function is $x = a\sqrt{p - b} + c$, where $a$, $b$, $c$ are positive constants, show that the supply curve is upward sloping and concave to the axis OP (price axis) at all points.
5. Find, on the part of the rectangular hyperbola $xy = 4$ in the positive quadrant, the point which is nearest to the origin, and show that the shortest distance is perpendicular to the tangent at this point.
6. Show that $y = x + \dfrac{1}{x}$ has one maximum and one minimum value and that the latter is larger than the former.
7. A firm has the following total cost and demand functions:
$C = \dfrac{1}{3}Q^3 - 7Q^2 + 111Q + 50$
$Q = 100 - p$
(a) Find the profit maximizing level of output.
(b) Find the equilibrium (profit maximizing) level of output if the firm is assumed to fix the price.
8. A radio manufacturer produces $x$ sets per week at a total cost of Rs. $\left(\dfrac{x^2}{25} + 3x + 100\right)$. He is a monopolist and the demand for his product is $x = 75 - 3p$, where $p$ is the price per set. Show that the maximum net revenue is obtained when about 30 sets are produced per week. What is the monopoly price and net revenue at this level of output?
9. Show that $f(x) = \dfrac{2x^2}{x^4 + 1} \Rightarrow f'(x) = \dfrac{4x(1 + x^2)(1 + x)(1 - x)}{(x^4 + 1)^2}$
Also find the maximum value of $f$ on $(0, \infty)$. Show that $f(-x) = f(x)$ for all $x$. What are the maximum points for $f$ on $(-\infty, \infty)$?
10. Find two positive numbers whose sum is 16 and whose product is as large as possible.
11. Let $C(Q) = aQ^b + c$, for $a > 0$, $b > 1$, and $c > 0$, be a cost function. Prove that the average cost function has a minimum on $(0, \infty)$, and find it.
12. Classify the stationary points of $f(x) = \dfrac{6x^3}{x^4 + x^2 + 2}$ with respect to maxima, minima and point(s) of inflection.
13. Let f be defined for all x by f(x) = (x2–1)2/3.
(a) Compute f1(x) and f11(x).
(b) Find local extreme points of f, and draw the graph of f.
14. Find possible inflection points for f(x) = x2ex. Draw its graph.
15. Find the intervals where the following Cubic cost function is convex and
where it is concave, and find the unique inflection point:
(Q) = aQ3 + bQ2 + cQ + d, a>0, b< 0, c>0, d>0.
(16) Are the following functions concave or convex (assuming x > 0 in parts (b)
and (c) ?
1 x 1 –x
(a) e  e (b) 2x – 3 + 4 lnx
2 2
1
(c) 5 x 2 – 10 x 3 / 2 (d) 3x2 – 2x + 1 + e–x – 3

8. References
K. Sydsaeter and P. Hammond, Mathematics for Economic Analysis,
Pearson Educational Asia, Delhi, 2002.

Mathematical Methods for Economics: Integration of Functions
Semester-I
Unit-IV
Lesson: Integration of Functions

CONTENTS:
 1.0 Learning outcomes of the chapter
 1.1 Areas under curves

 Introduction
 Examples of area under curves
 1.2 Indefinite integrals

 Introduction
 Basic rule of integration
 Some standard results of integration
 Some other results
 Examples of indefinite integral
 1.3 Definite integral

 Introduction
 Steps of evaluating definite integral
 Some basic properties of definite integrals
 Examples of indefinite integral
 1.4 Economic application of integration

 Introduction
 Economic application of integration with example
 References
1.0 LEARNING OUTCOMES OF THE CHAPTER
After completion of the present chapter, you should be able to;

 Describe integration by using area under curve

 Evaluate an indefinite integral using an anti-derivative
 Describe an indefinite integral and its application
 Evaluate definite integrals and relationship between differentiation and integration
 Find the area between two curves by using definite integration.
 Understanding economic application of integration
1.1 AREA UNDER CURVES
 Introduction
There are two limiting processes in calculus. The first is differentiation, in which we study the tangent to a curve, or the rate of change in one variable due to a change in another variable. The second is integration, in which we study the area under a curve. Integration can be defined as:
“Integration is the process of finding a function from its derivative, and this function is called the integral of the given function”.
Basically, we use integration to find the area under a curve. We can also find the area under a curve geometrically. However, the concepts of integration and differentiation are defined analytically and do not depend on geometry. A geometrical interpretation is used only to aid intuition.
Let $y = f(x)$ be a continuous and positive function on the closed interval $[a, b]$, as in figure (1). We have to find the area under the graph of this function over the closed interval $[a, b]$. The question is how we compute this area ($A$).
Further, suppose $A(x)$ is the function that measures the area under the curve $y = f(x)$ over the closed interval $[a, x]$. It is clear from figure (1) that
$A(a) = 0$
because there is no area from $a$ to $a$, and the total area can be defined as
$A = A(b)$
Now, suppose that $x$ increases by an amount $\Delta x$. Then $A(x + \Delta x)$ is the area under the curve $y = f(x)$ over the closed interval $[a, x + \Delta x]$. Hence
$\Delta A = A(x + \Delta x) - A(x)$
is the area under the curve $y = f(x)$ over the closed interval $[x, x + \Delta x]$. Let $\Delta A$ be very small (it is magnified in the figure). When $f$ is increasing on this interval, the area $\Delta A$ cannot exceed the area of the rectangle with edges $\Delta x$ and $f(x + \Delta x)$, and cannot be less than the area of the rectangle with edges $\Delta x$ and $f(x)$. Hence, for $\Delta x > 0$,
$f(x)\Delta x \leq A(x + \Delta x) - A(x) \leq f(x + \Delta x)\Delta x$
or, $f(x) \leq \dfrac{A(x + \Delta x) - A(x)}{\Delta x} \leq f(x + \Delta x)$
If we let $\Delta x \to 0$ in the above inequality, the interval $[x, x + \Delta x]$ shrinks to the single point $x$ and, by continuity, the value $f(x + \Delta x)$ approaches $f(x)$. So the function $A(x)$, which measures the area under the curve $y = f(x)$ over the closed interval $[a, x]$, is differentiable, and its derivative is given by
$A'(x) = f(x)$ for all $x \in (a, b)$
This shows that the derivative of the area function $A(x)$ is the curve's height function, i.e. $y = f(x)$.
Now, suppose $F(x)$ is another continuous function with $f(x)$ as its derivative. Then
$F'(x) = A'(x) = f(x)$ for all $x \in (a, b)$
Because $\dfrac{d}{dx}[A(x) - F(x)] = A'(x) - F'(x) = 0$, it must also be true that
$A(x) = F(x) + C$ ($C$ is some constant)
Since $A(a) = 0$,
$A(a) = F(a) + C = 0$
or $C = -F(a)$. Putting this value in the above equation,
$A(x) = F(x) + C = F(x) - F(a)$ (when $F'(x) = f(x)$)
At $x = b$, $A = A(b) = F(b) - F(a)$
In short, the method for finding the area under the curve $y = f(x)$ and above the x-axis from $x = a$ to $x = b$ has the following steps:
(1) Find an arbitrary function $F(x)$ that is continuous over the interval $[a, b]$ such that
$F'(x) = f(x)$ for all $x \in (a, b)$ ------------------- (i)
(2) Then the required area is given by
$A = F(b) - F(a)$ -------------------------------(ii)
What happens if the function $y = f(x)$ takes negative values on $[a, b]$? If $f(x) \leq 0$ throughout $[a, b]$, the required area is $A = -[F(b) - F(a)]$. The area of a region is always positive, so $A$ is again positive.
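The two-step method can be compared with a direct approximation of the area by thin rectangles. The sketch below is illustrative only: the integrand f(x) = x^2 + 1 on [0, 2] is a hypothetical example not taken from the lesson, and the helper names are assumptions. It computes F(b) - F(a) and a midpoint Riemann sum side by side.

```python
def riemann_area(f, a, b, n=100_000):
    """Approximate the area under f on [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def area_from_antiderivative(F, a, b):
    """Steps (i) and (ii): the area is F(b) - F(a) when F' = f and f >= 0."""
    return F(b) - F(a)

def f(x):
    return x * x + 1          # hypothetical positive integrand

def F(x):
    return x ** 3 / 3 + x     # an antiderivative of f

exact = area_from_antiderivative(F, 0.0, 2.0)
approx = riemann_area(f, 0.0, 2.0)
print("F(b) - F(a):", exact)
print("Riemann sum:", approx)
```

The two numbers agree to many decimal places, which is the content of the relation A'(x) = f(x) derived above.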
Example 1:
Find the area under the straight line $y = f(x) = x$ over the interval $[0, 1]$.
Solution:
We have to find the shaded area ($A$) in the given figure. According to equations (i) and (ii) given above, we must find a function that has $x$ as its derivative. Take
$F(x) = \dfrac{x^2}{2}$
Since $\dfrac{d}{dx}(ax^n) = anx^{n-1}$, here with $n = 2$ and $a = \dfrac{1}{2}$,
$F'(x) = \dfrac{2x}{2} = x$
Thus, the required area is given by
$A = F(1) - F(0) = \dfrac{1}{2} - 0 = \dfrac{1}{2}$
This answer is reasonable: the region is a right triangle with base 1 and height 1.
Example 2:
Compute the area under the parabola $y = f(x) = x^2$ over the interval $[a, b]$.
Solution:
We have to find the shaded area $A$ in the given figure (3). According to equations (i) and (ii) given above, we have to find a function that has $x^2$ as its derivative. Let
$F(x) = \dfrac{1}{3}x^3$
Then $F'(x) = f(x) = \dfrac{1}{3} \cdot 3x^2 = x^2$
Thus, the required area is given by
$A = F(b) - F(a) = \dfrac{1}{3}b^3 - \dfrac{1}{3}a^3$
$A = \dfrac{1}{3}(b^3 - a^3)$
Example 3:
Compute the area $A$ under the straight line $y = f(x) = ax + b$ over the interval $[\alpha, \beta]$.
Solution:
Let the shaded area under the straight line be given by $A$. Then from equations (i) and (ii) given above, using $\dfrac{d}{dx}(ax^n) = anx^{n-1}$, we take
$F(x) = \dfrac{1}{2}ax^2 + bx$
Then $F'(x) = \dfrac{1}{2} \cdot 2ax + b \cdot 1 = ax + b$
So, the required area $A$ is given by
$A = F(\beta) - F(\alpha)$
$= \dfrac{1}{2}a\beta^2 + b\beta - \dfrac{1}{2}a\alpha^2 - b\alpha$
$= \dfrac{1}{2}a(\beta^2 - \alpha^2) + b(\beta - \alpha)$
$= (\beta - \alpha)\left[\dfrac{a(\alpha + \beta) + 2b}{2}\right]$
Example 4: Find the shaded area $A$ for the function $y = f(x) = e^{x/3} - 3$ over the closed interval $[0, 3\ln 3]$.
Solution: First we have to find a function $F(x)$ whose derivative is $e^{x/3} - 3$.
Using equations (i) and (ii) given above and $\dfrac{d}{dx}(e^x) = e^x$, we take the function
$F(x) = 3e^{x/3} - 3x$
Then $F'(x) = f(x) = e^{x/3} - 3$
On $[0, 3\ln 3]$ we have $e^{x/3} \leq 3$, so $f(x) \leq 0$ and the required area $A$ is given by
$A = -[F(b) - F(a)]$
$= -[(3e^{\ln 3} - 3 \cdot 3\ln 3) - 3e^0]$
$= -(9 - 9\ln 3 - 3) = 9\ln 3 - 6$
$\therefore A \approx 3.89$ units
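The result of Example 4 can be verified numerically, including the sign convention for a function that lies below the x-axis. The sketch below is an illustration: the midpoint-sum helper `midpoint_sum` and the number of rectangles are assumptions made for the check.

```python
import math

def f(x):
    return math.exp(x / 3) - 3

def F(x):
    return 3 * math.exp(x / 3) - 3 * x   # antiderivative used in Example 4

def midpoint_sum(g, a, b, n=10_000):
    """Midpoint approximation of the signed area under g on [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

a, b = 0.0, 3 * math.log(3)
signed_area = F(b) - F(a)        # negative, because f <= 0 on [a, b]
area = -signed_area              # area of the region between curve and x-axis

print("F(b) - F(a):", signed_area)
print("area:", area, " closed form 9 ln 3 - 6:", 9 * math.log(3) - 6)
print("numerical check:", -midpoint_sum(f, a, b))
```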
Problem Set
1. Find the area under the graph of the polynomial $y = f(x) = x^3$ over the interval $[0, 1]$.
2. Find the bounded area under the graph of the function $y = f(x) = \dfrac{1}{2}(e^x + e^{-x})$ over the interval $[-1, 1]$.
3. Find the area under the straight line $y = f(x) = cx + d$ over the interval $[0, 1]$.
4. Compute the area under the parabola $y = 4x^2$ over the interval $[0, 1]$.
Answers to the Problem Set
1. Area ($A$) $= \dfrac{1}{4}$
2. Area ($A$) $= e - \dfrac{1}{e}$
3. Area ($A$) $= \dfrac{1}{2}(c + 2d)$
4. Area ($A$) $= \dfrac{4}{3}$
1.2 INDEFINITE INTEGRALS
 Introduction
The previous section of the present chapter discussed the problem of finding an anti-derivative of the function $f(x)$, i.e. a function $F(x)$ whose derivative is $f(x)$:
$F'(x) = f(x)$
Anti-derivative is an appropriate name, but in practice we usually call $F(x)$ an indefinite integral of $f(x)$. It is denoted by the symbol $\int$.
"If $f(x)$ is the differential coefficient of the function $F(x)$, then $F(x)$ is the integral of $f(x)$."
Symbolically, if
$\frac{d}{dx}F(x) = f(x)$
then $\int f(x)\,dx = F(x) + C$
Here C is a constant. Since the derivative of a constant is zero, C can take any value, and for this reason the integral is called an indefinite integral.
 Basic Rule of Integration
Power Rule: It is defined as
$\int x^n\,dx = \frac{1}{n+1}x^{n+1} + C \quad \{n \neq -1\}$
Example: $\int x\,dx = \frac{x^2}{2} + C$
Exponential Rule: It is defined as
$\int e^x\,dx = e^x + C$
and $\int a^x\,dx = \frac{a^x}{\log_e a} + C \quad \{a > 0,\ a \neq 1\}$
Examples:
$\int e^{ax}\,dx = \frac{1}{a}e^{ax} + C \quad \{a \neq 0\}$
$\int 2^x\,dx = \frac{2^x}{\log_e 2} + C$
Logarithmic Rule: It is defined as
$\int \frac{1}{x}\,dx = \ln|x| + C$
Example:
$\int \frac{1}{t}\,dt = \ln|t| + C$
Some Standard Results of Integration
• Constant multiple property
$\int a f(x)\,dx = a\int f(x)\,dx$ {a is a real constant}
• Integral of sum
$\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$
In general,
$\int [a_1 f_1(x) + a_2 f_2(x) + \dots + a_n f_n(x)]\,dx = a_1\int f_1(x)\,dx + a_2\int f_2(x)\,dx + \dots + a_n\int f_n(x)\,dx$
• Integral of difference
$\int [f(x) - g(x)]\,dx = \int f(x)\,dx - \int g(x)\,dx$
In general,
$\int [a_1 f_1(x) - a_2 f_2(x) - \dots - a_n f_n(x)]\,dx = a_1\int f_1(x)\,dx - a_2\int f_2(x)\,dx - \dots - a_n\int f_n(x)\,dx$
• Integral of a product
$\int f(x)g(x)\,dx = f(x)\int g(x)\,dx - \int\left[\frac{d}{dx}f(x)\int g(x)\,dx\right]dx$
This property is also known as integration by parts.
• Some other results
$\int \frac{1}{\sqrt{x^2 + a^2}}\,dx = \log\left|x + \sqrt{x^2 + a^2}\right| + C$
$\int \frac{1}{\sqrt{x^2 - a^2}}\,dx = \log\left|x + \sqrt{x^2 - a^2}\right| + C$
$\int \frac{1}{a^2 - x^2}\,dx = \frac{1}{2a}\log\left|\frac{a + x}{a - x}\right| + C$
Example 1: Find the integral $\int (5x^4 + 3x^2 + 2x + 1)\,dx$
Solution:
$\int (5x^4 + 3x^2 + 2x + 1)\,dx$
$= \int 5x^4\,dx + \int 3x^2\,dx + \int 2x\,dx + \int dx$
$= 5\int x^4\,dx + 3\int x^2\,dx + 2\int x\,dx + \int dx$
$= 5 \cdot \frac{x^5}{5} + C_1 + 3 \cdot \frac{x^3}{3} + C_2 + 2 \cdot \frac{x^2}{2} + C_3 + x + C_4$
$= x^5 + x^3 + x^2 + x + C_1 + C_2 + C_3 + C_4$
$= x^5 + x^3 + x^2 + x + C$, where $C = C_1 + C_2 + C_3 + C_4$

Example 2: Evaluate $\int \left(e^x + \frac{1}{x^3} + 1\right)dx$
Solution:
$\int \left(e^x + \frac{1}{x^3} + 1\right)dx$
$= \int e^x\,dx + \int x^{-3}\,dx + \int 1\,dx$
$= e^x - \frac{1}{2}x^{-2} + x + c$
$= e^x - \frac{1}{2x^2} + x + c$
Example 3: Find the integral $\int \frac{(x + 1)^2 + 2x^{-1/2}}{\sqrt{x}}\,dx$
Solution:
$\int \frac{(x + 1)^2 + 2x^{-1/2}}{\sqrt{x}}\,dx$
$= \int \frac{x^2 + 2x + 1 + 2x^{-1/2}}{x^{1/2}}\,dx$
$= \int \left(x^{3/2} + 2x^{1/2} + x^{-1/2} + 2x^{-1}\right)dx$
$= \frac{2}{5}x^{5/2} + \frac{4}{3}x^{3/2} + 2x^{1/2} + 2\ln x + c$
Example 4: Compute $\int \frac{x^2}{x + 1}\,dx$
Solution:
$\int \frac{x^2}{x + 1}\,dx = \int \frac{(x^2 - 1) + 1}{x + 1}\,dx$
$= \int \frac{(x - 1)(x + 1) + 1}{x + 1}\,dx$
$= \int (x - 1)\,dx + \int \frac{1}{x + 1}\,dx$
$= \frac{x^2}{2} - x + \log|x + 1| + c$
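Example 5: Evaluate $\int \frac{dx}{\sqrt{x + c} + \sqrt{x + d}}$, where $c \neq d$
Solution:
$\int \frac{dx}{\sqrt{x + c} + \sqrt{x + d}}$
$= \int \frac{\sqrt{x + c} - \sqrt{x + d}}{(\sqrt{x + c} + \sqrt{x + d})(\sqrt{x + c} - \sqrt{x + d})}\,dx$
$= \int \frac{\sqrt{x + c} - \sqrt{x + d}}{(x + c) - (x + d)}\,dx$
$= \frac{1}{c - d}\int (x + c)^{1/2}\,dx - \frac{1}{c - d}\int (x + d)^{1/2}\,dx$
$= \frac{1}{c - d} \cdot \frac{2}{3}(x + c)^{3/2} - \frac{1}{c - d} \cdot \frac{2}{3}(x + d)^{3/2} + C$
$= \frac{2}{3} \cdot \frac{1}{c - d}\left[(x + c)^{3/2} - (x + d)^{3/2}\right] + C$
Example 6: Find the integration $\int (6x + 9)^8\,dx$
Solution: By using the substitution method,
Let $y = 6x + 9$
Then $dy = 6\,dx$, or $dx = \frac{1}{6}\,dy$
So, we get
$\int (6x + 9)^8\,dx = \frac{1}{6}\int y^8\,dy = \frac{1}{6} \cdot \frac{y^9}{9} + C$
Now putting the value $y = 6x + 9$, we get
$\int (6x + 9)^8\,dx = \frac{1}{54}(6x + 9)^9 + C$
Example 7: Evaluate $\int \frac{-x^2}{4 - x^2}\,dx$
Solution:
$\int \frac{-x^2}{4 - x^2}\,dx = \int \frac{-x^2 + 4 - 4}{4 - x^2}\,dx$
$= \int 1\,dx - 4\int \frac{1}{4 - x^2}\,dx$
$= x - 4 \cdot \frac{1}{2 \cdot 2}\log\left|\frac{2 + x}{2 - x}\right| + c$ (by the formula for $\int \frac{1}{a^2 - x^2}\,dx$ with $a = 2$)
$= x - \log\left|\frac{2 + x}{2 - x}\right| + c$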
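Example 8: Evaluate $\int x^2 e^{2x}\,dx$
Solution: Let $I = \int x^2 e^{2x}\,dx$
By using the formula for integration by parts,
$I = x^2\int e^{2x}\,dx - \int\left[\frac{d}{dx}(x^2)\int e^{2x}\,dx\right]dx$
$= x^2 \cdot \frac{e^{2x}}{2} - \int \frac{2x\,e^{2x}}{2}\,dx$
$= \frac{1}{2}x^2 e^{2x} - \int x e^{2x}\,dx$
$= \frac{1}{2}x^2 e^{2x} - \left[x\int e^{2x}\,dx - \int\left(\frac{d}{dx}(x)\int e^{2x}\,dx\right)dx\right]$
$= \frac{1}{2}x^2 e^{2x} - \left[\frac{x e^{2x}}{2} - \int 1 \cdot \frac{e^{2x}}{2}\,dx\right]$
$= \frac{1}{2}x^2 e^{2x} - \left[\frac{1}{2}x e^{2x} - \frac{1}{4}e^{2x}\right] + c$
$= \frac{1}{2}x^2 e^{2x} - \frac{1}{2}x e^{2x} + \frac{1}{4}e^{2x} + c$
$= \frac{1}{2}e^{2x}\left(x^2 - x + \frac{1}{2}\right) + c$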
Example 9: Find $\int \frac{1}{4x^2 - 9}\,dx$
Solution: $\int \frac{1}{4x^2 - 9}\,dx = \int \frac{1}{(2x - 3)(2x + 3)}\,dx$
Now, let
$\frac{1}{(2x - 3)(2x + 3)} = \frac{A}{2x - 3} + \frac{B}{2x + 3}$ ................(i)
$= \frac{2Ax + 3A + 2Bx - 3B}{4x^2 - 9}$
$= \frac{2x(A + B) + 3(A - B)}{4x^2 - 9}$
Now compare both sides of the equation: $2x(A + B) + 3(A - B) = 1$
Hence $2(A + B) = 0$, or $A = -B$, and $3(A - B) = 1$, so $B = -1/6$ and $A = 1/6$. Now by equation (i),
$\int \frac{1}{4x^2 - 9}\,dx = \frac{1}{6}\int \frac{dx}{2x - 3} - \frac{1}{6}\int \frac{dx}{2x + 3}$
$= \frac{1}{12}\ln|2x - 3| - \frac{1}{12}\ln|2x + 3| + C$
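An antiderivative can always be checked by differentiating it. The sketch below does this numerically for the results of Examples 7, 8 and 9, using a central-difference approximation of $F'(x)$ and comparing it with the integrand $f(x)$ at a few sample points. The helper names, step size and sample points are illustrative assumptions.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# name -> (integrand f, candidate antiderivative F, sample points in the domain)
cases = {
    "Example 7": (lambda x: -x**2 / (4 - x**2),
                  lambda x: x - math.log((2 + x) / (2 - x)),
                  [-1.5, 0.3, 1.0]),
    "Example 8": (lambda x: x**2 * math.exp(2 * x),
                  lambda x: 0.5 * math.exp(2 * x) * (x**2 - x + 0.5),
                  [-1.0, 0.5, 2.0]),
    "Example 9": (lambda x: 1 / (4 * x**2 - 9),
                  lambda x: (math.log(abs(2 * x - 3)) - math.log(abs(2 * x + 3))) / 12,
                  [0.0, 1.0, 4.0]),
}

def max_relative_error(f, F, points):
    """Largest gap between F'(x) and f(x), scaled by max(1, |f(x)|)."""
    return max(abs(derivative(F, x) - f(x)) / max(1.0, abs(f(x))) for x in points)

errors = {name: max_relative_error(f, F, pts) for name, (f, F, pts) in cases.items()}
for name, err in errors.items():
    print(name, err)
```

Errors close to zero indicate that each $F$ is indeed an antiderivative of the corresponding $f$; the constant of integration plays no role because it vanishes on differentiation.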
Example 10: Calculate $Q(L)$, where $Q'(L) = 6L^{1/3}$ and $Q(0) = 0$
Solution: $Q(L) = \int 6L^{1/3}\,dL = \frac{18}{4}L^{4/3} + c$
Given $L = 0$, $Q(0) = 0 + c = 0$, so $c = 0$
Then $Q(L) = \frac{18}{4}L^{4/3} = \frac{9}{2}L^{4/3}$
Problem Set
1. Find the integrals of the following:
(i) $\int (4x^3 + 9x^2 + 2x + 2)\,dx$ (ii) $\int \frac{A}{r^{5/2}}\,dr$
(iii) $\int (3t^2 + 2t + e^t)\,dt$ (iv) $\int x\sqrt{x}\,dx$
2. Prove that $\int (ax + b)^{\alpha}\,dx = \frac{1}{a(\alpha + 1)}(ax + b)^{\alpha + 1} + c$
3. Find the integration (i) $\int \frac{1}{\sqrt{x} + 2}\,dx$ (ii) $\int \frac{x}{2x^2 + 3}\,dx$
4. Calculate (i) $\int x\sqrt{x^2 + 1}\,dx$, $x > 0$ (ii) $\int x\sqrt[3]{x - 2}\,dx$
5. If the marginal cost of producing x units of a manufactured product is $MC = C' = 2x + 4$, then find the total cost function $C(x)$. Given, fixed cost = 40
6. Evaluate $\int \frac{1}{2}(e^x + e^{-x})\,dx$
7. Given, f " ( t )  1 / t 2  t 3  2 t  0 and f(1)  0, f' (1)  1/4 then find f(t).
8. Prove that $\int t\sqrt{at + b}\,dt = \frac{2}{15a^2}(3at - 2b)(at + b)^{3/2} + c$
9. Find the integration (i) $\int \log x\,dx$ (ii) $\int x^5 e^x\,dx$
10. Find the general form of the function $f(x)$ whose third derivative is x, given that $f''(0) = f'(0) = f(0) = 0$
1 2x  1
11. Evaluate, (i) x 2
 a2
dx (ii)  ( x  1 )( x  2 )( x  3 )
Answers of Problem Set
1. (i) $x^4 + 3x^3 + x^2 + 2x + c$
(ii) $-\frac{2A}{3r^{3/2}} + c$
(iii) $t^3 + t^2 + e^t + c$
(iv) $\frac{2}{5}x^{5/2} + c$
3. (i) $2[\sqrt{x} - 2\ln(\sqrt{x} + 2)] + C$
(ii) $\frac{1}{4}\ln(2x^2 + 3) + C$
4. (i) $\frac{1}{3}(x^2 + 1)^{3/2} + C$
(ii) $\frac{3}{14}(x - 2)^{4/3}(2x + 3) + c$
5. $C(x) = x^2 + 4x + 40$
1 x x 1 5
6. (e e ) 7. t t  log t
2 20
9. (i) $x\log x - x + c$
(ii) $x^5 e^x - 5x^4 e^x + 20x^3 e^x - 60x^2 e^x + 120x e^x - 120e^x + c$
10. $\frac{1}{24}x^4$
1 xa 1 3 7
11. (i) log c (ii)   
2a xa 12( x  a ) 5( x  2 ) 4( x  3 )
1.3 THE DEFINITE INTEGRAL
 Introduction
Let $F(x)$ be a continuous function over the interval $[a, b]$ with derivative $f(x)$, i.e. $F'(x) = f(x)$ for all $x \in (a, b)$. Then the difference $F(b) - F(a)$ is called the definite integral of the function $f(x)$ over the interval $[a, b]$. As seen in the first section of the present chapter, this difference does not depend on which indefinite integral of $f(x)$ is chosen, because the constant C cancels. The definite integral of $f(x)$ therefore depends only on the function $f(x)$ and the interval $[a, b]$.
The definite integral can be written as
$\int_a^b f(x)\,dx = F(x)\Big|_a^b = [F(x)]_a^b = F(b) - F(a)$
where $F'(x) = f(x)$ for all $x \in (a, b)$, and the numbers b and a are the upper and lower limits respectively.
 Steps of Evaluating Definite Integral
Let $I = \int_a^b f(x)\,dx$
• First, find the indefinite integral, $\int f(x)\,dx = F(x) + C$
• Substitute the upper limit $x = b$ in this integral, i.e. $F(b) + C$
• Substitute the lower limit $x = a$ in this integral, i.e. $F(a) + C$
• Subtract the value at the lower limit, $\{F(a) + C\}$, from the value at the upper limit, $\{F(b) + C\}$:
$\int_a^b f(x)\,dx = [F(x)]_a^b = F(b) - F(a)$
Example 1: Find $\int_a^b x\,dx$
Solution: Let $I = \int_a^b x\,dx$
$I = \left[\frac{x^2}{2} + c\right]_a^b$
$= \left(\frac{b^2}{2} + c\right) - \left(\frac{a^2}{2} + c\right)$
$= \frac{b^2}{2} - \frac{a^2}{2} = \frac{1}{2}(b^2 - a^2)$
 Some Basic Properties of Definite Integral
• $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$
• For $c_1, c_2 \in [a, b]$: $\int_a^b f(x)\,dx = \int_a^{c_1} f(x)\,dx + \int_{c_1}^{c_2} f(x)\,dx + \int_{c_2}^b f(x)\,dx$
• $\int_a^a f(x)\,dx = F(a) - F(a) = 0$
• $\int_a^b f(x)\,dx = \int_a^b f(y)\,dy = \int_a^b f(z)\,dz$
• $\int_0^a f(x)\,dx = \int_0^a f(a - x)\,dx$
• $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$, if $f(-x) = f(x)$
• $\frac{d}{dt}\int_{a(t)}^{b(t)} f(x)\,dx = f\{b(t)\}\,b'(t) - f\{a(t)\}\,a'(t)$
• Every continuous function $f(x)$ on $[a, b]$ is integrable and has an anti-derivative, i.e. a function $F(x)$ with $F'(x) = f(x)$ for all $x \in (a, b)$
Example 2: Find $\int_1^2 \left(2x + \frac{1}{x}\right)dx$
Solution: Let $I = \int_1^2 \left(2x + \frac{1}{x}\right)dx$
$= \left[\frac{2x^2}{2} + \log x\right]_1^2$
$= \left[x^2 + \log x\right]_1^2$
$= [4 + \log 2] - [1 + 0]$
$= 3 + \log 2$
Example 3: Find the area enclosed by the parabola $x^2 = 4by$, the x-axis and its ordinate at $x = 3$
Solution: The required area $= \int_0^3 y\,dx$
$= \int_0^3 \frac{x^2}{4b}\,dx \quad \left\{y = \frac{x^2}{4b}\right\}$
$= \frac{1}{4b}\left[\frac{x^3}{3}\right]_0^3$
$= \frac{1}{4b}\left(\frac{27}{3} - 0\right) = \frac{9}{4b}$
Example 4: Find $\int_1^4 |x - 2|\,dx$
Solution: Let
$|x - 2| = x - 2$ if $x \ge 2$
$|x - 2| = -(x - 2)$ if $x < 2$
Then $\int_1^4 |x - 2|\,dx = -\int_1^2 (x - 2)\,dx + \int_2^4 (x - 2)\,dx$ {by the property of integration}
$= -\left[\frac{x^2}{2} - 2x\right]_1^2 + \left[\frac{x^2}{2} - 2x\right]_2^4$
$= -\left[\left(\frac{4}{2} - 4\right) - \left(\frac{1}{2} - 2\right)\right] + \left[\left(\frac{16}{2} - 8\right) - \left(\frac{4}{2} - 4\right)\right]$
$= 2 - \frac{3}{2} + 0 + 2 = \frac{5}{2}$
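The split at $x = 2$ can be cross-checked numerically. The sketch below evaluates the two pieces with the antiderivative of $(x - 2)$ and compares the total with a midpoint-rule approximation of $\int_1^4 |x - 2|\,dx$. The function names and the number of subintervals are illustrative choices.

```python
def midpoint_integral(f, a, b, n=100_000):
    """Approximate the definite integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def F(x):
    """An antiderivative of (x - 2)."""
    return x**2 / 2 - 2 * x

# Split at x = 2, where |x - 2| changes from -(x - 2) to (x - 2).
exact = -(F(2) - F(1)) + (F(4) - F(2))
approx = midpoint_integral(lambda x: abs(x - 2), 1, 4)
print(exact, round(approx, 6))
```

Both values agree with the result 5/2 obtained by hand.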
Example 5: Find the area of the region between the parabola $y = x^2$ and the line $y = |x|$ over the interval $[-1, 1]$, i.e. the region $\{(x, y): x^2 \le y \le |x|\}$
Solution: Given $y = x^2$ and $y = |x|$, i.e. $y = x$ or $y = -x$
The required area
= Area OAB + Area OCD
= 2 × Area OAB
(because the region is symmetrical about the y-axis)
$= 2\left[\int_0^1 x\,dx - \int_0^1 x^2\,dx\right]$
$= 2\left[\frac{x^2}{2}\Big|_0^1 - \frac{x^3}{3}\Big|_0^1\right]$
$= 2\left[\left(\frac{1}{2} - 0\right) - \left(\frac{1}{3} - 0\right)\right] = \frac{1}{3}$ square units
Example 6: Evaluate $\int_0^T \frac{K}{T}e^{-Qt}\,dt$, where $T > 0$ and K and Q are positive constants.
Solution: Let $W(T) = \int_0^T \frac{K}{T}e^{-Qt}\,dt$
$= \frac{K}{T}\int_0^T e^{-Qt}\,dt$
$= \frac{K}{T}\left[-\frac{e^{-Qt}}{Q}\right]_0^T$
$= \frac{K}{TQ}\left[-e^{-QT} + e^0\right]$
$W(T) = \frac{K}{TQ}\left(1 - e^{-QT}\right)$
Example 7: Find the area included between the two parabolas $y^2 = 4x$ and $x^2 = 4y$
Solution: Given $y^2 = 4x$ and $x^2 = 4y$
Solving both, we get
$\left(\frac{x^2}{4}\right)^2 = 4x$
or $x(x^3 - 64) = 0$
So $x = 0$ and $x = 4$
The required area = Area OBCD
$= \int_0^4 \left(\sqrt{4x} - \frac{x^2}{4}\right)dx \quad \left\{y^2 = 4x \text{ and } y = x^2/4\right\}$
$= \left[2 \cdot \frac{x^{3/2}}{3/2} - \frac{x^3}{12}\right]_0^4$
$= \frac{32}{3} - \frac{16}{3} = \frac{16}{3} \approx 5.3$ square units
d x 4 2
 e du
2
Example 8: Find
dx x
Solution: By the direct property of integration, we get;
d b( x )
 f ( x )dx
dx a( x )
= f ( b( x )b' ( x )  f a( x )a' ( x )
d x 4 2
 e du
2
Then,
dx x
e ( x  2 x  e x .1
2
)2 2
=
=
2

e x 2 xe x  1
2

Problem Set
1. Find the definite integral for the following:
 t  t 2 dt
3y

1 2 3

3
(i)
1
e x dx (ii) (iii)
1
dy
0
10
d x 2 d u  v2 d u 1
2. Find, (i)  t dt
dx 0
(vi)
du u
e dv (iii)
du  u
x4  1
dx
3. Find the area of line y  4x between x  axis and the ordinate x  4

4. Find the area intercepted between the line 3 x  2 y  12 and the parable
3 2
y x
4
5. Find the area between the parabolas; y 2  4ax and x 2  4ay, a  0
6. Prove that
f ( x)dx  2 f ( x)dx, If f ( 2a  x )  f ( x )
2a a
0 0
=0 If f ( 2a  x )   f ( x )
7. Evaluate
1 3000
I  f ( t )dt
1
(i)  (t 
0
t  4 t )dt (ii)
2000 1000
3000000
Given F ( t )  4000  t 
t
1 b
8. Prove that F ( t* )   f ( t )dt
ba a
If f (t ) is continuous function over the interval [a,b] and t*  ( a ,b )
H int : Put F( t )   f ( x )dx
t
Answers of Problem Set
e2  1 4 39
1. (i) (ii) (iii)
e 3 10
1
2. (i) x2 (ii) 2e
u 2
(iii)
2 u4  1
3. 32 sq. units
4. 27 sq. units

16 3
5. a sq. units
3
13
7. (i) (ii) I  352
12
1.4 ECONOMIC APPLICATION OF INTEGRATION
 Introduction
Integration has an important role in economics. The present section shows this role by illustrating some important examples.
Important Results of Integration in Economics
• If $f(r)$ is the income distribution function of individuals over the interval $[a, b]$, then the number of individuals with incomes in $[a, b]$
$= n\int_a^b f(r)\,dr \quad \{r = \text{earning}\}$
• Total income of these individuals $= n\int_a^b r f(r)\,dr$
• The mean income of the individuals is given by
$m = \frac{\int_a^b r f(r)\,dr}{\int_a^b f(r)\,dr}$
Example 1: If the income distribution of a population over the interval $[a, b]$ is given by $f(r) = Ar^{-5/2}$ {A is a positive constant}, then determine the mean income in the given group.
Solution: Let
$\int_a^b f(r)\,dr = \int_a^b Ar^{-5/2}\,dr = A\left[-\frac{2}{3}r^{-3/2}\right]_a^b = \frac{2}{3}A\left(a^{-3/2} - b^{-3/2}\right)$
And
$\int_a^b r f(r)\,dr = \int_a^b Ar \cdot r^{-5/2}\,dr = A\int_a^b r^{-3/2}\,dr = 2A\left(a^{-1/2} - b^{-1/2}\right)$
So, the mean income of the group is given by
$m = \frac{2A(a^{-1/2} - b^{-1/2})}{\frac{2}{3}A(a^{-3/2} - b^{-3/2})} = 3\,\frac{a^{-1/2} - b^{-1/2}}{a^{-3/2} - b^{-3/2}}$
Now, suppose b is very large; then $b^{-1/2}$ and $b^{-3/2}$ are close to zero, and $m \approx 3a^{-1/2}/a^{-3/2} = 3a$.
Then, the mean income of the group is approximately 3a.
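The limiting behaviour of the mean income can be seen numerically. The sketch below evaluates the closed-form expression for m for a fixed lower bound and increasingly large upper bounds; the value of `a` is a hypothetical figure chosen only for illustration, and the constant A is omitted because it cancels in the ratio.

```python
def mean_income(a, b):
    """Mean income for the density f(r) = A * r**(-5/2) on [a, b]; A cancels."""
    numerator = 2 * (a**-0.5 - b**-0.5)            # integral of r*f(r), divided by A
    denominator = (2 / 3) * (a**-1.5 - b**-1.5)    # integral of f(r), divided by A
    return numerator / denominator

a = 10.0  # hypothetical lower income bound
for b in (20.0, 100.0, 1e4, 1e8):
    print(b, mean_income(a, b))
```

As b grows, the printed values approach 3a, in line with the argument in the example.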
Economic Application of Integration
There are several other economic applications of integration. Some results are given below:
• Total cost (TC)
$TC = \int MC(Q)\,dQ$
Here, MC = marginal cost and Q = output
• Total revenue (TR)
$TR = \int MR(Q)\,dQ$
Here, MR = marginal revenue
• Consumer surplus (CS) and producer surplus (PS): These can also be calculated by using the definite integral. Consumer surplus is given by
$CS = \int_0^{x_0} f(x)\,dx - p_0 \cdot x_0$
Here, $f(x)$ = demand function of commodity x, $x_0$ = quantity demanded and $p_0$ = price of the commodity
And producer surplus is given by
$PS = x_0 \cdot p_0 - \int_0^{x_0} f(x)\,dx$
where $f(x)$ now denotes the supply function
• The present discounted value is given by
$PDV = \int_0^T f(t)e^{-rt}\,dt$
• The future discounted value is given by
$FDV = \int_0^T f(t)e^{r(T - t)}\,dt$
• The discounted value at time s is given by
$DV = \int_s^T f(t)e^{-r(t - s)}\,dt$
Example 2: Find the total cost function from the given marginal cost function
$MC = f'(q) = 2 + 3q^{1/2} + 5q^{-1/2}$, given $f(1) = 11$
Solution: $F(q) = \int f'(q)\,dq = \int (2 + 3q^{1/2} + 5q^{-1/2})\,dq = 2q + 3 \cdot \frac{q^{3/2}}{3/2} + 5 \cdot \frac{q^{1/2}}{1/2} + c$
$TC = 2q + 2q^{3/2} + 10q^{1/2} + c$
When $q = 1$, $11 = 2 + 2 + 10 + c$, so $c = -3$
$\therefore$ Total cost function $F(q) = 2q + 2q^{3/2} + 10q^{1/2} - 3$
Example 3: If the marginal revenue function is given as $MR = \frac{\alpha\beta}{(x + \beta)^2} - \gamma$,
then show that $P = \frac{\alpha}{x + \beta} - \gamma$ is the demand law.
Solution: $R = P \cdot x$ and $MR = \frac{dR}{dx}$
$\therefore R = \int MR\,dx = \int \left[\frac{\alpha\beta}{(x + \beta)^2} - \gamma\right]dx$
$R = \alpha\beta\,\frac{(x + \beta)^{-1}}{-1} - \gamma x + A$
$R = P \cdot x = -\frac{\alpha\beta}{x + \beta} - \gamma x + A$
We know that if output $x = 0$ then revenue is also zero. Then $A = \alpha$
$R = P \cdot x = \alpha - \frac{\alpha\beta}{x + \beta} - \gamma x = \frac{\alpha x}{x + \beta} - \gamma x$
or $P = \frac{\alpha}{x + \beta} - \gamma$. Hence proved.
Example 4: If marginal revenue (MR) $= 16 - q^2$, find the maximum total revenue; also find the total revenue, average revenue and demand functions.
Solution: When TR is maximum, MR = 0
$16 - q^2 = 0 \Rightarrow q = 4$
Maximum $TR = \int_0^4 MR\,dq = \int_0^4 (16 - q^2)\,dq = \left[16q - \frac{q^3}{3}\right]_0^4 = \frac{128}{3}$
Total Revenue (TR) $= \int (16 - q^2)\,dq = 16q - \frac{q^3}{3} + c$
When $q = 0$, TR = 0, so $c = 0$
Average Revenue (AR) $= \frac{TR}{q} = 16 - \frac{q^2}{3}$
Then, Demand: $P = AR = 16 - q^2/3$
Example 5: If marginal propensity to consume (MPC) function is given as follows;

dc
 0.5  0.001 y , then find total consumption function. Given at income zero, c is 0.02.
dy
dc 0.001 2
Solution: C   .dy   ( 0.5  0.001 y  0.5 y  y A
dy 2
At  y  0, then, C = 0.2, Hence, A = 0.2
C  0.5 y  0.0005 y 2  0.2
Example 6: The sales of a product are described by the function $S(t) = 100e^{-0.5t}$, where t is the number of years since the launch of the product. Find
a) The total sales in the first three years
b) The sales in the fourth year
c) The total sales in the future
Solution: a) $\int_0^3 100e^{-0.5t}\,dt = 200(1 - e^{-1.5}) \approx 155.37$
b) $\int_3^4 100e^{-0.5t}\,dt = 200(e^{-1.5} - e^{-2}) \approx 17.6$
c) $\int_0^{\infty} 100e^{-0.5t}\,dt = 200$
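The three sales totals can be reproduced from the antiderivative $-200e^{-0.5t}$. The sketch below is one way to do so; the function name `cumulative_sales` and its parameters are illustrative and not part of the lesson.

```python
import math

def cumulative_sales(t0, t1, rate=100.0, decay=0.5):
    """Integral of rate * exp(-decay * t) from t0 to t1 (t1 may be math.inf)."""
    def F(t):
        # Antiderivative of rate * exp(-decay * t); exp(-inf) evaluates to 0.0.
        return -(rate / decay) * math.exp(-decay * t)
    return F(t1) - F(t0)

first_three_years = cumulative_sales(0, 3)
fourth_year = cumulative_sales(3, 4)
all_future = cumulative_sales(0, math.inf)
print(round(first_three_years, 2), round(fourth_year, 2), all_future)
```

Writing the integral once as a function of its limits makes it easy to evaluate sales over any other period as well.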
Example 7: If the demand function is $P = 30 - 2x - x^2$ and the demand is 3, what will be the consumer surplus (CS)?
Solution: Given $P = 30 - 2x - x^2$
For $x = 3$, $p = 30 - 6 - 9 = 15$
$\therefore CS = \int_0^3 (30 - 2x - x^2)\,dx - p \cdot x$
$= \left[30x - \frac{2x^2}{2} - \frac{x^3}{3}\right]_0^3 - 3 \times 15$
$= 90 - 9 - 9 - 45 = 27$ units
Example 8: The demand and supply laws are $P_d = (6 - x)^2$ and $P_s = 14 + x$ respectively. Find the consumer surplus (CS) if
(i) The demand and price are determined under perfect competition, and
(ii) The demand and price are determined under monopoly and the supply function is identified with the marginal cost function.
Solution: (i) CS under perfect competition: at the equilibrium,
$(6 - x)^2 = 14 + x \Rightarrow x = 2$
Then $P = 14 + x = 16$
$CS = \int_0^2 (36 - 12x + x^2)\,dx - 16 \times 2 = \frac{56}{3}$
(ii) CS under monopoly:
$TR = P_d \cdot x = (36 - 12x + x^2)x = 36x - 12x^2 + x^3$
$\therefore MR = 36 - 24x + 3x^2$
And the supply price is $P_s = 14 + x$, with the supply function taken as $P_s = MC$
For maximization of profit we know that
MR = MC
$36 - 24x + 3x^2 = 14 + x$
i.e. $x = 1$ or $x = 7.33$
At $x = 1$, $P_d = 25$
$\therefore$ Hence, $CS = \int_0^1 (36 - 12x + x^2)\,dx - 25 \times 1 = \frac{16}{3}$ units
Similarly, we obtain CS at $x = 7.33$
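The two consumer-surplus values at $x = 2$ and $x = 1$ follow from the same formula, the area under the demand curve minus expenditure. A minimal sketch, with hypothetical helper names, is:

```python
def demand(x):
    """Demand price P_d = (6 - x)^2."""
    return (6 - x) ** 2

def demand_integral(x):
    """Integral of (6 - t)^2 from 0 to x, i.e. 36x - 6x^2 + x^3/3."""
    return 36 * x - 6 * x**2 + x**3 / 3

def consumer_surplus(x0):
    """Area under the demand curve up to x0 minus expenditure p0 * x0."""
    return demand_integral(x0) - demand(x0) * x0

cs_competition = consumer_surplus(2)  # equilibrium quantity under perfect competition
cs_monopoly = consumer_surplus(1)     # profit-maximizing quantity under monopoly
print(cs_competition, cs_monopoly)
```

The comparison shows the smaller consumer surplus at the monopoly quantity than at the competitive quantity.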

Example 9: Obtain the producer surplus when the demand and supply functions are given as
$D = 20 - 4x$ and $S = 4 + 4x$
Solution: At the equilibrium condition,
Demand (D) = Supply (S)
$20 - 4x = 4 + 4x$
or $8x = 16$
then $x = 2$
and $P = 4 + 8 = 12$
Then, producer surplus (PS)
$= P \cdot x - \int_0^2 (4 + 4x)\,dx = 24 - [4x + 2x^2]_0^2$
$= 24 - 16 = 8$ units
Problem Set
1. If the inverse demand function of commodity Q is given as $P = 3q^{-1/2}$ and presently 100 units are being sold, then find the consumer surplus.
Ans. 30
2. Let the interest rate vary over time and be represented by $r(t)$. What is the present value of a flow of income $P(t)$ from $t = a$ to $t = b$ using this variable interest rate?
Ans. $\int_a^b P(t)\,e^{-\int_a^t r(s)\,ds}\,dt$
Introduction to Statistics
DC-1
Semester-II
Lesson: Introduction to Statistics
Lesson Developer: Sarabjeet Kaur
Department/College: Department of Economics, P.G.D.A.V. College, University of Delhi

Contents
1. Learning outcomes of the chapter

2. Introduction to Statistics
2.1Graphical Presentation
2.2 Histogram and Discrete Variable
2.3 Histogram in Continuous Variable and equal class interval
2.4 Steps in the construction of histogram when class intervals are unequal
2.5 Histogram Shapes
2.6 Examples of Dot Plot
2.7 Examples for Histogram and Ogive
3. USE OF STATISTICS IN ECONOMICS
4. Limitations of Statistics
5. Population and Sample
5.1 Population
5.2 Sample
6. Parameter and Statistic
7. Exercises
8. References

After completing the present chapter, you should be able to understand:
1. The meaning of Statistics.
2. The characteristics of Statistics.
3. The use of Statistics in Economics.
4. The difference between a Population and a Sample.
5. The difference between a Parameter and a Statistic.
2. Introduction to Statistics
Statistics is the study of the collection, organization, analysis, presentation and interpretation of data to assist in making more effective decisions. It is a science of methods for obtaining and analyzing data in order to make decisions based on them. It is a branch of mathematics that deals with aspects that can be represented numerically or categorically, either by counts or by measurements.
It is widely employed in various activities of business, government, and the natural and social sciences. It is not only facts and figures; it refers to a range of techniques and procedures for analyzing, interpreting, displaying, and making decisions based on data.
Hence, there are five stages in a statistical investigation:

(1) Collection of data: This is the first step and is the foundation of statistical analysis. Therefore, data should be gathered with maximum care by the investigator himself (primary data) or obtained from reliable published or unpublished sources (secondary data).
(2) Organization of data: Data must be organized by editing, classifying and tabulating, so that the collected information is easily accessible.
(3) Presentation of data: Organized data must be presented in some systematic manner so that statistical analysis becomes easier. Data can be shown with the help of tables, graphs, diagrams, etc.
(4) Analysis of data: After collection, organization and presentation, the data are analyzed by various methods such as averages, dispersion and correlation.
(5) Interpretation of data: The last step is the interpretation of data, which means drawing conclusions on the basis of the analysis. On the basis of these conclusions, various decisions can be taken.
The word "statistics" is commonly used in two ways. In the first, "statistics" is used in the plural sense, meaning numerical facts or data, and the methods for handling them are called "Descriptive Statistics". Descriptive statistics deals with collecting, summarizing and presenting data that would otherwise be unwieldy and immense, in such a way that meaningful conclusions can easily be drawn from them. It may thus be seen as encompassing methods that bring out and highlight the latent characteristics present in a set of numerical data. It not only makes the data easier to understand but also allows them to be reported systematically, in a manner that makes them manageable for further consultation, investigation, and evaluation.
For example, the NSSO reports that the population of India was 449.6 million in 1960; 555.2 million in 1970; 699 million in 1980; 868.9 million in 1990; 1.042 billion in 2000 and 1.206 billion in 2010. This information is an example of descriptive statistics. It is still descriptive statistics if we calculate the percentage growth from one decade to the next. However, it is not descriptive statistics if we use these figures to estimate the population for the year 2020, or the percentage growth of population from 2010 to 2020, because the figures are then being used not to describe the past population but to predict the future population.
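The decade-to-decade percentage growth mentioned above is a typical descriptive calculation. A small sketch using the population figures quoted in the text (in millions) is given below; the function name `decade_growth` is an illustrative choice.

```python
# Population of India in millions, as quoted in the text.
population = {1960: 449.6, 1970: 555.2, 1980: 699.0,
              1990: 868.9, 2000: 1042.0, 2010: 1206.0}

def decade_growth(series):
    """Percentage growth between consecutive years in the series."""
    years = sorted(series)
    return {
        (y0, y1): 100 * (series[y1] - series[y0]) / series[y0]
        for y0, y1 in zip(years, years[1:])
    }

growth = decade_growth(population)
for (y0, y1), pct in growth.items():
    print(f"{y0}-{y1}: {pct:.1f}%")
```

The calculation only summarizes the recorded figures; extending it to forecast 2020 would be a matter of inference rather than description.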
Masses of unorganized data (e.g., census of population, earnings of workers, etc.) are of little value. However, statistical methods are available to arrange this sort of data into a useful form. Data can be arranged into a frequency distribution, and different graphs may be used to describe them. A well-thought-out grouping of the data makes it easy to describe their hidden characteristics by means of a variety of summary measures. These include measures of central tendency, dispersion, etc., which make up the essential scope of descriptive statistics.
Today, with the development of probability theory, statistics is used to make predictions or comparisons about a totality of observations (the population) using data collected from a very small portion of that population. This technique is called "Inferential Statistics". It is also known as Inductive Statistics or Statistical Inference.
It is the technique of drawing conclusions from data that are subject to random variation, for instance sampling variation. More particularly, the term inferential statistics is used for systems of procedures that help in drawing conclusions from datasets arising from systems influenced by random variation, for example experimental errors, random sampling, or random experimentation. The foremost requirements of such a system of procedures for inference and induction are that it should provide reasonable answers when applied to well-defined situations and that it should be general enough to be applied to all types of situations. These statistics are basically used to test hypotheses and make estimates using sample data. The two branches of inferential statistics are estimation and hypothesis testing. The results of inferential statistics can be helpful in making decisions about further experiments or surveys, or in drawing conclusions before implementing any organizational or governmental policy.
2.1 Graphical Presentation of Data
We often present statistical information in a graphical form. A graph is often useful for
capturing reader attention and to portray a large amount of information. This method can
be used to illustrate the way in which one property changes when some other property
undergoes a measured change. For the visualization of data, there are a number of types of
graphs. They are given below;
 Bar Graph: A graphical method of presenting qualitative data that have been
summarized in a frequency distribution or a relative frequency distribution.
 Pie Chart: A graphical device for presenting qualitative data by subdividing a circle
into sectors that correspond to the relative frequency of each class.
 Dot Plot: A graphical presentation of data, where the horizontal axis shows the range
of data values and each observation is plotted as a dot above the axis.
 Histogram: A graphical method of presenting a frequency or a relative frequency
distribution or a density distribution.
 Ogive: A graphical method of presenting a cumulative frequency distribution or a
cumulative relative frequency distribution.
 Scatter Diagram: A graphical method of presenting the relationship between two
quantitative variables. One variable is shown on the horizontal and the other on the
vertical axis.
Basically, a frequency distribution is simply a grouping of the data, generally in the form of a frequency distribution table, which gives a clearer picture than the individual values. The most usual graphical presentation of a frequency distribution is a histogram or a frequency polygon.
A histogram is a pictorial method of representing data. It appears similar to a bar chart but has two fundamental differences:
1. Type of data: Bar graphs are usually used to display "categorical data", that is, data that fit into categories. Histograms, on the other hand, are usually used to present "continuous data", i.e. data that represent a measured quantity where, at least in theory, the numbers can take on any value in a certain range.
2. Presentation of data: Bar graphs and histograms are also drawn differently. The bars in bar graphs are usually separated, whereas in histograms the bars are adjacent to each other. This is not always true, however: sometimes bar graphs are drawn with no spaces between the bars, but histograms of continuous data are never drawn with spaces between the bars.
2.2 Histogram and Discrete Variable
Histograms allow a visual interpretation of numerical data (discrete or continuous) by indicating the number of data points that lie within a range of values, called a class or a bin. The frequency of the data that fall in each class is depicted by a bar. For each class, a rectangle is constructed with a base length equal to the range of values in that class and an area proportional to the number of observations falling into that class. The rectangles will therefore generally have different heights. When the variable is continuous, there are no gaps between the bars, but when the variable is discrete, gaps are sometimes left between the bars. For example, the following data represent the number of children in a particular set of families:
Children    Frequency
1           3
2           8
3           10
4           2
5           3
For the construction of the histogram, the relative frequency has to be calculated for each observation, which is equal to:
Relative frequency = frequency of the class / total number of observations
Calculating the relative frequency for the above set of data (total = 26):
Children    Frequency    Relative frequency
1           3            3/26 ≈ 0.12
2           8            8/26 ≈ 0.31
3           10           10/26 ≈ 0.38
4           2            2/26 ≈ 0.08
5           3            3/26 ≈ 0.12
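The relative frequencies in the table can be computed directly from the frequency counts. The short sketch below uses the children data from the example; the variable names are illustrative.

```python
# Number of children -> number of families, from the example table.
frequency = {1: 3, 2: 8, 3: 10, 4: 2, 5: 3}

total = sum(frequency.values())
relative = {children: count / total for children, count in frequency.items()}

for children, share in relative.items():
    print(children, frequency[children], round(share, 2))
```

Because every frequency is divided by the same total, the relative frequencies add up to one.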
A histogram is constructed by drawing a rectangle for each class of data. The height of each rectangle is the frequency or relative frequency of the class. In this example all rectangles have the same width and touch each other.
[Figure: histogram of the number of children per family]
2.3 Histogram in Continuous Variable and equal class interval
Histograms are commonly used in the case of a continuous variable.
Steps in construction
• First, construct a frequency distribution table with the help of tally charts.
• Class intervals are usually given the same width, but if the data are scarce at the extremes then classes may be joined.
• If the intervals are not all the same width, calculate the frequency densities.
• Construct the histogram, labelling each axis carefully.
• Hand-drawn histograms usually show the frequency or frequency density vertically.
Histogram with unequal class intervals:
When classes have unequal widths, the vertical axis of a histogram must represent not frequency (number of occurrences) but frequency density (relative frequency divided by its class width), and the class widths must be accurately represented on the horizontal axis, so that the area of each bar (not the height) represents the frequency of that class. The frequency density shows the number of units vertically for every unit horizontally.
2.4 Steps in the construction of histogram when class intervals are unequal
1. Find the relative frequency of all observations.
2. Divide the relative frequency of each observation by the corresponding class width to get the frequency density.
3. Construct the histogram with the frequency density as the height of the rectangle and the class interval as the base of the rectangle.
• Note: if the frequency distribution is inclusive, convert it into an exclusive one.
• If mid values are given, find out the lower and upper limits of the various classes before constructing the histogram.
Consider the following example:
Score       Frequency
0 - 50      25
50 - 60     10
60 - 100    20
Solution:
Score       Frequency    Density
0 - 50      25           25/50 = 0.5
50 - 60     10           10/10 = 1.0
60 - 100    20           20/40 = 0.5
In this table the density is computed from the frequencies, so the area of each rectangle equals the frequency of the corresponding class. If relative frequencies are used instead, the area of each rectangle is the relative frequency of the class. Since the total of the relative frequencies is always equal to one, the total area of all rectangles in such a density histogram is equal to one.
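The density calculation for unequal class widths can be sketched as follows, using the score data from the example. The code computes both the density based on frequencies (as in the table) and the density based on relative frequencies, and confirms that the latter gives a total area of one. Variable names are illustrative.

```python
# (class limits, frequency) for the score example.
classes = [((0, 50), 25), ((50, 60), 10), ((60, 100), 20)]
total = sum(freq for _, freq in classes)

densities = []            # frequency / class width, as in the table
relative_densities = []   # relative frequency / class width
total_area = 0.0
for (lower, upper), freq in classes:
    width = upper - lower
    densities.append(freq / width)
    rel_density = (freq / total) / width
    relative_densities.append(rel_density)
    total_area += rel_density * width   # area of this rectangle

print(densities)
print([round(d, 4) for d in relative_densities])
print(total_area)
```

Either version gives rectangles of the same shape; only the scale of the vertical axis differs.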
2.5 Histogram Shapes
Symmetric: A histogram is symmetric if its right half is exactly the mirror image of its left half. For example, in a normal distribution, points are as likely to occur on one side of the average as on the other.
Bimodal: A bimodal histogram has two peaks. This happens when the data come from two different kinds of individuals or objects. For example, a distribution of production data from a two-shift operation might be bimodal if each shift produces a different distribution of results.
Asymmetric Histogram: An asymmetric histogram is not equally balanced. In other words, the two sides will not be mirror images of each other. Skewness is the tendency for the values to be more frequent around the high or low end of the x-axis. When a histogram is constructed for skewed data, it is possible to identify the skewness by looking at the shape of the distribution.
A distribution is said to be positively skewed when the upper tail is stretched towards the right compared with the left side, which means that the majority of the data have values towards the lower end of the range. Most of the values tend to cluster toward the left side of the x-axis (i.e. the smaller values), with increasingly fewer values at the right side of the x-axis (i.e. the larger values).
For example, the distribution of personal income is positively skewed. Also, raw scores on most measures of psychopathology are positively skewed.
A distribution is said to be negatively skewed when the tail on the left side of the histogram is longer than the tail on the right side, which means that the majority of the data have values towards the upper end of the range. Most of the values tend to cluster toward the right side of the x-axis (i.e. the larger values), with increasingly fewer values on the left side of the x-axis (i.e. the smaller values). For example, a distribution of analyses of a very pure product would be negatively skewed, because the product cannot be more than 100 percent pure.
Frequency polygon
It is constructed by joining the midpoints at the top of each column of the histogram. The final section of the polygon often joins the midpoint at the top of each extreme rectangle to a point on the x-axis half a class interval beyond the rectangle. This makes the area enclosed by the polygon the same as that of the histogram.
2.6 Examples of Dot Plot
In a recent campaign, many airlines reduced their summer fares in order to gain a
larger share of the market. The following data represent the prices of round-trip tickets
from Atlanta to Boston for a sample of nine airlines.
120 140 140
160 160 160
160 180 180
Construct a dot plot for the above data.
Answer: The dot plot is one of the simplest graphical presentations of data. The horizontal
axis shows the range of data values, and each observation is plotted as a dot above the
axis. The figure shows the dot plot for the above data. The four dots shown at the value of
160 indicate that four airlines were charging Rs.160 for the round-trip ticket from Atlanta to
Boston.
DOT PLOT FOR TICKET PRICES
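As a rough illustration of the counting behind a dot plot, the following Python sketch tallies the nine ticket prices above and prints one text row per price, with one mark per airline. The function name `dot_plot_rows` and the sideways text layout are assumptions made for this illustration; a drawn dot plot stacks the dots vertically above a horizontal axis.

```python
from collections import Counter

# Round-trip ticket prices for the nine airlines in the example.
prices = [120, 140, 140, 160, 160, 160, 160, 180, 180]


def dot_plot_rows(values):
    """Return one text row per distinct value, with one '*' per observation."""
    counts = Counter(values)
    return [f"{value:>5} | " + "* " * counts[value] for value in sorted(counts)]


for row in dot_plot_rows(prices):
    print(row.rstrip())
```

The row for 160 carries four marks, which matches the four airlines mentioned in the answer.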
2.7 Examples for Histogram and Ogive
The following table gives the distribution of waiting times at First County Bank.

Waiting Times   Relative    Cumulative Relative   Cumulative
(Seconds)       Frequency   Frequency             Percentage
60 - 119        0.2000      0.2000                 20.00
120 - 179       0.3333      0.5333                 53.33
180 - 239       0.2667      0.8000                 80.00
240 - 299       0.1333      0.9333                 93.33
300 - 359       0.0667      1.0000                100.00
Construct (i) Histogram for the waiting times
(ii) Ogive for the waiting times
Answer (i)
[Figure: Histogram of the waiting times at First County Bank; x-axis: Waiting Times (in seconds)]

Answer (ii)
[Figure: Ogive for the cumulative frequency distribution of the waiting times at First County Bank; x-axis: Waiting Times (in seconds)]
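The cumulative figures that an ogive plots can be obtained by running totals of the relative frequencies. The short Python sketch below, offered only as an illustration, rebuilds the cumulative column of the waiting-time table; the variable names are assumptions of this example.

```python
from itertools import accumulate

# Class intervals (seconds) and relative frequencies from the waiting-time table.
classes = ["60-119", "120-179", "180-239", "240-299", "300-359"]
relative_freq = [0.2000, 0.3333, 0.2667, 0.1333, 0.0667]

# Running total of the relative frequencies: the heights of the ogive.
cumulative = list(accumulate(relative_freq))

for label, cum in zip(classes, cumulative):
    print(f"{label:>9}  {cum:.4f}  {100 * cum:6.2f}%")
```

Each cumulative value is plotted against the upper end of its class, and the last value equals 1 (100 per cent) because all observations have then been counted.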
Pie Graphs
A pie graph illustrates how a whole is separated into parts. The data are presented in a circle
such that the area of the sector representing each category is proportional to the part of the
whole that the category represents.
For example, a circle graph is shown in Data Analysis given below. The title of the graph is
“United States Production of Photographic Equipment and Supplies in 1971”. There are 6
categories of photographic equipment and supplies represented in the graph.

3. USE OF STATISTICS IN ECONOMICS
Statistics deals with every aspect of human activity. It holds an important position in fields
as different as Commerce, Industry, Chemistry, Economics, Mathematics, Biology, Botany,
Psychology and Astronomy, so its range of application is very wide.
Many economic problems can be understood more easily with the use of statistical tools, which
also help in the formulation of economic policies. Statistical data and advanced techniques of
statistical analysis are immensely useful in the study of a variety of economic problems, such
as production, consumption, distribution of income and wealth, prices, saving, investment and
unemployment. For instance, by collecting the relevant information, an analysis of people's
consumption may reveal the pattern of income spent on different heads of consumption.
Statistical study is a quantitative tool widely used in economics. It is needed in a variety of
ways: to test the validity of economic theories against empirical real-world data; to explain
cause-effect relationships between variables so as to assist in making effective public policy;
to forecast future economic conditions so as to reduce uncertainty in business or public policy
decisions; and to fit mathematical models to actual data.

National income accounts are a multipurpose measure for administrators and economists, and
various statistical measures are used in constructing them. In economic research, statistical
methods are used for collecting, organizing and analysing data and for testing hypotheses
about them.
In the field of production, a comparative study of the productivity of the various factors of
production, i.e. land, labour, capital and enterprise, can be made with the help of statistics.
The effectiveness of various policies can be evaluated with statistical techniques. Statistics
also plays a very important role in trade, both internal and external, where data on cost and
selling price help in estimating the demand for a commodity.
In short, statistics is very useful in every field of economics. It provides facts, gives
direction for solving a problem, supports the development of economic laws and helps in
economic planning.
4. Limitations of Statistics
Despite the usefulness of statistics in almost all sciences (social, physical and natural),
one should not carry the impression that statistics is a kind of magic that always gives
accurate answers to problems. In spite of its wide scope, the subject has certain limitations,
and if the data are not properly collected or interpreted there is always a chance of drawing
wrong conclusions. It is therefore necessary to know the limitations of statistics. Some
important limitations are the following:
(i) Statistics does not study individuals:
Statistics deals with aggregates of facts. Single or isolated figures are not statistics.
Data are statistical when they relate to measurements of masses, not when they relate to
an individual item or phenomenon as a separate entity. This is considered a major
handicap of statistics.
(ii) Statistics does not study qualitative phenomena:
Statistics are numerical statements of facts and figures. They are not applicable to the
study of facts that are not quantitatively measurable. Qualitative phenomena, e.g. honesty,
intelligence and poverty, cannot be studied in statistics unless these attributes are
expressed in numerical terms. The quality aspect of a variable, or a subjective phenomenon,
therefore falls outside the scope of statistics, which limits the scope of the subject.

If there are k qualitative characteristics and we wish to focus on one of them, we can
assign the number 1 to that characteristic and 0 to all of the others. Counting the 1's gives
the value x, and x/n is the proportion of times the characteristic is observed in a sample of
size n. The sample proportion $\hat{p} = x/n$ can then be used to estimate the population
proportion $p = X/N$, where X is the number of population units with the characteristic and
N is the population size.
(iii) Statistical laws are not exact:
Statistical laws are not exact in the way the laws of the natural sciences are. Conclusions
obtained statistically are not universally true; they are true only under certain conditions.
Because statistics as a science is less exact than the natural sciences, its conclusions have
more limited practical utility.
5. POPULATION AND SAMPLE
5.1 Population
The term "population" normally means the persons in a town, region, state or country and their
attributes such as gender, age composition, marital status, education and so on. In statistics,
the term is used in a different sense. It is concerned not only with the number of people
living in an area; it also covers a population of households, events, objects, procedures or
observations, including services such as visits to the doctor or surgical operations. A
population is thus a totality of creatures, events, things, cases and so on. In short, a unit of
a population is whatever you count or measure.
Normally, a population is relatively large, and it is hard to infer its attributes by
considering its elements individually. Sometimes it is impossible to survey the entire
population even in theory, because not all of its members are observable. Even when the entire
population can be reached, doing so is usually very costly and time consuming. Alternatively,
the researcher can take a subset of the population, called a sample, and use it to draw
conclusions about the population under study.
A census may be preferred when the population is not too large. It may be desirable to resort
to a census when the respondents are not widely scattered and when highly reliable data are
required; there are also cases in which a census is simply unavoidable.
A conceptual population consists of all the values that might possibly have been observed.
It is also called a hypothetical population, in contrast to a tangible population. For example,
a geologist weighs a rock several times on a sensitive scale. Each time, the scale gives a
different reading. Here the population is conceptual because it consists of all the readings
that the scale could in principle produce.

The population of undergraduate students of a college is a real (tangible) population: it
exists, and if you had the time you could measure the heights of all the students. The
population of Economics students, by contrast, is a conceptual population created over time.
All the students taking Economics this semester are clearly in this population, but so are the
ones who took it last semester and those "eligible" to take it next semester. In that sense it
is a conceptual population: it does not exist in its entirety now, but does over time.
5.2 Sample
A part of the population is called a sample. It is a portion of the population that should
represent the characteristics of the population. A sample is a scientifically drawn group
that possesses the same attributes as the population. It may consist of two or more
items selected from the same population. The lowest possible size of a sample is
two and the highest could equal the size of the population. An effectively selected
sample carries most of the information about a specific population parameter, but the
connection between the sample and the population must be such that reasonably accurate
conclusions about the population can be drawn from the sample. The relation between
population and sample can be expressed in the following diagram:
Example: If a researcher wants to find the mean height of the students in a particular
classroom, then the students in that room represent the population. But if the researcher
wants to find the mean height of the students in the whole college, the students in that
room represent a sample of the students in the college. The basic unit of the population is
called an element of the population; each student is an element of the college population.
Thus, a population is the totality of elements being studied and a sample is a part of the
population.
The interesting relationship between population and sample is that a population can exist
without a sample, but a sample cannot exist without a population; the sample depends upon
the population. A sample is not studied for its own sake. The basic objective of its study is
to draw inferences about the population.
Samples are essential because in many research designs it is impractical, from both a
strategic and a resource perspective, to examine all the members of a population. Census
taking is often expensive and too time consuming to provide information when it is needed.
It is also not feasible to include the whole population when elements are destroyed in the
process of obtaining information. Instead, a few participants (the sample) are selected in
such a way that the sample is truly representative of the population. The results obtained
from the sample can then be generalized to the population, i.e. information on a smaller
group of participants is used to infer about the group of all participants.
Normally, certain attributes of the items in the population are examined, for example the
mean height of the children in a village. A characteristic may be categorical, such as gender,
or it may be numerical. In the former case the value of the characteristic is a category, for
example female, whereas in the latter case the value is a number, for example age = 40 years.
A variable is a characteristic that can take more than one value and to which a numerical
measure can be assigned, for example:
x = weight of the student,
y = age of the student, etc.
Data result from making observations either on a single variable or simultaneously on two
or more variables.
Univariate data are data in which researchers observe only one aspect of a population or
sample at a time, e.g. the height of students. With bivariate data, researchers observe two
aspects, and if there are more than two variables the data are multivariate, e.g. the height,
weight and age of students.
When a sample is obtained from a population, an investigator frequently uses the sample
information to draw some type of conclusion about the population. It is imperative that the
sample be representative of the group to which the conclusion is generalized. This branch of
the subject is called inferential statistics.
Various sampling techniques, such as random sampling, stratified sampling, systematic
sampling, multi-stage sampling and quota sampling, are available, but the most commonly used
is random sampling.
In order to use statistics to learn about the population, the sample must be random.
A random sample is one in which every element of the population has a fair chance
of being chosen. The most commonly used kind is the simple random sample, which requires
that every possible sample of the selected size has an equal chance of being drawn.
Since simple random sampling does not normally ensure a representative sample when the
population is heterogeneous, a technique called stratified random sampling is used in that
case; a sample selected in this way is more representative of the population. The method can
only be used when the population can be divided into a number of distinct groups, or
"strata". In stratified sampling, you first identify the members of the population who belong
to each group. Then, from each of the sub-groups (strata), you randomly select a sample in
such a manner that the sizes of the sub-groups in the sample are proportional to their sizes
in the population.
The strata used in stratified sampling must not overlap. Overlapping sub-groups would give
some elements a higher chance of being selected in the sample. If this happened, it would
not be a proper probability sample.
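The proportional allocation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `stratified_sample`, the two strata and their sizes are hypothetical, and simple rounding is used for the allocation, so for awkward proportions the stratum sizes may not add up exactly to the requested sample size.

```python
import random


def stratified_sample(strata, sample_size, seed=0):
    """Draw a proportionally allocated stratified random sample.

    strata maps a stratum name to the list of its elements; the strata
    are assumed not to overlap.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Allocate in proportion to the stratum's share of the population.
        k = round(sample_size * len(members) / population_size)
        # Simple random sample (without replacement) inside the stratum.
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical college: 3,000 first-year and 2,000 second-year students.
strata = {
    "first_year": list(range(3000)),
    "second_year": list(range(3000, 5000)),
}
sample = stratified_sample(strata, sample_size=500)
print({name: len(chosen) for name, chosen in sample.items()})
```

With these assumed sizes the first stratum contributes 300 students and the second 200, mirroring the 3:2 split of the population.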
In a systematic sample, the items of the population are arranged in a list and then
every k-th item in the list is selected for inclusion in the sample. For instance, suppose
the population of a research study consists of 5,000 students in a particular college and the
researcher requires a sample of 500 students. The students would be put into a list
and every 10th student would be selected for the sample. To guard against any
possibility of human bias in this method, the researcher should pick the first element at
random. This is called a 'systematic sample with a random start'.
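A systematic sample with a random start might be sketched as follows. The function name `systematic_sample` and the use of student numbers 1 to 5,000 are assumptions for illustration; the sketch also assumes the population size is at least the sample size.

```python
import random


def systematic_sample(items, sample_size, seed=None):
    """Select every k-th item from a list, starting at a random position."""
    if not 0 < sample_size <= len(items):
        raise ValueError("sample_size must be between 1 and len(items)")
    rng = random.Random(seed)
    step = len(items) // sample_size      # sampling interval k
    start = rng.randrange(step)           # random start among the first k items
    return items[start::step][:sample_size]


# Hypothetical list of 5,000 students, identified by roll number.
students = list(range(1, 5001))
sample = systematic_sample(students, 500, seed=1)
print(sample[:5], len(sample))
```

Here the interval is 5000 / 500 = 10, so once the first student is drawn at random from the first ten, every tenth student after that is included.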
6. PARAMETER AND STATISTICS
A parameter is a statistical measure computed from population or census data. It is a
characteristic of the population, a value that describes the entire population, for
example the population mean, standard deviation or mode.

A statistic is a statistical measure calculated from sample data. It is a value that expresses
a characteristic of the sample, for example the sample mean, and it is also used to draw
inferences about the corresponding population parameter. Hence, a sample should represent
the entire population.
Consider a set of n data values, x1, x2, …, xn. If this set of data represents a population,
then the mean and the standard deviation of the population are given by:
$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}$
Whereas, if this same set of data represents a sample, then its mean and its standard
deviation are given by:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
Here, the formula for the mean is the same whether the data represent a population or a
sample, but the formula for the standard deviation depends on whether the data are
interpreted as a population or as a sample.
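To make the difference between the two standard deviations concrete, the following Python sketch computes both for the same small data set, once with the divisor n (population) and once with n - 1 (sample), and compares them with the standard-library functions `statistics.pstdev` and `statistics.stdev`. The five data values are hypothetical and chosen only for the illustration.

```python
import math
import statistics

# Hypothetical data set used only for this illustration.
data = [4, 8, 6, 5, 7]
n = len(data)

mean = sum(data) / n                          # same formula in both cases
ss = sum((x - mean) ** 2 for x in data)       # sum of squared deviations

sigma = math.sqrt(ss / n)                     # data treated as a population
s = math.sqrt(ss / (n - 1))                   # data treated as a sample

print(mean, sigma, s)
print(statistics.pstdev(data), statistics.stdev(data))
```

The mean is identical under both readings of the data, while the sample standard deviation is slightly larger than the population one because of the smaller divisor.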
A statistic and a parameter are closely related. A parameter is a numerical value that
describes a characteristic of an entire population, such as its mean or standard deviation,
whereas a statistic is a numerical value that describes a sample and not the whole
population. Inferential statistics allows a researcher to make an informed guess about a
population parameter based on a statistic computed from a sample randomly drawn from that
population (see Figure).
Figure: Shows the relationship between sample and population with their statistical
measures:

For instance, suppose an investigator studies the population of dogs in Delhi. The mean
height of all the dogs in the city is a population 'parameter'. If the investigator selects a
sample of 50 dogs from the city and computes the mean height of that sample, the result is a
'statistic'.
The value of a parameter is constant. It is not a random variable, because the units in the
population remain the same. A statistic, in contrast, is a random variable whose value varies
from sample to sample, because the units selected in two or more samples are not the same
and different samples give different values. Variation in the value of a statistic is called
sampling fluctuation.
The use of statistics is a means to an end. A statistic is a quantity that helps to estimate
the unknown parameters of a population from only a few observations, and statistical methods
are used to assess how closely sample statistics approximate the population parameters. The
investigator is basically concerned with the population and its parameters. However, these
values usually cannot be obtained directly, so the investigator relies on sampling and
statistics to reach conclusions about the population. A sufficient statistic is one that
captures all the information in the sample about the parameter, so that no other statistic
computed from the same sample can provide additional information.
7. Exercise:
1. How does statistics help in the solution of Economic problems? Explain with
examples.
Ans: Various economic problems can be better analysed and understood with the help of
statistical tools, as discussed in detail with examples in section 3 of the chapter.
2. Explain the difference between a parameter and a statistic with the help of examples.
Ans: A parameter is a statistical measure computed from population data, whereas a
statistic is a measure computed from a sample, as explained in detail in section 6 of this
chapter.
3. Explain the importance of Statistics in Economics. What are its limitations?
Ans: Statistics holds a central position in Economics; details are given in sections 3 and 4.
4. Explain the importance of a sample. How is it helpful in research?
Ans: A sample is small in size, less time consuming, etc., as given in section 5.
5. Explain histogram. What are its different shapes?
Ans: Explained in section 2.1.
8. References:
1. Jay L. Devore, Probability and Statistics for Engineers, Cengage Learning, 2010.
2. John E. Freund, Mathematical Statistics, Prentice Hall,1992.
3. Earl k. Bowen and Martin K. Starr, Basic Statistics for Business and Economics,
McGraw-Hill, 1983.

Numerical Measures in Descriptive Statistics
DC-1
Semester-II
Paper-III: Statistical Methods in Economics-I
Lesson: Numerical Measures in Descriptive Statistics
Lesson Developer: Amit Girdharwal
College/Department: Department of Economics, Dyal Singh College, University of Delhi

TABLE OF CONTENTS
1 Learning Outcomes
2 Introduction
3 Types of Averages
3.1 Mathematical Averages
3.1a Arithmetic mean
3.2 Averages of position
3.2a Median
3.2b Mode
4. Measures of Dispersion
4.1 Range
4.2 Standard Deviation and Variance
5 Skewness, Moments and Kurtosis
6 References
1. Learning Outcomes
After going through this chapter, you should be able to:
- Understand the concepts of different types of averages, such as positional and
mathematical averages, and their measurement.
- Explain the concept of variability and how to measure it, for example with the standard
deviation and variance.
- Explain the concept of the normal distribution and how to measure departure from it
using moment measures of skewness and kurtosis.
2. Introduction
One of the most widely used sets of summary figures is known as measures of location, which
are often referred to as averages, central tendency or central location. The purpose of
computing the average value of a set of observations is to obtain a single value which is
representative of all items in the data set. This single value is the point of location around
which the observations of the sample (or population) cluster.
Statistical series may differ from each other in the following ways:
- They may differ in the value of the variable around which most of the items cluster; this
is measured by central tendency or averages.
- They may differ in the extent to which items are dispersed around the central value; this
is measured by measures of dispersion.
- They may differ in the extent of departure from a normal distribution; this is measured
by skewness and kurtosis.
3. Types of Averages
There are two types of averages
3.1 Mathematical averages

a) Arithmetic mean
3.2 Averages of position
a) Median
b) Mode
Mathematical Averages
3.1.a Arithmetic mean:

The arithmetic mean of a series is the figure obtained by dividing the total value of the
items by the number of observations in the data set, i.e. by summing the observations and
dividing by the number of items in the series.
Let $x_1, x_2, \ldots, x_n$ be the observations in the sample, where n denotes the
sample size.
The sample mean is denoted by $\bar{x}$, where
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example: The marks (out of 100) scored in statistics by a sample of the top ten students of
the class are listed below. Calculate the sample mean.
Marks: 99, 98, 96, 95, 94, 92, 89, 88, 86, 85
Solution:
$\bar{x} = \frac{99 + 98 + 96 + 95 + 94 + 92 + 89 + 88 + 86 + 85}{10} = \frac{922}{10} = 92.2$ marks.

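Properties of the mean:
1. The mean is not independent of change of origin and scale.
Proof: Let $u_i = \frac{x_i - a}{h}$, so that $x_i = a + h u_i$.
Then $\sum x_i = na + h \sum u_i$.
Dividing by n, $\bar{x} = a + h\bar{u}$, or $\bar{u} = \frac{\bar{x} - a}{h}$,
i.e. the mean changes with both the origin a and the scale h.
2. The mean is sensitive to the extreme values of the data set.

3. The algebraic sum of the deviations of the given set of observations from the
arithmetic mean is zero,
i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$,
or, for a frequency distribution, $\sum f_i (x_i - \bar{x}) = 0$.
Proof: $\sum (x_i - \bar{x}) = \sum x_i - \sum \bar{x}$
$= \sum x_i - n\bar{x}$ ($\bar{x}$ is a constant and $\sum \bar{x} = n\bar{x}$)
$= n\bar{x} - n\bar{x} = 0$ (since $\sum x_i = n\bar{x}$)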
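Properties 1 and 3 can be checked numerically on the marks of the ten students from the example. In the sketch below the origin a = 90 and scale h = 2 are arbitrary values chosen for illustration, and the variable names are assumptions of this example.

```python
# Marks of the ten students from the example above.
marks = [99, 98, 96, 95, 94, 92, 89, 88, 86, 85]
n = len(marks)
mean = sum(marks) / n

# Property 3: the deviations from the mean add up to zero
# (up to floating-point rounding).
deviation_sum = sum(x - mean for x in marks)

# Property 1: with u = (x - a) / h, the mean of x equals a + h * mean of u.
a, h = 90, 2                       # arbitrary origin and scale
u = [(x - a) / h for x in marks]
mean_u = sum(u) / n
recovered = a + h * mean_u

print(mean, deviation_sum, mean_u, recovered)
```

The mean of the transformed values differs from the mean of the marks, but the original mean is recovered exactly by undoing the change of origin and scale.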
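Short cut method:
$\bar{x} = A + \frac{\sum d}{n}$
$\bar{x}$ = Actual arithmetic average.
A = Assumed arithmetic average.
$\sum d$ = the sum of the deviations from the assumed mean, i.e. $d = x - A$.
n = Number of items.
Discrete Variables
$\bar{x} = A + h\,\frac{\sum f u}{n}$, where $u = \frac{x - A}{h}$ and $n = \sum f$.

If the deviations are further divided by a common factor, and this factor is represented by i,
then (step deviation method), with $d' = \frac{x - A}{i}$:
$\bar{x} = A + \frac{\sum d'}{n} \times i$ (individual observations)
$\bar{x} = A + \frac{\sum f d'}{n} \times i$ (frequency distribution).
Continuous variables:
If the data set is for a continuous variable, then the x's are the mid-points of the classes.
If all the classes have the same width, then h is the class interval.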
Example: Calculate the mean marks obtained by the candidates from the following data:
Class Interval   No. of Candidates

10-20             7
20-30            15
30-40            18
40-50            25
50-60            30
60-70            20
70-80            16
80-90             7
90-100            2
Solution: Take the assumed mean A = 55 and the class interval h = 10.
Class      Mid value   No. of           Deviation      Step Deviation   f d'
Interval   (x)         Candidates (f)   (d = x - 55)   (d' = d/10)
10-20      15           7               -40            -4               -28
20-30      25          15               -30            -3               -45
30-40      35          18               -20            -2               -36
40-50      45          25               -10            -1               -25
50-60      55          30                 0             0                 0
60-70      65          20                10             1               +20
70-80      75          16                20             2               +32
80-90      85           7                30             3               +21
90-100     95           2                40             4                +8
Total                  n = 140                                          -53
$\bar{x} = A + \frac{\sum f d'}{n} \times h$
$= 55 + \frac{-53}{140} \times 10 = 55 - 3.79 = 51.21$ marks
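The step-deviation calculation can be cross-checked against the direct formula. The Python sketch below recomputes the table for the marks distribution and confirms that both routes give the same mean; the variable names are assumptions made for the illustration.

```python
# Mid-points and frequencies of the marks distribution in the example.
midpoints = [15, 25, 35, 45, 55, 65, 75, 85, 95]
freq = [7, 15, 18, 25, 30, 20, 16, 7, 2]

A, h = 55, 10                                   # assumed mean and class interval
steps = [(x - A) // h for x in midpoints]       # step deviations d'
n = sum(freq)
sum_fd = sum(f * d for f, d in zip(freq, steps))

mean_step = A + h * sum_fd / n                  # step-deviation method
mean_direct = sum(f * x for f, x in zip(freq, midpoints)) / n

print(n, sum_fd, round(mean_step, 2), round(mean_direct, 2))
```

Both methods give about 51.21 marks; the step-deviation method only makes the hand arithmetic lighter.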
Weighted Arithmetic Average:
In calculating the weighted arithmetic average, each value of the variable is multiplied by
its weight and the products so obtained are added up. This total is divided by the sum of the
weights, and the resulting figure is the weighted arithmetic average.
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w x}{\sum w}$
Where $\bar{x}_w$ = Weighted arithmetic average.
x = Values of the variable.
w = Their respective weights.
Example:
An examination was held to decide the award of a scholarship. The weights of various
subjects are different. The marks obtained by 3 candidates (out of 100 marks) are given
below. Who will get the scholarship?
Subject Weight Marks A Marks B Marks C
Economics 4 87 85 90
Mathematics 3 90 88 80
Statistics 2 90 85 88
English 1 93 93 92
Solution:

Subject       Weight   Marks              Weighted Marks
                       A     B     C      A     B     C
Economics     4        87    85    90     348   340   360
Mathematics   3        90    88    80     270   264   240
Statistics    2        90    85    88     180   170   176
English       1        93    93    92      93    93    92
Total         10       360   351   350    891   867   868
Weighted arithmetic averages of marks:
For A: $\bar{x}_w = \frac{891}{10} = 89.1$
For B: $\bar{x}_w = \frac{867}{10} = 86.7$
For C: $\bar{x}_w = \frac{868}{10} = 86.8$
Candidate A should get the scholarship.
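The scholarship example can be reproduced with a short Python sketch of the weighted average. The helper name `weighted_mean` and the dictionary layout are assumptions of this illustration.

```python
# Subject weights and marks of the three candidates from the example.
weights = {"Economics": 4, "Mathematics": 3, "Statistics": 2, "English": 1}
marks = {
    "A": {"Economics": 87, "Mathematics": 90, "Statistics": 90, "English": 93},
    "B": {"Economics": 85, "Mathematics": 88, "Statistics": 85, "English": 93},
    "C": {"Economics": 90, "Mathematics": 80, "Statistics": 88, "English": 92},
}


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total = sum(weights[subject] * values[subject] for subject in weights)
    return total / sum(weights.values())


averages = {name: weighted_mean(m, weights) for name, m in marks.items()}
winner = max(averages, key=averages.get)
print(averages, winner)
```

Note that candidate C has the lowest unweighted total (350) but overtakes B once the heavier weight on Economics is taken into account.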
3.2(a) Median:
The second most popular measure of central location is the median. The median is found
by placing all the observations in order (ascending or descending); the observation that falls
in the middle of the series is the median. The median therefore divides the series into two
equal parts. The median of a sample is denoted by $\tilde{x}$ and the population median is
denoted by $\tilde{\mu}$.
Calculation of Median:
The calculation of the median ($\tilde{x}$) of a sample involves locating the middle item and
then finding its value:
1. Location of the middle item.

2. Finding its value in the case of a discrete series: the $\left(\frac{n+1}{2}\right)$th item
is the middle item when n is odd. If n is even, the median is the average of the two
middle-most items, i.e. the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th items.
3. Finding its value in the case of a continuous series: the $\left(\frac{n}{2}\right)$th item
is the middle item.
n = Total number of observations.

Example:
Compute the median of the following items: 5, 7, 9, 12, 10, 8, 7, 15, 21
Solution:
Series in ascending order: 5, 7, 7, 8, 9, 10, 12, 15, 21
n = 9
Median ($\tilde{x}$) = size of the $\left(\frac{n+1}{2}\right)$th item
= size of the $\left(\frac{9+1}{2}\right)$th item
= size of the 5th item, i.e. 9.
Example:
Find the value of the median from the following data.
Marks   No. of Students
90      15
85      20
83      28
82      20
95       9
Solution:
Arranging the marks in ascending order:
Marks   Frequency (f)   Cumulative frequency (C)
82      20              20
83      28              48
85      20              68
90      15              83
95       9              92
Median ($\tilde{x}$) = size of the $\left(\frac{N+1}{2}\right)$th item
= size of the $\left(\frac{92+1}{2}\right)$th item = 46.5th item, i.e. 83.

The cumulative frequency C must be just greater than the position of the middle item. 48 is
the first cumulative frequency greater than 46.5, and the value of X (marks) corresponding to
48 is 83. Thus the median number of marks is 83.
Continuous Series:
$\tilde{x} = L_1 + \frac{\frac{n}{2} - c}{f} \times h$. This is an interpolation formula used
after identifying the median class.
$\tilde{x}$ = Value of the median.
$L_1$ = Lower limit of the median class.
h = Class interval.
f = Frequency of the median class.
c = the cumulative frequency of the class preceding the median class.
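The interpolation formula for a continuous series, median = L1 + ((n/2 - c) / f) × h, can be sketched in Python as below. The source does not work a numerical case here, so applying the formula to the marks distribution of the earlier mean example (classes 10-20 to 90-100) is an illustration added for this sketch, and the function name `grouped_median` is an assumption.

```python
def grouped_median(lower_limits, freq, h):
    """Median of a continuous series by interpolation within the median class."""
    n = sum(freq)
    cumulative = 0                       # c: cumulative frequency before the class
    for lower, f in zip(lower_limits, freq):
        if cumulative + f >= n / 2:      # first class reaching the (n/2)th item
            return lower + (n / 2 - cumulative) / f * h
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")


# Marks distribution from the earlier mean example (class width 10).
lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freq = [7, 15, 18, 25, 30, 20, 16, 7, 2]

median = grouped_median(lower_limits, freq, h=10)
print(round(median, 2))
```

With n = 140 the 70th item falls in the class 50-60, whose preceding cumulative frequency is 65 and whose own frequency is 30, so the median is 50 + (5/30) × 10, about 51.67 marks.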
Further Classification
- Trimmed Mean:
The mean is very sensitive to outlying values in the data set, whereas the median depends
only on the middle value(s) and is insensitive to them. Since either extreme may be
undesirable, we may need an alternative measure that is neither as sensitive as the mean nor
as insensitive as the median. The mean and the median are opposite extremes of the same
family of measures. The mean is the average of all the data, whereas the median is the middle
value or the average of the two middle values. In other words, the mean is calculated by
trimming 0 per cent from each end of the sample, whereas for the median the maximum possible
amount is trimmed from each end. The trimmed mean is a compromise between the mean and the
median.
The trimmed mean has some of the advantages of both the mean and the median without some of
their disadvantages. The 10% trimmed mean is computed by excluding the largest 10% and the
smallest 10% of the values in the sample and taking the arithmetic mean of the remaining
80% of the sample. For example, consider the data:
5, 4, 7, 6, 9, 10, 11, 0, 7, 16.

Arranging in ascending order:
0, 4, 5, 6, 7, 7, 9, 10, 11, 16.
The 10% trimmed mean omits 0 and 16 and yields
$\bar{x}_{tr(10)} = \frac{4 + 5 + 6 + 7 + 7 + 9 + 10 + 11}{8} = \frac{59}{8} = 7.375$
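A trimmed mean is easy to compute once the data are ordered. The following Python sketch, using the ordered data of the example, is a minimal illustration; the function name `trimmed_mean` is an assumption, and the number of values dropped from each end is obtained by truncating proportion × n to a whole number.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))      # number trimmed from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Ordered data from the example above.
data = [0, 4, 5, 6, 7, 7, 9, 10, 11, 16]

print(trimmed_mean(data, 0.10))   # drops 0 and 16
print(trimmed_mean(data, 0.0))    # no trimming: the ordinary mean
```

Trimming 0 per cent returns the ordinary mean (7.5 here), while the 10% trimmed mean (7.375) is less affected by the extreme values 0 and 16.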
- Quartiles: The values which divide the data into four equal parts are known as
quartiles. There are three such points, Q1, Q2 and Q3, with Q3 ≥ Q2 ≥ Q1, termed the
three quartiles. Q1, the first quartile, is the value which has 25% of the items
of the distribution below it and 75% of the items above it. The second
quartile Q2 coincides with the median and has an equal number of observations above
and below it. Q3, the third quartile, has 75% of the observations below it and 25% of
the observations above it; it divides the upper half of the series into two equal parts.
Q1 = value of the $\left(\frac{n+1}{4}\right)$th item
Q3 = value of the $3\left(\frac{n+1}{4}\right)$th item
- Deciles: These are the values which divide a series into 10 equal parts. There are 9
deciles, denoted by D1, D2, D3, ..., D9. The fifth decile is the median of the
series.
D1 = value of the $\left(\frac{n+1}{10}\right)$th item
D2 = value of the $2\left(\frac{n+1}{10}\right)$th item
- Percentiles: These values divide the series into 100 equal parts; there are 99
percentiles.
P1 = value of the $\left(\frac{n+1}{100}\right)$th item
P2 = value of the $2\left(\frac{n+1}{100}\right)$th item
P99 = value of the $99\left(\frac{n+1}{100}\right)$th item
3.2(b) Mode:
The mode is the most common item of the series. It is defined as the observation
(or observations) occurring with the greatest frequency. The mode is denoted by 'Mo'.
For example, in the following series the value of the mode is 25:
Value:      10  12  15  25  30  50
Frequency:   5  15  20  40  10   8
Some series are bimodal, for example:
Marks:            30  35  40  45  50  60
No. of students:   5  15  25  25  14   6
In this case there are two modes: 40 and 45.
However, the maximum frequency does not necessarily signify the maximum
frequency density. For example:
Value:      10  12  14  16  18  20  22
Frequency:  50  80  70  78  77  30  15
In this series the value 12 has the highest frequency, 80, but it is not the mode of the
series, because the maximum frequency density is around the value 16. The values on
either side of 16 have fairly large frequencies: the values 14, 16 and 18 account for a
frequency of 70 + 78 + 77 = 225 out of a total of 400. So the concentration of frequencies
is around the value 16 rather than 12.
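For continuous variables
$\text{Mode} = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (L_2 - L_1)$
f1 = Stands for the frequency of the modal class.
f0 = Stands for the frequency of the preceding class.
f2 = Stands for the frequency of the succeeding class.
L1 = lower limit of modal class.
L2 = upper limit of modal class.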
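The usual interpolation formula for the mode of a continuous series is Mode = L1 + (f1 - f0) / (2 f1 - f0 - f2) × (L2 - L1), where f1 is the frequency of the modal class. The Python sketch below applies it to the marks distribution of the earlier mean example; this numerical case is an illustration added here rather than one worked in the text, and the function name `grouped_mode` is an assumption.

```python
def grouped_mode(lower_limits, freq, h):
    """Mode of a continuous series by interpolation within the modal class."""
    i = max(range(len(freq)), key=freq.__getitem__)    # index of the modal class
    f1 = freq[i]
    f0 = freq[i - 1] if i > 0 else 0                   # preceding class
    f2 = freq[i + 1] if i < len(freq) - 1 else 0       # succeeding class
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h


# Marks distribution from the earlier mean example (class width 10).
lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freq = [7, 15, 18, 25, 30, 20, 16, 7, 2]

mode = grouped_mode(lower_limits, freq, h=10)
print(round(mode, 2))
```

The modal class is 50-60 (frequency 30, with 25 before and 20 after), so the mode is 50 + (5/15) × 10, about 53.33 marks.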
- In a symmetrical distribution the mean, median and mode are identical. If a
distribution is skewed, then:
For a positively skewed distribution: mean > median > mode
For a negatively skewed distribution: mean < median < mode
Mo lies at the peak of the curve, indicating the maximum frequency density.

The median divides the area under the curve into two equal parts, and the mean is the
centre of gravity of the distribution.
4. Measures of Variability:
A measure of average gives only partial information about a distribution. Data sets
may have the same average value but differ in other respects. We therefore also need
a measure of the spread of the distribution. A measure of the variability of the data
set gives us another characteristic of the distribution.

4.1 Range:
The range is the simplest possible measure of dispersion. It is the difference between the
values of the extreme items of a series. Symbolically,
Range (R) = L - S, where L is the largest item and S is the smallest item.
Sometimes, for purposes of comparison, a relative measure of range is calculated. If the range
is divided by the sum of the extreme items, the resulting figure is called "the ratio of the
range" or the coefficient of range. Symbolically,
Coefficient of Range = $\frac{L - S}{L + S}$
The main drawback of the range is that it depends only on the two most extreme observations
and disregards the positions of the remaining n - 2 values.
4.2 Inter-Quartile Range:
It is a measure of dispersion based on the upper quartile Q3 and the lower quartile Q1.
Inter-Quartile Range = Q3 - Q1

Its limitation is that it is based only on the middle 50% of the distribution.
4.3 Variance and Standard Deviation:
The spread of a distribution can be inferred from the deviations of the individual values of
the data from their mean. Some of the deviations will be positive, if $x_i > \bar{x}$, and
others will be negative, for $x_i < \bar{x}$. However, we cannot use the average of the
deviations to measure the variability of the data, as $\sum (x_i - \bar{x}) = 0$, so that
$\frac{1}{n}\sum (x_i - \bar{x}) = 0$. The mean deviation avoids this by ignoring the
algebraic signs: Mean Deviation $= \frac{1}{n}\sum |x_i - \bar{x}|$. Ignoring the signs,
however, is mathematically unsatisfactory and makes the mean deviation awkward to work with.
The commonly used measures of dispersion are therefore the variance and the standard deviation.
Variance: The variance is based on the mean of the squared deviations about the mean of the
series. The sample variance, denoted by $s^2$, is given by
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{S_{xx}}{n-1}$
Note that the divisor is $n - 1$ in calculating the sample variance, while in calculating the
population variance, denoted by $\sigma^2$, the divisor is N.

What is the rationale behind this? Just as $\bar{x}$ will be used to make inferences about the
population mean µ, we should define the sample variance so that it can be used to make
inferences about $\sigma^2$. Note that $\sigma^2$ involves squared deviations about the
population mean µ. If we actually knew the value of µ, then we could define the sample
variance as the average squared deviation of the sample $x_i$'s about µ. However, the value of
µ is almost never known, so the sum of squared deviations about $\bar{x}$ must be used. But
the $x_i$'s tend to be closer to their own average $\bar{x}$ than to the population average µ,
so using n as the divisor leads to an underestimate of $\sigma^2$. To compensate for this, the
divisor $n - 1$ is used rather than n. Thus, if $\bar{x} \neq \mu$, then
$\sum (x_i - \mu)^2 > \sum (x_i - \bar{x})^2$,
so dividing $\sum (x_i - \bar{x})^2$ by n underestimates $\sigma^2$, and we divide by
$n - 1$ to compensate for this.
The sample standard deviation, denoted by s, is the square root of the variance:
$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Note that $s^2$ and s are both nonnegative. There is an alternative formula for $S_{xx}$ that avoids
calculating the deviations. The formula involves both $(\sum x_i)^2$, summing and then squaring,
and $\sum x_i^2$, squaring and then summing.
$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$
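The two routes to the sample variance can be compared numerically. The sketch below is only an illustration: the five data values are made up, and the variable names are not from the text. It computes the sum of squared deviations once from the deviations and once from the shortcut (sum of squares minus the square of the sum divided by n).

```python
from math import sqrt

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [4.0, 7.0, 9.0, 10.0, 15.0]
n = len(x)
mean = sum(x) / n

# Definitional form: sum of squared deviations about the sample mean.
sxx_definitional = sum((xi - mean) ** 2 for xi in x)

# Shortcut form: sum of squares minus (square of the sum) / n.
sxx_shortcut = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

# Sample variance uses the divisor n - 1; the standard deviation is its square root.
sample_variance = sxx_definitional / (n - 1)
sample_sd = sqrt(sample_variance)

print(sxx_definitional, sxx_shortcut, sample_variance, sample_sd)
```

Both forms should give the same sum of squares; the shortcut simply avoids computing each deviation.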
Properties of Variance:
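1. Variance is independent of change of origin but not of scale.
Let $y_i = x_i + c$, where c is a constant.
Then $\bar{y} = \bar{x} + c$, so that $y_i - \bar{y} = (x_i + c) - (\bar{x} + c) = x_i - \bar{x}$ and
$s_y^2 = s_x^2$; hence variance is independent of change of origin.
Let $y_i = c x_i$.
So $\bar{y} = c\bar{x}$ and $y_i - \bar{y} = c(x_i - \bar{x})$, which gives
$s_y^2 = c^2 s_x^2$
Variance is not independent of scale.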
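The property can be checked on a small data set. This is a minimal sketch with made-up values and a hypothetical helper `sample_variance`; it uses exact fractions so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def sample_variance(values):
    """Sample variance with divisor n - 1, computed exactly with fractions."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / (n - 1)

x = [2, 4, 4, 5, 10]          # hypothetical data
c = 3                         # hypothetical constant

shifted = [xi + c for xi in x]   # change of origin
scaled = [c * xi for xi in x]    # change of scale

var_x = sample_variance(x)
var_shifted = sample_variance(shifted)
var_scaled = sample_variance(scaled)

print(var_x, var_shifted, var_scaled)
```

Adding the constant leaves the variance unchanged, while multiplying by the constant multiplies the variance by its square.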
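2. Sum of squares of deviations is minimum when taken about the mean.
To show $\sum (x_i - A)^2$ is minimum if $A = \bar{x}$:
Let $Z = \sum (x_i - A)^2$.
Z will be minimum if $\frac{dZ}{dA} = 0$ and $\frac{d^2Z}{dA^2} > 0$.
$\frac{dZ}{dA} = -2\sum (x_i - A)$
Putting $\frac{dZ}{dA} = 0$ gives $\sum x_i - nA = 0$, or $A = \bar{x}$.
$\frac{d^2Z}{dA^2} = +2n > 0$
Hence $\sum (x_i - A)^2$ is minimum if $A = \bar{x}$.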
Standard deviation is the square root of the arithmetic average of the squares of the
deviations measured from the mean. Thus, in the calculation of standard deviation, first the
arithmetic average is calculated and the deviations of the various items from the arithmetic
average are squared. The squared deviations are totalled and the sum is divided by the
number of items. The square root of the resulting figure is the standard deviation of the series.
Symbolically: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum d^2}{n}}$
Where s stands for the standard deviation of the sample data, $\sum d^2$ for the sum of the squares
of the deviations measured from the arithmetic average and n for the number of items.
Note that this descriptive form divides by n, unlike the sample variance defined earlier, which divides by n - 1.
Variance and standard deviation of population are denoted by $\sigma^2$ and σ respectively.
Definitional formula:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Computational formula:
$\sigma^2 = \frac{\sum x_i^2}{N} - \mu^2$
Standard Deviation in discrete series:
Example: Calculate the standard deviation from the following data:

Size of item   Frequency
6              3
7              6
8              9
9              13
10             8
11             5
12             4
Solution:
Size of   Frequency   Size ×      Deviation from   Deviation   Frequency ×
item                  Frequency   the average      squared     squared deviation
(X)       (f)         (fx)        (d)              (d²)        (fd²)
6         3           18          -3               9           27
7         6           42          -2               4           24
8         9           72          -1               1           9
9         13          117         0                0           0
10        8           80          1                1           8
11        5           55          2                4           20
12        4           48          3                9           36
Total     n = 48      Σfx = 432                    Σd² = 28    Σfd² = 124
Arithmetic average = $\frac{\sum fx}{n} = \frac{432}{48} = 9$
Standard Deviation = $\sqrt{\frac{\sum f d^2}{n}} = \sqrt{\frac{124}{48}} \approx 1.6$ ans.
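The same calculation can be reproduced in a few lines. The sketch below uses the sizes and frequencies from the example; the variable names are illustrative only, and the divisor is the number of items n, as in the worked solution.

```python
from math import sqrt

sizes = [6, 7, 8, 9, 10, 11, 12]
freqs = [3, 6, 9, 13, 8, 5, 4]

n = sum(freqs)                                      # total number of items
total = sum(f * x for x, f in zip(sizes, freqs))    # sum of f * x
mean = total / n                                    # arithmetic average

# Sum of frequency times squared deviation from the average.
sum_fd2 = sum(f * (x - mean) ** 2 for x, f in zip(sizes, freqs))

sd = sqrt(sum_fd2 / n)
print(n, total, mean, sum_fd2, round(sd, 1))
```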
Standard deviation in continuous series:
In a continuous series the class intervals are usually of equal size, so the deviations from the
assumed average can be expressed in class interval units; in other words, step deviations
can be found by dividing the deviations by the magnitude of the class interval. The
formula for the calculation of standard deviation is then written as follows:
$s = \sqrt{\frac{\sum f d_x^2}{n} - \left(\frac{\sum f d_x}{n}\right)^2} \times i$
Where i stands for the common factor or the magnitude of the class interval, dx stands for
the deviations in class interval units, and the other symbols stand for what they stood for in the previous
formula.
The standard deviation of the first N natural numbers is
$s = \sqrt{\frac{N^2 - 1}{12}}$
Coefficient of variation:
Coefficient of variation C.V. = $\frac{s}{\bar{x}} \times 100$
The variance, and consequently the standard deviation, is independent of change of origin but
not of scale.
5. MOMENTS, SKEWNESS AND KURTOSIS
Moments:
“Moment is a familiar mechanical term for the measure of a force with reference to its
tendency to produce rotation. The strength of this tendency depends, obviously, on the
amount of the force and the distance from the origin of the point at which the force is
exerted”. In statistics, moments of a random variable about some point are used to describe
the various characteristics of a frequency distribution, viz., dispersion, skewness and
kurtosis. Central moments are defined as
$\mu_r = \frac{1}{n}\sum (x_i - \bar{x})^r$
Where $\mu_r$ is the rth central moment.
$\mu'_r = \frac{1}{n}\sum x_i^r$
Where $\mu'_r$ is the rth non-central moment about the origin.
Skewness:
Skewness is the opposite of symmetry, and its presence tells us that a particular
distribution is not symmetrical or, in other words, that it is skewed.
The following three figures would give an idea about the shape of symmetrical and
asymmetrical distribution.
Measures of Skewness:
If a particular distribution is found to be skewed, the next problem that arises is to measure
the extent of skewness. Measures of skewness are meant to give an idea about the extent of
asymmetry in a series. The simplest ones are called first measures of skewness. The first measures of
skewness are based on the assumption that in a skewed distribution the values of mean,
median and mode do not coincide. This being so, the difference between any two of these
values indicates the extent of skewness. Thus the first (absolute) measures of skewness are:
i. Mean – Mode or ($\bar{x}$ - Mo)
ii. Mean – Median or ($\bar{x}$ - Md)
iii. Median – Mode or (Md - Mo)
Where $\bar{x}$ = Mean
Md = Median
Mo = Mode
These measures of skewness suffer from some drawbacks. These are:
- These measures are expressed in the units of a distribution (like rupees, tons,
litres, etc.), and as such cannot be compared with measures expressed in another
distribution in different units. For example, absolute skewness of weights of
students (expressed in Kg) cannot be compared with the skewness in their heights
(expressed in centimetres).
- There is considerable variation in different distributions. In one distribution the
difference between mean and median may be very large as compared to similar
difference in another distribution and yet the two distributions, if plotted on
graph paper, may give similar curves.
Relative Measures of Skewness:
For the purpose of comparison it is necessary to have relative measures of skewness.

Relative measures of skewness are obtained by dividing the absolute measures by any one of
the measures of dispersion. The absolute measures of skewness should not be divided by a
measure of central tendency or average because our present problem is not to study the extent
of skewness in relation to the size of items, but it is to study the symmetry in relation to the
dispersal of items around a central value. The purpose of studying skewness is to find out
how much more or less do the items on one side deviate from the items on the other side of a
central value. Therefore, absolute measures of skewness should be divided by a measure of
dispersion rather than a measure of central tendency. Relative measures of skewness are
known as Coefficient of skewness.
Karl Pearson’s Coefficient of Skewness:
This formula is based on the difference between mean and mode. It uses standard deviation as
the divisor. It is expressed as follows:
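Jp = $\frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{\bar{x} - Mo}{s}$
The moments coefficient of skewness is:
$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$
$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}$, taking the sign of $\mu_3$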
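As an illustration, both coefficients can be computed for the frequency table used in the standard deviation example (sizes 6 to 12). This is a sketch under stated assumptions: the helper `central_moment` and the variable names are hypothetical, the moments use the divisor n, and the moment coefficient is taken as the third central moment divided by the second central moment raised to the power 3/2.

```python
from math import sqrt

sizes = [6, 7, 8, 9, 10, 11, 12]
freqs = [3, 6, 9, 13, 8, 5, 4]

n = sum(freqs)
mean = sum(f * x for x, f in zip(sizes, freqs)) / n

def central_moment(r):
    """r-th central moment of the frequency distribution (divisor n)."""
    return sum(f * (x - mean) ** r for x, f in zip(sizes, freqs)) / n

mu2 = central_moment(2)
mu3 = central_moment(3)

beta1 = mu3 ** 2 / mu2 ** 3      # squared form of the moment coefficient
gamma1 = mu3 / mu2 ** 1.5        # signed form of the moment coefficient

# Karl Pearson's coefficient: (mean - mode) / standard deviation.
mode = sizes[freqs.index(max(freqs))]
pearson = (mean - mode) / sqrt(mu2)

print(mu2, mu3, beta1, gamma1, pearson)
```

For this table the mean and the mode coincide, so Pearson's coefficient is zero, while the moment coefficient picks up a slight positive skew from the longer right-hand tail.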

Kurtosis:
Kurtosis is yet another measure which tells us about the form of a distribution. It tells us
whether the distribution, if plotted on a graph, would give us a normal curve, a curve flatter
than the normal curve or a curve more peaked than the normal curve. If a distribution is more
peaked than the normal distribution it is called “Leptokurtic”. If the distribution is more flat
than the normal distribution it is called “platykurtic”. The normal distribution is known as
“mesokurtic”.
The following diagram illustrates the above three types of curves:
In the figure given above curve No.1 is normal or mesokurtic, curve No.2 is more peaked
than the normal curve and is leptokurtic and curve No.3 is more flat than the normal curve
and is platykurtic.
6. References:
Jay L. Devore: Probability and Statistics for Engineering and the Sciences,
Cengage Learning, 8th edition.

Statistical Methods In Economics - I
DC-1
Sem-II
Chapter: Elementary Probability Theory

Content Developer: Neha Goel
College / University: Shyamlal College
(University of Delhi)
Institute of Lifelong Learning, University of Delhi

- 2: Introduction
- 3: Some Important Terminology
- 3.1: Sample Space
- 3.2: Events
- 3.3: Mutually Exclusive Events
- 3.4: Mutually Exhaustive Events
- 3.5: Equally Likely Events
- 3.6: Independent Events
- 4: Definitions of Probability
- 5: Examples of Probability
- 6: Properties of Probability
- 7: Summary
- 8: Exercises
- 9: References
- 10: MCQs

- Understand the concept of probability and random experimentation.
- Define the terminology needed to understand probability theory.
- Apply set theory to get a better understanding of the concept of probability.
- Understand the properties and theorems of probability.
- Provide reasoning for taking decisions under uncertainty.
2. Introduction
“Probability theory is nothing but common sense reduced to calculation”.
- Laplace, Pierre Simon
Galileo, in the 17th century, laid down some ideas on dice games. These gave rise to ideas
and discussions that became the stepping stone of probability theory. Probability is
the branch of mathematics which deals with random experimentation. Jerome Cardan, an
Italian mathematician, was the first to pen down the idea of probability. However, Pascal and
Fermat laid down the basic and formal foundation of the subject.
Deterministic experiments are those which, ceteris paribus (keeping all other conditions
constant), give the same result every time they are repeated. For example, anywhere in the
world, if we throw a ball on a floor under identical conditions, it will bounce back with the same force. In contrast, some
experiments are non-deterministic, i.e. they do not give the same result even under
constant experimental conditions. Examples are tossing an unbiased coin, picking the queen
of hearts from a pack of well-shuffled cards (having 52 cards of 4 equal suits, i.e. spades, clubs,
hearts and diamonds), studying the distribution of income among rural and urban Indians, or
Congress winning the 2014 general elections again. These experiments or events are
random in nature as their outcomes are unpredictable. Probability theory is developed to
study and deal with such random experiments.
3. Some Important Terminology
3.1 Sample Space:
Generally denoted by the letter ‘S’, a sample space consists of all possible outcomes of a
random/non-deterministic experiment. Suppose we need to find out on which day of the
week 29th February 2016 will fall. Then the sample space is:
S = {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}
Thus, the days of the week here become the elements or members of this set, where
each day is one possible outcome, i.e. 29th February can fall on any one of
these days. Thus, here ‘S’ is the sample space and its elements, i.e. the days of the week,
are referred to as sample points.
Now suppose we need to find the possibility of getting a ‘6’ in a roll of a die. Thus,
the sample space becomes S = {1, 2, 3, 4, 5, 6}, i.e. in a roll of a single fair die, there is
equal probability of getting a 1, or a 2, or a 3, or a 4, or a 5, or a 6. Now suppose we
roll two cubic dice together. The number of points in the sample space is now given by $N^n$, where n is
the number of trials (the number of times the experiment is repeated) and N is the
number of possible outcomes of a single trial. So, the sample space for a die rolled twice ($6^2 = 36$ points) is:
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
Thus, n(S) = 36
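The listing above can also be generated rather than written out by hand. The following is a minimal sketch using Python's standard `itertools.product`; the variable names are illustrative only.

```python
from itertools import product

faces = range(1, 7)   # the N = 6 outcomes of a single roll
trials = 2            # the die is rolled n = 2 times

# Every ordered pair (first roll, second roll).
sample_space = list(product(faces, repeat=trials))

print(len(sample_space))   # number of sample points
print(sample_space[:6])    # first row of the listing
```

The number of sample points equals the number of outcomes per trial raised to the number of trials.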
Way of representing a sample space through a Venn diagram.
Now, suppose we paint one die white (W) and another die black (B). If we close our
eyes and pick up a die randomly and want to study the occurrence of a white or a
black die, then the sample space becomes S = {W, B}. Now suppose we just want to
study the occurrence of a black die. So either the black die occurs or it doesn’t. Thus,
we can assign numbers to the outcomes, i.e. if black occurs, the experiment is successful and
we give it the number 1, and if it doesn’t occur, we give it the number 0. Thus, the
sample space becomes S = {0,1}. Thus, n(S) = 2
If we pick two dice in succession, each of which may turn out to be white or black, the sample space
becomes S = {(W,B), (W,W), (B,W), (B,B)}. Thus, n(S) = 4
Similarly, if we toss an unbiased coin, the sample space is S = {head, tail}. If we toss
two unbiased coins together, the sample space ($N^n = 2^2 = 4$) becomes
S = {(H,T), (H,H), (T,H), (T,T)}. Thus, n(S) = 4

3.2 Events
An event is a collection of outcomes from the sample space that
satisfy a given criterion in a random experiment.
For example, if we throw two dice, the sample space is:
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
Now suppose we want the event that the sum of the two numbers is a multiple of 3. This
event in a roll of two dice is:
EM3 = {(1,2), (2,1), (1,5), (2,4), (3,3), (4,2), (5,1), (3,6), (4,5), (5,4), (6,3), (6,6)}
Thus, n(EM3) = 12
Thus, we can say that an event ‘E’ is a subset of a sample space ‘S’, i.e. E ⊆ S.
If we want to find the event of the sum of 3 in a roll of two dice, then the event is:
E3 = {(1,2), (2,1)}. Thus, n(E3) = 2
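Since an event is a subset of the sample space, it can be built by filtering the sample points with the defining criterion. The sketch below does this for the event "sum of 3" in a roll of two dice; the names are illustrative only.

```python
from itertools import product

# Sample space for a roll of two dice, as a set of ordered pairs.
sample_space = set(product(range(1, 7), repeat=2))

# Event: the two numbers add up to 3.
sum_is_3 = {outcome for outcome in sample_space if sum(outcome) == 3}

print(sorted(sum_is_3))
print(len(sum_is_3))
print(sum_is_3 <= sample_space)   # subset check: the event lies inside S
```

Any other criterion, such as "both numbers are equal", can be substituted in the filter condition in the same way.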
Suppose now we toss three unbiased coins; the sample space is ($N^n = 2^3 = 8$):
S = {(HHH), (HTT), (HTH), (HHT), (TTT), (THH), (THT), (TTH)}, n(S) = 8
Now, if we want to find out the event of getting all tails,
ET = {(TTT)}, n(ET) = 1
If we want to find out the event of ‘No Tail’, then
ENT = {(HHH)}, n(ENT) = 1
CERTAIN EVENTS: If an event is sure to occur in every trial, we call
it a certain event. For example, the event of ‘a head or a tail’ is certain in every
trial of the toss of an unbiased coin. But the event of ‘head’ is not certain in every
trial of a toss of an unbiased coin. However, if the coin is two-headed, the event
of ‘head’ is a certain event in every trial of the toss of the coin.
IMPOSSIBLE EVENTS: However, the event of a ‘tail’ in the toss of a two-headed coin
is impossible. The sample space of a toss of a two-headed coin is S =
{H}. Thus the event of a tail is the null or void set, ET = { } = ∅. Thus, events corresponding to the
empty or null set are known as impossible events.
COMPLEMENT: We have already studied that an event ‘E’ is a subset of a sample
space ‘S’. Thus, we may divide the elements of the sample space into two groups,
i.e. elements which belong to E and elements which do not belong to E. The latter
group of elements is known as the complement of the event E. Now, let us again
take the example of the toss of three unbiased coins. The sample space is,
S = {(HHH), (HTT), (HTH), (HHT), (TTT), (THH), (THT), (TTH)}
If we want to find out the event of occurrence of at least two heads,
E2H = {(HHT), (THH), (HTH), (HHH)}
Thus, the complement of E2H is $\bar{E}_{2H}$ = {(HTT), (TTH), (THT), (TTT)}
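The complement is simply the set difference between the sample space and the event. A minimal sketch for the three-coin example, with illustrative names:

```python
from itertools import product

# Sample space for three coin tosses, written as strings such as "HTH".
sample_space = {"".join(toss) for toss in product("HT", repeat=3)}

# Event: at least two heads.
at_least_two_heads = {o for o in sample_space if o.count("H") >= 2}

# Complement: every sample point that is not in the event.
complement = sample_space - at_least_two_heads

print(sorted(at_least_two_heads))
print(sorted(complement))
```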
3.3 Mutually Exclusive Events
If two or more events cannot occur simultaneously or together, we call them
mutually exclusive events. For any two mutually exclusive events A and B,
if event A occurs, then event B cannot, and vice-versa. We add the
probabilities of mutually exclusive events to find the probability that one of them occurs,
i.e. if A and B have no elements in common, then P(A∪B), also written P(A or B), = P(A) +
P(B). The two events “I had my lunch today” and “I did not have my lunch
today” are two mutually exclusive events.
For example, if we throw a cubic die, the occurrences of 1 and 6 are mutually
exclusive, i.e. in a roll of a die, if 1 occurs, 6 cannot occur simultaneously and
vice-versa. Likewise, the events ‘head’ and ‘tail’ are mutually exclusive
in the toss of an unbiased coin. Now, suppose we roll a die and define two
events A and B such that,
Event A = all even numbers = {2, 4, 6}
Event B = all odd numbers = {1, 3, 5}
Thus, there is no common element between A and B (seen by the separated
circles). Thus, they are mutually exclusive events.
3.4 Mutually Exhaustive Events
If at least one among several events must necessarily occur, we call
them mutually exhaustive. For example, if Congress and BJP are the only two
parties contesting the 2014 general elections, then one of them must win the
election and thus they are mutually exhaustive. Similarly, if we roll a six-faced
die, the occurrences of 1, 2, 3, 4, 5 and 6 are mutually exhaustive. If Mr. A plays
chess with his friend and a draw is ruled out, Mr. A either wins or loses. Thus the events win and lose are
mutually exhaustive.
3.5 Equally Likely Events
In random experimentation, events are equally likely if none
of them can be expected to occur in preference to the others. For example, if an
unbiased coin is tossed, the occurrence of the event head or tail is equally likely.
3.6 Independent Events
Two events, A and B, are said to be independent if the occurrence of event A has
no effect on the occurrence of the event B. For example, “I had my lunch today”
and “My pen got stolen in the college today” are independent events. We multiply
the probabilities of independent events to find the probability that both
events occur together, i.e. P(A and B), also written P(A∩B), = P(A) * P(B).
Q- Do you think that if events are mutually exclusive then they are also independent, and
vice-versa?
Ans- No. Two mutually exclusive events have P(A∩B) = 0. If P(A) and P(B) are both positive,
then P(A) * P(B) > 0, so the condition for independence, P(A∩B) = P(A) * P(B), cannot hold.
Conversely, independent events with positive probabilities have P(A∩B) > 0 and so cannot be
mutually exclusive.
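The difference between the two ideas can be checked on a single roll of a fair die. This is only a sketch: the events and the helper `prob` are chosen for illustration and assume six equally likely faces.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def prob(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(event & S), len(S))

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_two = {1, 2}

# Mutually exclusive: no common outcome, so P(A and B) = 0.
exclusive = prob(even & odd) == 0
# ...but not independent, because P(A) * P(B) = 1/4 is not 0.
exclusive_pair_independent = prob(even & odd) == prob(even) * prob(odd)

# Independent: P(A and B) equals P(A) * P(B) (1/6 = 1/2 * 1/3).
independent = prob(even & at_most_two) == prob(even) * prob(at_most_two)

print(exclusive, exclusive_pair_independent, independent)
```

Here "even" and "odd" are mutually exclusive but dependent, while "even" and "at most two" share an outcome and are independent.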
NOTE: Sometimes it is unclear whether two events are related or not.
For example, if we look at stock prices, we know
that Reliance and Tata are big companies with large capital and thus
their stocks are highly valued. It may be expected that investing in such
companies requires a big investment which would bring a higher return, but stock
prices fluctuate on account of other factors too (like political uncertainty,
recession, war, natural calamity, etc.). Thus, stock prices may rise or fall
independently of the underlying value.
** NOTE: If two events A and B are dependent on each other (i.e. they are neither
independent nor mutually exclusive), then we use conditional probability, which we
will study in the next chapter.
4. Definitions of Probability
Probability of an Event: We have already studied about sample space and events. Now,
suppose we want to assign some numerical values to all the possible outcomes of a random
experiment to find its probability of occurrence in every trial.
There are four different ways in which the term ‘probability’ has been defined:
1) Classical definition
2) Axiomatic definition
3) Empirical definition
4) Subjective definition
4.1) Classical definition:
According to the classical definition of probability, if N is the number of outcomes (in an
experiment), which are mutually exclusive, mutually exhaustive and equally likely, and NE is
the number of outcomes favorable to an event E, then the probability of the event E is
defined as:
P (E) = NE/N
i.e. the probability of an event E is the ratio of the number of outcomes favorable to E
(n(E)) to the total number of outcomes (n(S)), i.e. number of favorable outcomes/total
number of outcomes.
For example, suppose three unbiased coins are tossed simultaneously and we have to find the
probability of the event ‘no head’. There are $2^3$ = 8 possible outcomes (n(S)). If E is the
event ‘no head’, then the outcomes favorable to E = {TTT} and thus n(E) = 1.
Thus, P (E) = n(E)/n(S) = 1/8
If n(E) = 0, i.e. there are no outcomes favorable to event E, then P (E) also becomes
zero and thus we can say that event E is impossible.
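The classical ratio translates directly into a small function. The sketch below is illustrative: `classical_probability` is a hypothetical helper, and it is valid only under the assumption that all listed outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """Favorable outcomes divided by total outcomes (equally likely outcomes assumed)."""
    return Fraction(len(event), len(sample_space))

# Three unbiased coins tossed together.
sample_space = list(product("HT", repeat=3))

# Event 'no head': every coin shows a tail.
no_head = [outcome for outcome in sample_space if "H" not in outcome]

p_no_head = classical_probability(no_head, sample_space)
p_impossible = classical_probability([], sample_space)   # no favorable outcome

print(p_no_head, p_impossible)
```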
Disadvantages of classical definition:
1) The classical definition can be applied only if the outcomes are mutually exclusive,
mutually exhaustive and equally likely. Otherwise this definition cannot be applied.
2) The phrase ‘equally likely’ in the definition means equally probable. So we may say that
the definition is circular in nature.
3) The definition fails if the number of possible outcomes of an experiment is infinite.
4.2) Axiomatic definition:
The axiomatic approach requires two conditions to be fulfilled to assign a probability value to an
event E:
1) A probability value (a real number) must be assigned to every event E
in the sample space (S). This probability value is written P(E), which
shows that probability is a function of the event E.
2) The assigned values of probability must satisfy the three axioms or postulates of
probability:
i. AXIOM 1: For any event A, its probability is a non-negative number, i.e. P(A) ≥
0. Together with the other axioms this gives 0 ≤ P(A) ≤ 1.
ii. AXIOM 2: The sample space S is a certain event, so the probability of the sample space is
P(S) = 1
iii. AXIOM 3: For any two mutually exclusive events A and B, i.e. if A and B have no
elements in common (A∩B = ∅), P(A∪B) or P(A or B) = P(A) + P(B)
4.3) Empirical definition:
Von Mises was the one to introduce this concept. This definition uses the basis of deductive
theory which is not widely accepted. According to this definition, if an event (E) is found to
occur m times out of N trials of a random experiment, its relative frequency is given by m/
N. As N increases indefinitely, this relative frequency approaches a limiting value P, and is
called the probability of the event E.
P (E) = $\lim_{N \to \infty} \frac{m}{N}$
This limit gets a meaning if we take this equation as an assumption to define P (E).
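The idea of a relative frequency settling down as the number of trials grows can be illustrated by simulation. This is a sketch only: it simulates a fair coin with Python's pseudo-random generator, the function name and seed are arbitrary choices, and a simulation illustrates the limit without proving it.

```python
import random

def relative_frequency(trials, seed=0):
    """Relative frequency m/N of heads in N simulated tosses of a fair coin."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

for trials in (10, 100, 1000, 10000, 100000):
    print(trials, relative_frequency(trials))
```

For small N the ratio can be far from 0.5; for large N it should stay close to 0.5.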
4.4) Subjective definition:
In the subjective definition of probability, the number P (E) expresses the degree of our
personal belief, based on our knowledge, that the event E will occur or that the statement E is
true. We use such probabilities in our day to day life and
conversations. We often make statements like “I am 100% sure that it will rain today” i.e. P
(raining today) = 1 (100%) or “there is 50% chance that I will catch the bus” i.e. P
(catching the bus) = ½ or 0.5 (50%).

5. Examples of probability
Example 1) Suppose we want to find the probability of getting the numbers 1 and 6, in any order, in
the throw of two dice.
- Solution: Let A be the event that a die shows the number 1,
Thus, probability of A = P (A) = 1/6
Let B be the event that a die shows the number 6,
Thus, probability of B = P (B) = 1/6
In the throw of the first die, if 1 turns out to be the uppermost number, it rules out the
possibility of 6 being the uppermost, i.e. they cannot occur together. Thus A and B are
mutually exclusive on a single die: either A occurs or B, but not both.
Thus, for the first die, P (A or B) = P(A) + P(B) = 1/6 + 1/6 = 2/6 = 1/3. Thus the probability that the
first die is favourable is P(F) = 1/3.
Now the second die must show one specific number, i.e. if number 1 occurred on
the first die, the second die has to be 6 (P(B) = 1/6) and vice-versa (P(A) = 1/6). Thus, the
probability of the second die being favourable is P(S) = 1/6.
The two dice are independent, as the outcome of the
first die has no effect on the outcome of the second die.
Thus, the probability of numbers 1 and 6 in any order = P(F) * P(S) = 1/3 * 1/6 = 1/18
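The result can be cross-checked by counting favourable pairs in the 36-point sample space directly. A minimal sketch with illustrative names:

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

# Favourable outcomes: one die shows 1 and the other shows 6, in either order.
favourable = [o for o in sample_space if sorted(o) == [1, 6]]

p = Fraction(len(favourable), len(sample_space))
print(favourable, p)
```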
Example 2) Find the probability of getting a double six in the throw of two dice.
- Solution: For the first die, P(of getting 6) = 1/6
For the second die, P(of getting 6) = 1/6
Also, the first and second dice are independent of each other, as the occurrence of 6
on the first die has no effect on the occurrence of 6 on the second die. Thus,
Probability of getting a double six in the throw of two dice = 1/6 * 1/6 = 1/36
Example 3) What is the probability that all three siblings born in a family will have different
birthdays?
- Solution: We need the three children to be born on three different days of a 365-day year.
The first child can be born on any one of the 365 days, the second child must be born on any
one of the remaining 364 days, and in the same way the third child has to be born on any of the
remaining 363 days.
P(birthday of 1st child) = 365/365
P(birthday of 2nd child differs from the 1st) = 364/365
P(birthday of 3rd child differs from the first two) = 363/365
The birthdays of the three children are independent of each other.
Thus, probability that all three siblings born in a family will have different birthdays =
(365*364*363)/(365*365*365) ≈ 0.9918
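The product of the three fractions can be evaluated exactly. The sketch below assumes a 365-day year with all days equally likely; `all_different_birthdays` is a hypothetical helper that also works for other family sizes.

```python
from fractions import Fraction

def all_different_birthdays(people, days=365):
    """Probability that the given number of people all have different birthdays."""
    p = Fraction(1)
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p *= Fraction(days - k, days)
    return p

p3 = all_different_birthdays(3)
print(p3, float(p3))
```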
Example 4) If 5 students are seated in a row at random, what is the probability that the 4th and
5th students will sit together?
- Solution: Number of ways in which 5 students can sit = 5! = 5.4.3.2.1 = 120 ways
Suppose the 4th and 5th students sit together. Treating the pair as a single unit, there are 4
units that can be arranged in 4! ways, and the two students can be arranged within the pair in
2! ways, giving 4! 2! = 48 ways.
Therefore, the required probability is 48/120 = 0.4
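With only 120 arrangements, the count can be confirmed by listing every seating order. A brute-force sketch, with the students labelled 1 to 5 for illustration:

```python
from fractions import Fraction
from itertools import permutations

students = [1, 2, 3, 4, 5]
arrangements = list(permutations(students))   # all 5! seating orders

# Students 4 and 5 sit together when their seat positions differ by one.
together = [a for a in arrangements if abs(a.index(4) - a.index(5)) == 1]

p = Fraction(len(together), len(arrangements))
print(len(arrangements), len(together), p)
```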
Example 5) From a pack of well-shuffled cards, two cards are drawn at random. Find the
probability that
i) Both cards are black
ii) One is a spade and the other is a heart
- Solution: Number of ways in which two cards can be drawn from a pack of 52 cards =
52C2 = 1326 ways
These outcomes are equally likely, mutually exclusive and mutually exhaustive.
i) Total number of black cards = 26
Number of cases favorable to both cards being black = 26C2 = 325
Thus, probability that both cards are black = 325/1326 = 25/102
ii) Number of ways in which two cards can be drawn from a pack of 52 cards = 52C2 = 1326
ways
There are 13 spades and 13 hearts.
Number of cases favorable to one card being a spade and one a heart = 13C1.13C1 = 169 ways
Thus, probability of drawing one spade and one heart = 169/1326 = 13/102
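The combination counts can be checked with `math.comb` from the Python standard library (available from Python 3.8). The variable names below are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(52, 2)   # ways to draw 2 cards from 52

# i) both cards black: choose 2 of the 26 black cards.
p_both_black = Fraction(comb(26, 2), total)

# ii) one spade and one heart: choose 1 of 13 spades and 1 of 13 hearts.
p_spade_and_heart = Fraction(comb(13, 1) * comb(13, 1), total)

print(total, p_both_black, p_spade_and_heart)
```

`Fraction` reduces each ratio to its lowest terms automatically.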
Example 6) A bag contains 3 black balls and 7 white balls. If one ball is drawn at random,
find the probability that it will be black?
- Solution: Total number of balls = 10, number of ways in which one ball can be drawn
from them at random is given by 10C1 = 10 ways
A black ball is drawn in 3C1 = 3 ways
Thus, P(black ball) = 3/10 = 0.3
Example 7) What is the probability of the event ‘no head’ in the toss of two unbiased coins?
- Solution: Total number of outcomes = 4 {HH, TT, HT, TH}
In the toss of two unbiased coins, P (at least one head) = 3/4
Thus, probability of no head: P (no head) = 1 – P (at least one head) = 1/4
Example 8) Two cards are drawn together from a pack of 52 well-shuffled cards. Find the
probability that one is a club and one is a diamond.
- Solution: n(S) = 52C2 = (52*51)/(2*1) = 1326
Let E = event of getting 1 club and 1 diamond.
n(E) = number of ways of choosing 1 club out of 13 and 1 diamond out of 13
= (13C1 x 13C1)
= (13 x 13)
= 169.
Thus, P(E) = 169/1326 = 13/102
Example 9) From a pack of 52 well shuffled cards, a card is drawn at random. Find the
probability that: i) the card drawn is a face card, ii) it is queen of spade or king of diamond.
- Solution: n(S) = 52
i) There are three face cards i.e. jack, queen and king of each suit. Thus there are 12 face
cards out of 52 {n(E)}.
Let E = event of getting a face card
Thus, P(E) = 12/52 = 3/13
ii) Let E = event of getting a queen of spade or a king of diamond.
Then, n(E) = 2.
Thus, P(E) = 2/52 = 1/26
Example 10) A bag contains 6 red and 8 green balls. One ball is drawn at random. What is
the probability that the ball drawn is red?

- Solution: Number of balls = (6 + 8) = 14.
Number of red balls = 6.
P(drawing a red ball) = 6/14 = 3/7
Example 11) A bag contains 4 white, 5 red and 6 blue balls. Three balls are drawn at
random from the bag. Find the probability that all of them are red.
- Solution: n(S) = number of ways of drawing 3 balls out of 15
= 15C3
= (15*14*13)/(3*2*1)
= 455.
Let E = event of getting all the 3 red balls.
n(E) = 5C3 = 5C2 = (5*4)/(2*1) = 10.
Thus, P(E) = 10/455 = 2/91
6. Properties of Probability
i) Suppose the probability of an event A is P(A) and that of its complement is P($\bar{A}$).
We know that P(S) = 1, thus the sum of the probabilities of an event and its complement is 1,
so we may say that the probability of the complement of event A is:
P($\bar{A}$) = 1 - P(A)
ii) The probability of a null or void or impossible event is zero, i.e.
P(∅) = 0
iii) If there are two events A and B, the probability of their union is the sum of their probabilities minus the probability of their
intersection, i.e.
P(A∪B) = P(A) + P(B) - P(A∩B)
iv) If an event B is a subset of event A, then the probability of event B will be less than or equal
to the probability of event A, i.e.
P(B) ≤ P(A)
v) If E1, E2, ………, En are mutually exclusive to each other, then
P(E1 ∪ E2 ∪ … ∪ En) = P(E1) + P(E2) + … + P(En)
vi) If the sample space S is finite and consists of the outcomes E1, E2, ……, En such that S = { E1, E2,
………, En } then the probability of S is:
P(S) = P(E1) + P(E2) + … + P(En) = 1
For example, probability of getting an odd number in the throw of a die is:
P(odd) = P(1) + P(3) + P(5)
= 1/6 + 1/6 + 1/6
= 3/6 = 0.5
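These properties can be verified on the throw of a single fair die. The sketch below is illustrative only: the events A, B and C and the helper `prob` are chosen for the demonstration, and equally likely faces are assumed.

```python
from fractions import Fraction

S = set(range(1, 7))   # one throw of a fair die

def prob(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(event), len(S))

A = {1, 3, 5}   # odd number
B = {4, 5, 6}   # number greater than 3
C = {1, 3}      # a subset of A

complement_rule = prob(S - A) == 1 - prob(A)
null_event_rule = prob(set()) == 0
addition_rule = prob(A | B) == prob(A) + prob(B) - prob(A & B)
subset_rule = prob(C) <= prob(A)
total_rule = sum(prob({face}) for face in S) == 1

print(complement_rule, null_event_rule, addition_rule, subset_rule, total_rule)
```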
7. Summary
- Probability of an event E = P (E) = number of favorable outcomes/total number of
outcomes.
- The set of all possible outcomes of a random experiment is known as the sample space.
- All the subsets of a sample space are called events. Suppose the sample space consists of n
outcomes, S = { E1, E2 , ………, En}; then the probability of S = P(E1) + P(E2) + … + P(En)
- P(S) = 1
- P($\bar{A}$) = 1 - P(A)
- P(∅) = 0
- P(A∪B) = P(A) + P(B) - P(A∩B)
- If A and B are independent, P(A and B), also written P(A∩B), = P(A) * P(B)
- If E1, E2, ………, En are mutually exclusive events, P(E1 ∪ E2 ∪ … ∪ En) = P(E1) + P(E2) + … + P(En)
- Three Axioms of probability are:
i. P(A) ≥ 0
ii. For the certain event, i.e. the sample space S, P(S) = 1
iii. For any two mutually exclusive events A and B, P(A∪B), also written P(A or B), = P(A) + P(B)
- If two or more events cannot occur simultaneously or together, we call them mutually
exclusive events.
- Two events, A and B, are said to be independent if the occurrence of event A has no effect
on the occurrence of the event B.
8. Exercise
1) Define probability in all possible ways.

2) What is the difference between mutually exclusive and independent events?
3) What is the probability of getting at most two heads in the toss of two unbiased
coins?
4) What is the probability of getting the queen of hearts from a pack of 52 cards?
5) There are 4 boxes numbered 1, 2, 3, and 4 in which 15 identical balls are distributed
at random. Find the probability that i) each box contains at least 2 balls and ii) no
box is empty. (Ans. i) 5/34, ii) 91/204)
6) Find the probability of a leap year having 53 Sundays or 53 Saturdays?
7) A bag contains 7 white and 2 blue balls and another bag contains 5 white and 4 black
balls. If two balls are drawn, one from each bag, find the probability that both balls
are black?
8) A box contains 6 defective articles and 10 non-defective ones. If one item is drawn
at random, find the probability that it is either non-defective or has a defect?
9) If the probability of student A failing in board exam is 0.2 and of student B failing is
0.3, then find the probability of either A failing, or B failing?
10) Two athletes X and Y take part in a race. If the chance of X winning the race is 1/16
and that of Y winning the same race is 1/8, what is the chance that neither X nor Y
wins the race?
11) The probability that at least one of the events A and B occurs is 0.6. If A and B occur
simultaneously with probability 0.2, then find P($\bar{A}$) + P($\bar{B}$).
12) In a classroom, there are 6 girls and 12 boys. If one student is selected at random,
find the probability that the student is a girl.
i) 2/3
ii) 3/2
iii) 1/3
iv) 1/2
13) How many 3-digit codes using the digits 0 through 9 are possible if repetitions are
allowed?
i) 504
ii) 729
iii) 30
iv) 1000
14) A single card is drawn from a pack of well shuffled 52 cards. What is the probability
of getting a king or a queen?
i) 3/52
ii) 1/13
iii) 7/52
iv) 2/13
15) If a die is rolled one time, find the probability of getting a number greater than 0.
i) 0

ii) 1
iii)2/6
iv) 5/6
9. References
2. PH Karmel, M Polasek, Applied Statistics for Economists, 4th edition.
3. http://classof1.com/solution-library/view/math/probability/Multiple-choice-Questions-on-theoretical-probability/621/probability/string/search
4. http://www.probabilitytheory.info/
10. Multiple Choice Questions (MCQs)
1. Tickets numbered 1 to 20 are mixed up and then a ticket is drawn at random. Find the
probability that the ticket drawn has a number which is a multiple of 3 or 5?
i) 1/3
ii) 3/5
iii) 8/9
iv) 9/20
Ans. iv) 9/20
Solution: Here, S = {1, 2, 3, 4, ...., 19, 20}.

Let A = event of getting a multiple of 3 or 5 = {3, 6, 9, 12, 15, 18, 5, 10, 20}
Thus, P(A) = n(A)/n(S) = 9/20
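The favourable tickets can also be picked out by a direct filter, which guards against counting 15 twice (it is a multiple of both 3 and 5). A minimal sketch with illustrative names:

```python
from fractions import Fraction

tickets = range(1, 21)   # tickets numbered 1 to 20

# A ticket is favourable if its number is a multiple of 3 or of 5.
favourable = [t for t in tickets if t % 3 == 0 or t % 5 == 0]

p = Fraction(len(favourable), len(tickets))
print(favourable, p)
```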
2. A bag contains 2 white, 3 black and 2 blue balls. Two balls are drawn at random. Find the
probability that none of the balls drawn is blue?
i) 10/21
ii) 2/21
iii) 2/7
iv) 5/21
Ans. i) 10/21
Solution: Total number of balls = (2 + 3 + 2) = 7.
n(S) = Number of ways of drawing 2 balls out of 7 (where S is the sample space)
= 7C2
= (7*6)/(2*1)
= 21.
Let E = Event of drawing 2 balls, none of which is blue.
n(E) = Number of ways of drawing 2 balls out of (2 + 3) balls
= 5C2
= (5*4)/(2*1)
= 10.
P(E) = n(E)/n(S) = 10/21
3. In a box, there are 8 red, 7 blue and 6 green balls. One ball is picked up randomly. What
is the probability that it is neither red nor green?
i) 1/3
ii) 3/4
iii) 7/19
iv) 8/21
Ans. i) 1/3
Solution: Total number of balls = (8 + 7 + 6) = 21. Thus, n(S) = 21.
Let E= event that the ball drawn is neither red nor green
= event that the ball drawn is blue.
n(E) = 7.
P(E) = n(E)/n(S) = 7/21 = 1/3
4. What is the probability of getting a sum of 7 from two throws of a die?
i) 1/9
ii) 1/36
iii) 1/6
iv) 4/6
Ans. iii) 1/6
Solution: In two throws of a die, n(S) = (6 x 6) = 36.
Let E = event of getting a sum of 7 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}.
Thus, P(E) = n(E)/n(S) = 6/36 = 1/6
5. Three unbiased coins are tossed. What is the probability of getting at least two heads?

i) 3/4
ii) 1/4
iii) 3/8
iv) 4/8
Ans. iv) 4/8
Solution: Here S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}
Let E = event of getting at least two heads.
Then E = {HHH, THH, HTH, HHT}.
Thus, P(E) = n(E)/n(S) = 4/8 = 1/2
6. In a simultaneous throw of two dice, what is the probability of getting two numbers whose
product is ‘not even’?
i) 1/4
ii) 3/4
iii) 3/8
iv) 5/16
Ans. i) 1/4
Solution: In a simultaneous throw of two dice, we have n(S) = (6 x 6) = 36.
Let E = event that the product is even; thus Ē = event that the product is not even.
Then, E = {(1, 2), (1, 4), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 2), (3, 4),
(3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 2), (5, 4), (5, 6), (6, 1),
(6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
n(E) = 27.
P(E) = n(E)/n(S) = 27/36
P(Ē) = 1 – P(E) = 1 – 27/36 = 9/36 = 1/4
7. In a society, there are 15 blind and 10 deaf people. Three residents are selected at
random. The probability that 1 deaf and 2 blind residents are selected is:
i) 21/46
ii) 25/117
iii) 1/50
iv) 3/25

Ans. i) 21/46
Solution: Let S be the sample space and E be the event of selecting 1 deaf and 2 blind residents.
Then, n(S) = number of ways of selecting 3 residents out of 25
= 25C3 = (25 × 24 × 23)/(3 × 2 × 1) = 2300.
n(E) = 10C1 × 15C2 = 10 × 105 = 1050.
Thus, P(E) = 1050/2300 = 21/46
8. In a lottery of 35 chits, there are 10 chits with prizes and 25 chits left blank. A lottery is
drawn at random. Find the probability of getting a prize?
i) 2/5
ii) 1/5
iii) 5/7
iv) 2/7
Ans. iv) 2/7
Solution: n(S) = 35
n(Prize) = 10
Thus, P(prize) = 10/35 = 2/7
9. Two cards are drawn together at random from a pack of 52 well-shuffled cards. Find the
probability that both cards are queens.
i) 1/15
ii) 1/663
iii) 1/221
iv) 1/26
Ans. iii) 1/221
Solution: Let S be the sample space.
Then, n(S) = 52C2 = (52 × 51)/(2 × 1) = 1326.
Let E = event of getting 2 queens out of 4.
n(E) = 4C2 = (4 × 3)/(2 × 1) = 6.
Thus, P(E) = n(E)/n(S) = 6/1326 = 1/221
10. Two dice are tossed. The probability that the total score is ‘ not a prime number’ is:
i) 5/12
ii) 7/12
iii) 1/6
iv) 2/36
Ans. ii) 7/12
Solution: Here, n(S) = (6 x 6) = 36.
Let E = event that the sum is a prime number; thus Ē = event that the sum is not a prime
number.
Then E = {(1, 1), (1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), (4, 1), (4, 3),
(5, 2), (5, 6), (6, 1), (6, 5)}
n(E) = 15.
P(E) = 15/36
Thus, P(Ē) = 1 – 15/36 = 21/36 = 7/12
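The dice questions above all rest on the classical rule P(E) = n(E)/n(S). A minimal Python sketch of that rule is given below; it lists the 36 ordered outcomes of two fair dice and counts the favourable ones. The helper names `probability` and `is_prime` are illustrative and not part of the lesson.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """P(event) = n(E) / n(S) for equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

p_sum_seven = probability(lambda o: o[0] + o[1] == 7)
p_product_odd = probability(lambda o: (o[0] * o[1]) % 2 == 1)
p_sum_not_prime = probability(lambda o: not is_prime(o[0] + o[1]))

print(p_sum_seven, p_product_odd, p_sum_not_prime)  # 1/6 1/4 7/12
```

Using `Fraction` keeps the results exact, so they can be compared directly with the answer options.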

DC-1
SEM-II
Lesson: Conditional Probability
Lesson Developer: Neha Goel
College: Shyamlal College
University of Delhi
 2: Introduction
 3: Theorems Of Probability/Counting Techniques
 3.1: Theorem Of Total Probability
 3.2: Theorem Of Compound probability
 4: Conditional Probability and Concept Of Independence
 5: Examples of Conditional Probability
 6: Multi-Stage Experiment and Bayes’ Theorem
 7: Summary
 8: Exercises
 9: References
 10: MCQs
- Understand the concept of conditional probability and its application

- Explain the independence of events
- Apply various counting techniques like addition, multiplication, permutation and
combination to probability
- Understand the Bayes’ theorem and its application
2. Introduction
CONDITIONAL PROBABILITY:
It is defined as the probability of occurrence of an event A, given that another event B has
already occurred. We may represent the conditional probability of event A as:
P(A/B) = P(B∩A) / P(B) , provided P(B) ≠ 0, or

P(B∩A) = P(B) * P(A/B)
Similarly, P(A∩B) = P(A) * P(B/A) or

P(B/A) = P(A∩B) / P(A), provided P(A) ≠ 0
For example, suppose a bag contains W white balls and B blue balls, and two balls are drawn at
random, one after the other, without replacement. The probability of getting a white ball in
the first draw is W/(W+B). If a white ball was obtained in the first draw, the conditional
probability of getting a white ball in the second draw is (W-1)/(W+B-1).
Similarly, if the ball obtained in the first draw was blue, the conditional probability of getting
a white ball in the second draw is W/(W+B-1).
Now, let A denote a white ball in the first draw and B denote a white ball in the second draw
(inside the fractions, W and B remain the numbers of white and blue balls). Then
P(A) = W/(W+B) and
P(A∩B) = [W/(W+B)] * [(W-1)/(W+B-1)]
Now, the conditional probability of B given A is
P(B/A) = P(A∩B) / P(A)
= [W/(W+B)] * [(W-1)/(W+B-1)] * [(W+B)/W]
= (W-1)/(W+B-1)
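The bag example can be checked by brute force. The sketch below assumes a hypothetical bag with W = 3 white and B = 2 blue balls (these counts are not from the lesson), lists every ordered pair of draws without replacement, and applies P(B/A) = P(A∩B) / P(A).

```python
from fractions import Fraction
from itertools import permutations

W, B = 3, 2  # hypothetical counts of white and blue balls
balls = ["white"] * W + ["blue"] * B

# Every ordered pair of distinct balls is an equally likely (first, second) draw.
draws = list(permutations(range(len(balls)), 2))

first_white = [d for d in draws if balls[d[0]] == "white"]
both_white = [d for d in first_white if balls[d[1]] == "white"]

p_a = Fraction(len(first_white), len(draws))       # P(A)
p_a_and_b = Fraction(len(both_white), len(draws))  # P(A∩B)
p_b_given_a = p_a_and_b / p_a                      # P(B/A) = P(A∩B) / P(A)

print(p_a, p_a_and_b, p_b_given_a)  # 3/5 3/10 1/2
```

The result 1/2 matches (W-1)/(W+B-1) = 2/4 for these counts.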
We have already studied that P(A∩B) = P(A/B) * P(B)

Similarly, P(Ac∩B) = P(Ac/B) * P(B), where Ac = complement of A
From the venn diagrams we can say,
3. Theorems Of Probability/ Counting Techniques
3.1 Theorem Of Total Probability/ Addition Theorem:
According to this theorem, for any two mutually exclusive events A and B, i.e. events with
P(A∩B) = 0, the probability that either A or B occurs is the sum of the probabilities of the
two events. That is, if A and B have no elements in common, then
P(AUB) or P(A or B) = P(A) + P(B)

This is also known as the addition rule.
For example, a class has 15 girls and 20 boys, i.e. 35 students in total. If one student is
selected at random, P(G) = 15/35 and P(B) = 20/35, so P(GUB) = P(G) + P(B) = 35/35 = 1.
If the two events are not mutually exclusive i.e. P(A∩B) ≠ 0, then the addition rule becomes:
P(AUB) = P(A) + P(B) - P(A∩B) or
P(A or B) = P(A) + P(B) - P(A and B) or
P(A + B) = P(A) + P(B) - P(A X B)
i) Theorem of complementary event:

Let A denote the occurrence of an event and Ā its complement; thus
P(A) = 1 – P(Ā), since we know that P(S) = P(A) + P(Ā) = 1
ii) Extension of Total Probability theorem:

If E1, E2, ………, En are mutually exclusive to each other, then
P(E1UE2U………UEn) = P(E1) + P(E2) + ……… + P(En)
iii) Theorem of total probability with mutually Non-Exclusive Events:
For three events A, B and C,

P(AUBUC) = P(A) + P(B) + P(C) – P(A∩B) – P(A∩C) – P(B∩C) + P(A∩B∩C)
3.2 Theorem Of Compound Probability/Multiplication Theorem
According to this theorem, to find the probability of two events A and B occurring
together, we multiply the probability of event A by the conditional
probability of event B given that event A has actually occurred, denoted by P(B/A).
P(B/A) is the ratio of the number of outcomes favorable to both A and B to the number of
outcomes favorable to event A,
i.e. P(B/A) = P(A∩B) / P(A) or
P(A∩B) = P(A) * P(B/A)
This is also known as the multiplication rule.
i) Extension of Compound Probability Theorem:

Suppose there are three events A, B and C, then
P(A∩B∩C) = P(A) * P(B/A) * P(C/A∩B)
For example, suppose P(AUB) = 6/8, P(A) = 6/16 and P(B) = 2/4. Find P(B/A) and P(A/B).
P(A∩B) = P(A) + P(B) – P(AUB) = 6/16 + 2/4 – 6/8 = 1/8
P(B/A) = P(A∩B) / P(A) = (1/8) / (6/16) = 1/3
P(A/B) = P(A∩B) / P(B) = (1/8) / (2/4) = 1/4
For example, following results were obtained for three subjects namely Mathematics (M),
Economics (E) and Hindi (H) in a class:
25% of the students passed in M
20% of the students passed in E
35% of the students passed in H
7% of the students passed in M and E
5% of the students passed in M and H
2% of the students passed in E and H
1% of the students passed in all the three subjects
Find the probability that a student got passing marks in at least one of the subjects:
P(M) = .25, P(E) = .20, P(H) = .35, P(M∩E) = .07, P(M∩H) = .05, P(E∩H) = .02, P(M∩E∩H)
= .01.
Therefore, P(MUEUH) = P(M) + P(E) + P(H) – P(M∩H) – P(M∩E) – P(E∩H) + P(M∩E∩H) =
.25 + .20 + .35 - .07 - .05 - .02 + .01 = .67
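The three-subject calculation is an instance of the addition rule for three events that are not mutually exclusive. As a rough check, the sketch below redoes the arithmetic with exact fractions; the helper `pct` and the dictionary layout are illustrative choices, not part of the lesson.

```python
from fractions import Fraction

def pct(x):
    """Convert a whole-number percentage into an exact fraction."""
    return Fraction(x, 100)

single = {"M": pct(25), "E": pct(20), "H": pct(35)}
pairs = {("M", "E"): pct(7), ("M", "H"): pct(5), ("E", "H"): pct(2)}
triple = pct(1)

# P(M U E U H) = sum of singles - sum of pairwise overlaps + triple overlap
p_at_least_one = sum(single.values()) - sum(pairs.values()) + triple

print(p_at_least_one, float(p_at_least_one))  # 67/100 0.67
```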
The Fundamental Principle of Counting
If a task consists of a sequence of n decisions, and each decision can be made in a fixed
number of ways, then the total number of ways of completing the task is given by:
C = c1 × c2 × c3 × . . . × cn
where c1 = the number of ways to make the 1st choice,
c2 = the number of ways to make the 2nd choice, etc.
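As an illustration of the counting principle, take a 3-digit code built from the digits 0 through 9 with repetition allowed, so that c1 = c2 = c3 = 10. The sketch below compares the product rule with an explicit listing of all codes; the variable names are illustrative only.

```python
from itertools import product
from math import prod

# Three positions, each filled by any of the ten digits (repetition allowed).
choices_per_position = [10, 10, 10]
total_by_rule = prod(choices_per_position)

# Cross-check by listing every code explicitly.
codes = list(product("0123456789", repeat=3))

print(total_by_rule, len(codes))  # 1000 1000
```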
Permutation Rule:
If we want to place/draw E elements (out of n elements) in some sequential order, without
replacement, we use the rule of permutation. Thus, a permutation of E elements out of n is
defined as a way of ordering E elements from a set of n elements, without
repetition (E ≤ n). Order matters in a permutation, objects are drawn without
replacement, and the number of permutations is denoted nPE.
Explanation: The first object can be placed in n ways. Since this object is not
replaced, the second object can be placed in (n – 1) ways, and the process continues till the
last (E-th) object is placed in (n – E + 1) ways. Hence
nPE = n × (n – 1) × (n – 2) × . . . × (n – E + 1) = n! / (n – E)!
For example, there are 7 digits numbered 1, 2, 3, …, 7. Suppose we have to make a number
plate of 4 digits, without replacement. Find the probability that the number formed is
an even number.
Solution: For the number to be even, the last digit should be even.
The remaining three digits can be any of the remaining 6 digits.
Total number of possible number plates = 7P4 = 840
Total number of even digits = 3 i.e. {2, 4, 6}
The last digit can be chosen in 3 ways. Since we don’t replace the digits, once the last
digit is fixed as an even digit, the other 3 digits can be arranged in 6P3 = 120 ways.
Thus, P(Even) = (3 * 6P3) / 7P4 = 360/840 = 3/7
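The number-plate result can be confirmed by listing every ordered selection of 4 distinct digits. This is only a sketch of a brute-force check; `math.perm` (Python 3.8 and later) supplies nPE for comparison.

```python
from fractions import Fraction
from itertools import permutations
from math import perm

digits = range(1, 8)  # the digits 1..7

# All ordered selections of 4 distinct digits: 7P4 of them.
plates = list(permutations(digits, 4))
even_plates = [p for p in plates if p[-1] % 2 == 0]

assert len(plates) == perm(7, 4)           # 840
assert len(even_plates) == 3 * perm(6, 3)  # 360

print(Fraction(len(even_plates), len(plates)))  # 3/7
```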
For example, suppose 3 boys and 5 girls sit together in a row in a class. Find the
probability that: i) all boys sit together, ii) students of the same gender sit at the two extreme ends.
Solution:
Treating students of the same gender as alike, the total number of distinct arrangements = 8! / (3! × 5!) = 56
i) Treat BBB as 1 object and arrange 6 objects, i.e. 5 girls and 1 BBB. Thus the number of arrangements = 6! / 5! = 6. Thus,
Probability that boys sit together = 6/56 = 3/28
ii) There are two cases: one in which boys sit at both extreme ends and another in which girls sit at both extreme ends.
If girls are at each end, the number of distinct arrangements of the 6 remaining places among the
remaining 3 girls and 3 boys = 6! / (3! × 3!) = 20
If boys are at each end, the number of distinct arrangements of the 6 remaining places among the
remaining 5 girls and 1 boy = 6! / (5! × 1!) = 6
Thus, the probability that the same gender sits at the extreme ends = (20 + 6)/56 = 26/56 = 13/28
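The seating probabilities can also be checked by treating all 8 students as distinct and listing every seating. This is a sketch under that assumption; the labels B1..B3 and G1..G5 are hypothetical. Both ways of counting give the same probabilities because every boy/girl pattern corresponds to the same number (3! × 5!) of seatings.

```python
from fractions import Fraction
from itertools import permutations

students = ["B1", "B2", "B3", "G1", "G2", "G3", "G4", "G5"]
rows = list(permutations(students))  # 8! = 40320 equally likely seatings

def boys_together(row):
    positions = [i for i, s in enumerate(row) if s.startswith("B")]
    return positions[-1] - positions[0] == 2  # three boys in adjacent seats

def same_gender_at_ends(row):
    return row[0][0] == row[-1][0]

p_together = Fraction(sum(map(boys_together, rows)), len(rows))
p_ends = Fraction(sum(map(same_gender_at_ends, rows)), len(rows))

print(p_together, p_ends)  # 3/28 13/28
```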
Combination Rule:
If we want to choose/draw E elements (out of n elements), without replacement and without
regard to order, we use the rule of combination. Thus, a combination of E elements out of n is
defined as a way of drawing/choosing E elements from a set of n elements, without repetition
(E ≤ n). Here order does not matter, and the number of combinations is denoted nCE.
Explanation: As in the permutation rule, E objects can be placed in order in nPE ways. Each
selection of E objects is counted E! times among these ordered arrangements, once for each
of its orderings. Hence
nCE = nPE / E! = n! / [E! × (n – E)!]
For example: Suppose you pick 13 cards out of a pack of 52 well-shuffled cards. Find
the probability that the hand has i) at least one king, ii) exactly 3 queens, iii) 6 clubs, 4 diamonds, 2
hearts and 1 spade.
Solution: Since the order does not matter, we may choose 13 cards out of 52
by the combination rule, and thus the size of the sample space is 52C13.
i) We know that there are four kings in a pack of 52 cards, so at least one king
means we can choose either 1 king and 12 other cards [(4C1) (48C12)], or 2 kings
and 11 other cards [(4C2) (48C11)], or 3 kings and 10 other cards [(4C3) (48C10)],
or all 4 kings and 9 other cards [(4C4) (48C9)].
Thus,
P(At least one king) = [(4C1)(48C12) + (4C2)(48C11) + (4C3)(48C10) + (4C4)(48C9)] / 52C13
= 1 – 48C13 / 52C13
ii) We know that there are four queens in a pack of 52 cards, thus 3 queens means
we can choose 3 queens out of 4 in 4C3 ways and the other 10 cards in 48C10 ways.
Thus,
P(3 queens) = (4C3 * 48C10) / 52C13
iii) We know there are 13 cards of each suit i.e. spade, diamond, heart and club. We
choose 6 clubs in 13C6 ways, 4 diamonds in 13C4 ways, 2 hearts in 13C2 ways and 1
spade in 13C1 ways. Since the suits are specified and order does not matter, no further
arrangement factor is needed. Thus,
P(6 clubs, 4 diamonds, 2 hearts and 1 spade) = (13C6 * 13C4 * 13C2 * 13C1) / 52C13
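The card-hand probabilities are ratios of combination counts, which can be evaluated with `math.comb` (Python 3.8 and later). The sketch below assumes a standard 52-card pack with four kings and four queens; the variable names are illustrative. The "at least one king" probability is computed both as a sum over 1 to 4 kings and through the complement "no king", and the two are asserted to agree.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 13)  # number of equally likely 13-card hands

# i) at least one king: sum over k = 1..4 kings, or via the complement
p_king_sum = Fraction(sum(comb(4, k) * comb(48, 13 - k) for k in range(1, 5)), hands)
p_king_complement = 1 - Fraction(comb(48, 13), hands)
assert p_king_sum == p_king_complement

# ii) exactly 3 queens
p_three_queens = Fraction(comb(4, 3) * comb(48, 10), hands)

# iii) 6 clubs, 4 diamonds, 2 hearts, 1 spade
p_suits = Fraction(comb(13, 6) * comb(13, 4) * comb(13, 2) * comb(13, 1), hands)

print(float(p_king_sum), float(p_three_queens), float(p_suits))
```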

4. Conditional Probability and Concept Of Independence
From the compound probability theorem we know that,
P(B/A) = P(A∩B) / P(A) or

P(A∩B) = P(A) * P(B/A)
Example: If you draw a card from a pack of 52 well-shuffled cards, find the probability
that it is the king of hearts, given that the card drawn is red.
Solution: Since the card drawn is red, it could be of hearts or diamonds.

Let A = event that the card is red. Thus, P(A) = 26/52
Let B = event that the card is the king of hearts.
Thus P(A∩B) = 1/52
P(B/A) = P(A∩B) / P(A) = (1/52) / (26/52) = 1/26
Independent Events
We have already learnt about independent events in the last chapter: events A and B are
independent if the occurrence of event A does not affect the occurrence of event B. This implies that,
P(B/A) = P(B/Ac) = P(B)
From the compound probability theorem we get,
P(A∩B) = P(A) * P(B/A)
= P(A) * P(B)
Similarly, for three independent events A, B, and C,
P(A∩B∩C) = P(A) * P(B) * P(C) and so on.
5. Examples
Example 1: One ticket is drawn at random from 100 tickets numbered 00, 01, 02, …, 99.
Let Ei be the event that the sum of the two digits on the ticket is i, where 0 ≤ i ≤ 18.
Let Fj be the event that the product of the two digits is j,
where 0 ≤ j ≤ 81. For each possible i, find P(Ei/F0).
Solution: F0 occurs when at least one digit is 0, i.e. for the tickets 00, 01, …, 09 and 10, 20, …, 90. Thus P(F0) = 19/100
(E0∩F0) = event that the ticket numbered 00 is drawn. Thus, P(E0∩F0) = 1/100
For 1 ≤ i ≤ 9, (Ei∩F0) = event that ticket 0i or i0 is drawn, so P(Ei∩F0) = 2/100. For 10 ≤ i ≤ 18, (Ei∩F0) is empty.
Thus, P(Ei/F0) = P(Ei∩F0) / P(F0) = 1/19 for i = 0, 2/19 for 1 ≤ i ≤ 9, and 0 for 10 ≤ i ≤ 18
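Because there are only 100 tickets, the conditional probabilities P(Ei/F0) can be found by direct enumeration. The sketch below treats each ticket as a pair (tens digit, units digit), an assumed representation, and restricts attention to the tickets whose digit product is 0.

```python
from fractions import Fraction

tickets = [(tens, units) for tens in range(10) for units in range(10)]  # 00..99

f0 = [t for t in tickets if t[0] * t[1] == 0]  # product of digits is 0

# Conditional distribution of the digit sum i, given F0
cond = {
    i: Fraction(sum(1 for t in f0 if t[0] + t[1] == i), len(f0))
    for i in range(19)
}

print(len(f0))                     # 19
print(cond[0], cond[5], cond[12])  # 1/19 2/19 0
```

The conditional probabilities sum to 1 over i = 0, …, 18, as they must.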
Example 2: Suppose a card is drawn from a pack of 52 well-shuffled cards. Find the
probability that the card is a black Ace given it is a spade.
Solution: Number of black aces in a deck = 2, P(BA) = 2/52

Number of spades = 13, P(S) = 13/52
P(BA∩S) = 1/52
P(BA/S) = P(BA∩S) / P(S) = 1/13
Example 3: Suppose an Apple i-phone wholesaler has a showroom in which 20% of the phones
are duplicates. A retailer buys an i-phone from him. The retailer has a 10% probability
of buying a phone when it is a duplicate. Find the conditional probability that the phone
the retailer buys is original.
Solution: Let B = event that the retailer buys the phone
O = event that the phone is original
Given, P(O) = 0.8 and P(B/Oc) = 0.1
Now assuming P(B/O) = 1, we apply Bayes’ rule here
Thus, P(O/B) = P(B/O) P(O) / [P(B/O) P(O) + P(B/Oc) P(Oc)] = 0.8 / (0.8 + 0.02) = 0.80 / 0.82 = 40/41
Example 4: A manufacturer makes light bulbs and found that 5% of the bulbs have a
common defect. Researchers studied that 93% of these defective bulbs show a certain
behavioral characteristic, while this characteristic was exhibited in 2% of the non-defective
bulbs. A bulb was examined which showed a characteristic symptom. Given this behavioral
symptom, find the conditional probability that the bulb has a defect.
Solution: Let A = event that the bulb is defective
B = event that the bulb has a characteristic symptom
Given, P(A) = 0.05, P(B/A) = 0.93, and P(B/Ac) = 0.02
Thus, P(A/B) = P(B/A) P(A) / [P(B/A) P(A) + P(B/Ac) P(Ac)]
= (0.93 * 0.05) / [(0.93 * 0.05) + (0.02 * 0.95)]
= 0.0465 / 0.0655
= 93 / 131
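The bulb calculation uses the two-event form of Bayes’ rule, with an event and its complement as the only two hypotheses. A minimal sketch with exact fractions follows; the function name `posterior` and its parameter names are illustrative, not taken from the lesson.

```python
from fractions import Fraction

def posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A/B) by Bayes' rule for an event A and its complement."""
    numerator = p_b_given_a * prior
    evidence = numerator + p_b_given_not_a * (1 - prior)  # P(B)
    return numerator / evidence

p_defect_given_symptom = posterior(
    prior=Fraction(5, 100),            # P(A): bulb is defective
    p_b_given_a=Fraction(93, 100),     # P(B/A)
    p_b_given_not_a=Fraction(2, 100),  # P(B/Ac)
)
print(p_defect_given_symptom)  # 93/131
```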
Example 5: Suppose that, according to a survey, a woman in the USA lives to the age of 70
with a probability of 0.70 and to the age of 80 with a probability of 0.55. Suppose a woman in
the USA is 70 years old; what is the probability that she will survive till 80 years? (Note: if
A ⊂ B, then P(A∩B) = P(A).)
Solution: Let A = event that the woman lives till seventy years
B = event that the woman lives till eighty years
If the woman lives for eighty years, she would have already lived for 70 years; thus B ⊂ A.
Thus, P(B/A) = P(A∩B) / P(A) = P(B) / P(A) = .55 / .70 = 11/14
Example 6: Suppose a wholesaler receives a shipment of 1000 light bulbs. It is
equally likely that there are 0, 1, 2, or 3 defective units in the lot. One light bulb is
selected at random from the lot and is found to be non-defective. Find the probability
that the lot contains no defective unit.
Solution: Let G = event that the selected light bulb is non-defective / good
Dk = event that there are k defective light bulbs in the lot
P(D0) = P(D1) = P(D2) = P(D3) = 1/4
P(G|D0) = 1000/1000 = 1
P(G|D1) = 999/1000
P(G|D2) = 998/1000
P(G|D3) = 997/1000
P(D0|G) = P(G|D0) P(D0) / [P(G|D0) P(D0) + P(G|D1) P(D1) + P(G|D2) P(D2) + P(G|D3) P(D3)]
= (1 * 1/4) / [(1/4) (1 + 999/1000 + 998/1000 + 997/1000)] = 1000 / 3994
Example 7: In a survey, 85% of students say that they obey the rules in the school. Previous
experience shows that 20% of the students who do not obey the rules say that they obey, out of
fear of their parents. If a student is picked at random, find the probability that he does obey the
rules in the school. (Assume that all who obey the rules say that they obey.)
Solution: P(say) = 0.85, P(say/don’t obey) = 0.20, and we assume P(say/obey) = 1.
P(say) = P(say/obey) P(obey) + P(say/don’t obey) [1 – P(obey)]
Thus, P(obey) = [P(say) - P(say/don’t obey)] / [1 - P(say/don’t obey)]
= [0.85 – 0.20] / [1 – 0.20]
= 13 / 16
6. Multi-Stage Experiment and Bayes’ Theorem
An experiment need not consist of a single stage; if it can be broken down into stages, it
is a multi-stage experiment.
The following cases may arise when we apply conditional probabilities to events A and B
in a multi-stage experiment:
- I) Event A and event B may remain in the same stage and not enter another stage of
the experiment. Here we find the conditional probability of A given B using:
P(A/B)=P(A∩B) / P(B)
- II) Event A is still in the first stage that has already occurred whereas event B is in
the next stage that is yet to occur i.e. experiment is not yet complete.
- III) Event A is in the previous stage whereas event B will occur in a later stage and
the experiment is still incomplete
We use Bayes’ theorem (also called Bayes’ rule) in cases II and III.
Bayes’ Theorem
Suppose the events B1, B2, B3, ….., Bn are mutually exclusive and together
cover the entire sample space. Now, consider an event A which can occur if and only
if one of the events B1, B2, B3, ….., Bn occurs. Suppose the unconditional
probabilities P(B1), P(B2), P(B3), ….., P(Bn) and the conditional probabilities
P(A/B1), P(A/B2), ….., P(A/Bn) are known. Then
P(A) = Σ P(A∩Bi) = Σ P(Bi) . P(A/Bi), where the sums run over i = 1, 2, ….., n
Now taking event A to be given, we can find out the conditional probabilities of events B1,
B2, B3, ….., Bn. Thus, given that A has actually occurred, the conditional probability P(Bi/A)
can be calculated as:
P(Bi/A) = P(Bi∩A) / P(A) = P(A/Bi) . P(Bi) / Σ P(Bj) . P(A/Bj), where the sum runs over j = 1, 2, ….., n
This is known as Bayes’ Theorem.
For example,
First urn contains 2 white and 4 blue marbles and second urn contains 2 white and 2 blue
marbles. A marble is transferred from urn 2 to urn 1 and then a marble is picked from urn 1
randomly. If it turns out to be a blue marble, what is the probability that the transferred
marble was white?
Solution: Let B1 = transferred marble was white, B2 = transferred marble was blue
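Let A = marble drawn from urn 1 is blue
P(B1) = 1/2, P(B2) = 1/2,
P(A/B1) = 4/7 (urn 1 then holds 3 white and 4 blue), P(A/B2) = 5/7 (urn 1 then holds 2 white and 5 blue)
P(B1/A) = P(A/B1) P(B1) / [P(A/B1) P(B1) + P(A/B2) P(B2)]
= (1/2 * 4/7) / (1/2 * 4/7 + 1/2 * 5/7) = 4/9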
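For an event A and its complement, Bayes’ theorem can also be written as
P(A/B) = P(B/A) P(A) / [P(B/A) P(A) + P(B/Ac) P(Ac)], where Ac = complement of A.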

For example: An entrepreneur expects his profits to rise over the next four quarters
with probability 0.4. As a result, a plan to increase the plant size over the next 12
months was prepared. An analysis of past years’ data was then carried out. In 8 out of ten
cases in which profit occurred, the analysis had predicted profit; in 2 out of ten cases in
which a loss occurred, the analysis had also predicted profit. The analysis now predicts
profit. Based on this information, how should the entrepreneur revise the
probability that profit will occur?
Solution: The probabilities as per the entrepreneur’s expectation:

P(Profit) = 0.4 (profit will occur)
P(No profit) = 0.6
Let A be the event that the analysis of past years’ data predicts profit. The conditional
probabilities of this analysis are given by,
P(A/Profit) = 0.8
P(A/No profit) = 0.2
By Bayes’ formula, the revised probability of profit is computed as:
P(Profit/A) = P(A/Profit) P(Profit) / [P(A/Profit) P(Profit) + P(A/No profit) P(No profit)]
= (0.8 * 0.4) / (0.8 * 0.4 + 0.2 * 0.6) = .32/.44 ≈ 0.73
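Bayes’ theorem for n mutually exclusive and exhaustive events B1, …, Bn can be written as one short function: multiply each prior P(Bi) by the likelihood P(A/Bi), then divide by their total, which is P(A). The sketch below applies it to the profit example; the function name `bayes` and the list layout are assumptions made for illustration.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Posterior P(Bi/A) for mutually exclusive, exhaustive events Bi.

    priors[i] = P(Bi), likelihoods[i] = P(A/Bi).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Bi) * P(A/Bi)
    total = sum(joint)                                    # P(A), total probability
    return [j / total for j in joint]

# Bi = (profit, no profit); A = past-data analysis points to profit
priors = [Fraction(4, 10), Fraction(6, 10)]
likelihoods = [Fraction(8, 10), Fraction(2, 10)]

posteriors = bayes(priors, likelihoods)
print(posteriors[0], round(float(posteriors[0]), 2))  # 8/11 0.73
```

The posteriors always sum to 1, which is a quick sanity check on any application of the formula.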
7. Summary
- If two events A and B are not independent, we use conditional probability which is defined
as the probability of occurrence of an event A, given that another event B has already
occurred
- Conditional probability of A, given B is P(A/B) = P(B∩A) / P(B) , provided P(B) ≠ 0
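- Addition rule says, for mutually exclusive events, P(AUB) or P(A or B) = P(A) + P(B); in general, P(AUB) = P(A) + P(B) – P(A∩B)
- Multiplication rule says, P(A∩B) = P(A) * P(B/A)
- According to permutation rule, nPE = n × (n – 1) × . . . × (n – E + 1) = n! / (n – E)!
- According to combination rule, nCE = n! / [E! × (n – E)!]
- According to Bayes’ rule, P(Bi/A) = P(A/Bi) . P(Bi) / Σ P(Bj) . P(A/Bj)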

8. Exercise
1. Define conditional probability and its applications

2. What are the counting techniques in probability?
3. What are the theorems of probability?
4. Define addition rule and multiplication rule with examples.
5. Describe and explain Bayes’ theorem with example.
6. There are three workers in an industry a, b, and c making shirts. They make 25%, 35%
and 40% of the total shirts produced respectively. Out of the total shirts made by them,
5%, 4% and 2% respectively was found to be defective. If we select a shirt at random, find
the probability that worker C has made it.
7. Suppose there are two coins. One is a fair coin with probability of head = ½ and the
second one is loaded with head i.e. probability of head in the second coin is 2/3. Suppose
we toss a coin picking one of the two coins at random and a head turns up, find the
probability that it is a fair coin.
8. A has 50% chance of selling goods in his shop. If two customers enter his shop, find the
probability that A will sell the goods.
9. If events A and B are mutually exclusive, prove that P(A/AUB) = P(A) / [P(A) + P(B)].
10. There are two bags containing white and black balls. Suppose bag 1 contains 2 white
and 2 black balls and bag 2 contains 2 white and 4 black balls. If one ball is randomly
selected from each bag, find the probability that they will be of the same color.
9. References
2. PH Karmel, M Polasek, Applied Statistics for Economists, 4th edition.
10. MCQs
1. Suppose a medical company needs to find out if a certain drug can or cannot lead to an
improvement in symptoms for some patient with a particular medical condition. A study has
been done and following results were seen:
             Improvement    No Improvement    Total
Drug             270             530            800
No Drug          120             280            400
Total            390             810           1200
On the basis of the above table, given that the drug was provided, find out the conditional
probability that the patient shows improvement.
i) .3375
ii) .325
iii) .225
iv) .275
Ans: i) .3375
Hint: Let I = event that there is improvement
D = event that the patient took the drug
We need to find out P(I/D) = P(I∩D) / P(D)
= 270 / 800
= .3375 (Ans.)
2. Taking in reference the study of the table provided in question 1, find the conditional
probability that the patient was given the drug, given that the patient shows improvement.
i) .225
ii) .692
iii) .667
iv) .665
Ans. ii) .692
Hint: Let I = event that there is improvement
D = event that the patient took the drug
We need to find out P(D/I) = P(D∩I) / P(I)
= 270 / 390
= .692
3. Suppose two cards are drawn without replacement from a deck of 52 cards. Find the
probability that both the cards are aces.
i) .0045
ii) .0050
iii) .0065
iv) .0385
Ans. i) .0045
Hint: Probability that the first card is ace = 4/52
Now, assuming that first card drawn is an ace and it is not replaced, the probability that
second card is also an ace = 3/51
Thus, probability that both cards are aces = 4/52 * 3/51 = .0045
4. Suppose two balls are drawn at random without replacement from an urn containing 4
red, 2 white and 3 green balls. Find the probability that the balls drawn are same in color.
i) .28
ii) .14
iii) .50
iv) .56
Ans. i) .28
Hint: Probability that two balls are of same color = P(2R or 2W or 2G)
P(2R) = 4/9 * 3/8 (probability that first ball is red is 4/9 and, assuming it to be true and
without replacing it, probability that second ball is red is 3/8)
Similarly, P(2W) = 2/9 * 1/8
And P(2G) = 3/9 * 2/8
Thus, P(2R or 2W or 2G) = 4/9*3/8 + 2/9*1/8 + 3/9*2/8 = 20/72 ≈ .28
5. A bulb manufacturing company has three machines A, B and C. Machines A, B and C
produce 30%, 50% and 20% respectively of the total bulbs produced. Of their output, 1%,
4% and 3% respectively are defective. If one bulb is selected at random, find the
probability that it was produced by machine B and is also defective.
i) .40
ii) .04
iii) .02
iv) .20
Ans. iii) .02
Hint: Probability that a bulb is produced by machine B, P(B) = .50
Probability that a bulb is defective given that it is produced by machine B, P(D/B) = .04
Probability that a bulb was produced by machine B and is defective = P(B∩D)
P(B∩D) = P(B) * P(D/B)
= .50 * .04 = .02
6. A bulb manufacturing company has three machines A, B and C. Machines A, B and C
produce 30%, 50% and 20% respectively of the total bulbs produced. Of their output, 1%,
4% and 3% respectively are defective. If one bulb is selected at random, find the
probability that the bulb is defective.
i) .08
ii) .028
iii) .029
iv) .027
Ans. iii) .029
Hint: Probability that a bulb is defective given that it is produced by machine A = P(D/A) = .01
Probability that a bulb is defective given that it is produced by machine B = P(D/B) = .04
Probability that a bulb is defective given that it is produced by machine C = P(D/C) = .03
Thus, the probability that the bulb is defective, P(D), is:
P(D) = P(A)*P(D/A) + P(B)*P(D/B) + P(C)*P(D/C)
= .3*.01 +.5*.04 + .2*.03 = .029 (Ans.)
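The two machine questions combine the multiplication rule (joint probability for one machine) with the theorem of total probability (sum over all machines). A short sketch of both steps follows; the dictionary names are illustrative.

```python
from fractions import Fraction

# Share of output and defect rate for each machine, as given in the question
share = {"A": Fraction(30, 100), "B": Fraction(50, 100), "C": Fraction(20, 100)}
defect_rate = {"A": Fraction(1, 100), "B": Fraction(4, 100), "C": Fraction(3, 100)}

# Multiplication rule: P(machine ∩ D) = P(machine) * P(D/machine)
joint = {m: share[m] * defect_rate[m] for m in share}

# Total probability: P(D) is the sum of the joint probabilities
p_defective = sum(joint.values())

print(float(joint["B"]), float(p_defective))  # 0.02 0.029
```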
7. Suppose we again go back to the study shown in the table provided in question 1: Can
you say that ‘taken drug’ and ‘improvement’ are independent events?
i) Yes they are independent

ii) No they are not independent
iii) Can’t say
Ans. ii) No
Hint: Let I = event that there is improvement and D = event that the patient took the drug
For independence, we need to prove P(D∩I) = P(D) * P(I)
P(D∩I) = 270/1200 = .225
P(D) * P(I) = 800/1200 * 390/1200 = .2167
Thus, P(D∩I) ≠ P(D) * P(I), so they are not independent.
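The conditional probabilities and the independence check for the drug study can all be read off the table of counts. The sketch below stores the four cells in a dictionary (an assumed layout) and uses a hypothetical helper `p` that sums the matching cells.

```python
from fractions import Fraction

# Counts from the study table: rows = drug status, columns = outcome
counts = {
    ("drug", "improved"): 270, ("drug", "not improved"): 530,
    ("no drug", "improved"): 120, ("no drug", "not improved"): 280,
}
total = sum(counts.values())

def p(drug=None, outcome=None):
    """Probability of the cell(s) matching the given row and/or column."""
    n = sum(v for (d, o), v in counts.items()
            if drug in (None, d) and outcome in (None, o))
    return Fraction(n, total)

p_d, p_i = p(drug="drug"), p(outcome="improved")
p_d_and_i = p(drug="drug", outcome="improved")

print(float(p_d_and_i / p_d))  # P(I/D) = 0.3375
print(p_d_and_i == p_d * p_i)  # False, so D and I are not independent
```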
8. Suppose you have 5 show pieces, and you want to arrange 3 of them in your show case.
In how many different ways can you arrange them?
i) 5C3 = 10 ways
ii) 5P3 = 60 ways
iii) 3/5
iv) None
Ans. ii) 5P3 = 60 ways
Hint: This is a case of permutation. Since we want to arrange the show pieces and are
concerned with order, the number of ways we can arrange the show pieces =
5P3 = 5! / 2! = 60 ways
9. If a card is drawn from a deck of 52 well shuffled cards, what is the probability that it will
be a queen or a king?
i) 1/52
ii) 1/26
iii) 1/2
iv) 2/13
Ans. iv) 2/13
Hint: Let A = event that card is a king, probability of a king = 4/52
Let B = event that card is a queen, probability of a queen = 4/52
Now, since both the events are mutually exclusive i.e. P(A∩B) = 0
Thus, P(A or B) = P(AUB) = P(A) + P(B) = 8/52 = 2/13
10. In how many ways can we choose a sub-committee of 5 members from a club
consisting of 10 members?
i) 10P5 = 30240 ways
ii) 10C5 = 252 ways
iii) 5/10
iv) ½
Ans. ii) 10C5 = 252 ways
Hint: This is an example of combination where we have to choose 5 members out of 10.
Thus, 10C5 = 252 ways
Discrete Random Variables And Probability Distributions
DC-1
Semester-II
Lesson: Discrete Random Variables And Probability Distributions
Lesson Developer: Chandra Goswami

TABLE OF CONTENTS
Learning Objectives
1. Random Experiments
2. Random Variables
3. Probability Distributions for Discrete Random Variables
4. Graphical Presentations of Probability Distributions
5. Parameters Of A Probability Distribution
6. The Cumulative Distribution Function
7. Deriving Probability Mass Function from Cumulative Distribution Function
Practice Questions
Content Developer
Chandra Goswami, Associate Professor, Department of Economics
Dyal Singh College, University of Delhi
Reference
Jay L. Devore: Probability and Statistics for Engineering and the Sciences, Cengage
Learning, 8th edition [Chapter 3]
DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Learning Objectives:

In this chapter you will learn what a random variable is and the two fundamentally
different types of random variables. You will learn how to arrive at the probability
distributions of discrete random variables and how to represent these graphically, as
well as by summary expressions. This provides the tool for evaluating the
probability that the random variable takes on specific values or a range of values. You
will also learn how the probability distribution can be used to specify a mathematical
model for the population distribution. This will help you to identify the characteristics of
the population. The chapter ends with practice questions so that you can test your
understanding of the chapter contents.
Chapter Outline
1. Random experiments
2. Random variables
3. Probability distributions for discrete random variables
4. Graphical presentations of probability distributions
5. Parameters of a probability distribution
6. The cumulative distribution function for discrete random variables
7. Deriving probability mass function from cumulative distribution function
1. RANDOM EXPERIMENTS
A random or chance experiment is an experiment which yields different possible
outcomes. These outcomes may be qualitative or quantitative. In case of qualitative
outcomes, we observe a specific attribute of the variable. Quantitative outcomes result
when we observe a number describing the attribute of the variable. Until the outcome is
observed there is uncertainty about which particular outcome will be the result of the
experiment. If the experiment is repeated under identical conditions different outcomes
are likely to be observed at each trial.
Example 1.1
If a balanced coin is tossed there are two equally possible (qualitative) outcomes, a head
(H) or a tail (T).

Example 1.2
It is known that wind speed and direction affect the time taken by aircraft to reach their
destination. The three possible outcomes for arrival time on any day are: before time, on
time, or delayed.
Example 1.3
If an unbiased die is tossed it will result in one of six possible outcomes, depending on
which face shows up: 1, 2, 3, 4, 5, or 6.
Example 1.4
If example 1.2 is restated to measure the extent of time delay in the aircraft reaching its
destination, we can denote the possible outcomes as x = 0 for on-time arrival (i.e., as per
the scheduled time), x < 0 as a measure of before-time arrival (e.g., x = -5 indicates
arrival 5 min ahead of scheduled time), and x > 0 for late arrival (e.g., x = 22 represents
arrival 22 min after the scheduled time). We obtain an infinite number of possible
outcomes since time is a continuous variable. Here the extent of time delay (in minutes) is
the variable, where x may be negative, zero or positive.
Example 1.5
A bottling plant fills cold drinks in 200 ml bottles for its client. Although the machine is
calibrated to dispense 200 ml per fill, it is noted that the fill amount varies from bottle to
bottle by small amounts. If we denote X = amount filled in a bottle (in ml), since volume
is a continuous variable, the possible values of the variable may be less than, equal to, or greater than 200.
The outcomes in examples 1.1 and 1.2 are qualitative, and quantitative in examples 1.3,
1.4 and 1.5. There are a finite number of outcomes in examples 1.1, 1.2 and 1.3, whereas
the number of outcomes is infinite in examples 1.4 and 1.5. In methods of statistical
analysis we often need some numerical aspects of experimental outcomes. The mean, for
instance, is a numerical function of the outcomes.
2. RANDOM VARIABLES
If the exhaustive set of all possible outcomes of a random experiment is known then
probabilities of occurrence of the different outcomes can be assigned. The concept of a

random variable allows us to obtain a numerical function of the experimental outcomes.

Some of the most commonly used numerical functions of experimental outcomes are the
mean, variance, and proportion.
Definition 1
For a given sample space S of some experiment, a random variable is any rule that
associates a number with each outcome in S
A random variable (rv) is thus a function defined over the elements of S. The domain of
the rv is the sample space and the range is a set of real numbers. A random variable is,
therefore, a variable that takes on numerical values determined by the outcome of a
random experiment. Thus, the value of the random variable will vary according to the
observed outcome of a random experiment. In general, random variables are functions
that associate numbers with some specific attribute of an experimental outcome. Random
variables will be denoted by uppercase letters, such as X and Y, and their values by the
corresponding lowercase letters, such as x and y.
Since the outcomes of a random experiment can be designated as a random variable, any
numerical function of the outcomes is also a random variable. It is random since its value
depends on which particular outcomes are observed. It is a variable since different
numerical values are possible. We can, therefore, assign probabilities to its possible
values. Therefore we can say that a random variable is a variable which can take one of
the different possible values in the sample space with an assigned probability. If X
denotes the rv and s the sample outcome, then X(s) = q where q is a real number.
Example 2.1
If X is a rv with m possible values x1, x2, x3,….xm and Y is a rv with n possible values
y1, y2,….yn then the linear function X + Y is also a random variable since x + y = xi + yj
where i = 1,2,….,m, and j = 1, 2,…..,n.
Exercise 1
A balanced coin and a fair die are tossed simultaneously. List the different possible
outcomes.

Solution:
Two possible outcomes of the coin are head (H) or tail (T). Six possible outcomes of the
die are 1, 2, 3, 4, 5, and 6. Since the coin and die are tossed simultaneously the possible
outcomes are as follows:
(H,1); (H,2); (H,3); (H,4); (H,5); (H,6); (T,1); (T,2); (T,3); (T,4); (T,5); (T,6)
Exercise 2
In Exercise 1, if head is denoted by 1 and tail by 0, so that x = 0, 1 and y = 1, 2, 3, 4, 5, 6,
then list the different possible values of the linear function X + Y.
Solution:
x + y = 1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7 according to the combinations listed in exercise 1
Exercise 3
Assigning appropriate probabilities to the values of the random variables X and Y in
exercise 2, determine the probabilities of x + y.
Solution:
Since the coin is balanced, P(X = 0) = ½ and P(X = 1) = ½. Similarly, for the
fair die, P(Y = 1) = P(Y = 2) = P(Y = 3) = P(Y = 4) = P(Y = 5) = P(Y = 6) = 1/6.
Since X and Y are independent, the 12 possible outcomes are equally likely, each with
probability ½ × 1/6 = 1/12. Writing p(s) for P(X + Y = s), we therefore have
p(1) = p(7) = 1/12 and p(2) = p(3) = p(4) = p(5) = p(6) = 2/12
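The result can be checked by enumerating the 12 equally likely (x, y) pairs. The following is only an illustrative sketch; the function name `sum_distribution` is our own and is not part of the exercise.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def sum_distribution(x_values, y_values):
    """Return the pmf of X + Y when every (x, y) pair is equally likely."""
    pmf = defaultdict(Fraction)                            # missing keys start at 0
    weight = Fraction(1, len(x_values) * len(y_values))    # probability of one pair
    for x, y in product(x_values, y_values):
        pmf[x + y] += weight
    return dict(pmf)


coin = [0, 1]                  # tail = 0, head = 1
die = [1, 2, 3, 4, 5, 6]
pmf_sum = sum_distribution(coin, die)
for total in sorted(pmf_sum):
    print(total, pmf_sum[total])
```

Fractions are printed in lowest terms, so 2/12 appears as 1/6.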
Definition 2
Any random variable whose only possible values are 0 and 1 is called a Bernoulli
random variable
If an unbiased coin is tossed repeatedly, on each toss there are only two possible
outcomes so it is a Bernoulli rv. If an experiment can result in only two possible
outcomes – success or failure – in each trial, we have a Bernoulli random variable.
There are fundamentally two different types of random variables: discrete random
variables and continuous random variables. The distinction between discrete and
continuous random variables lies in the number of possible values the rv can take. If the
rv can have a finite number or a countably infinite number of possible values it is a
discrete rv. If, on the other hand, the outcome can be any real number in a given interval,
the number of possibilities is uncountably infinite, and the rv is said to be continuous.
Definition 3
A discrete random variable is a rv whose possible values either constitute a finite set or
else can be listed in an infinite sequence which is “countably” infinite, where there is a
first element, a second element, a third element and so on.
Examples 1.1, 1.2 and 1.3 have possible values which constitute a finite set. So do
exercise 1 and exercise 2. In all these cases the possible outcomes can be
counted.
Example 2.2
A new company wishes to establish its brand image. For this purpose it runs a series of
weekly newspaper advertisements until sales of its products reach the target level.
Reaching the level of target sales is considered a success. Success may be achieved in 1
week or 2 weeks or 3 weeks and so on. If we denote success by S and failure by F then
the sample space is S = [S, FS, FFS, FFFS,………..]. We can define the random variable
X = number of weeks before the advertising campaign ends. Then, X(S) = 1, X(FS) = 2,
X(FFS) = 3, X(FFFS) =4, and so on. Any positive integer is a possible value of X. Thus,
the set of possible values of the rv X is countably infinite.
Variables which require counting, such as number of successes in an experiment, number
of floors in a building, number of students in a class, shoe sizes, etc, are discrete.
Definition 4
A random variable is continuous if both the following conditions apply
1. Its set of possible values consists either of all numbers in an interval on the
number line or all numbers in a disjoint union of such intervals.
2. No possible value of the random variable has a positive probability.
Condition 1 implies that there is no way to create a listing of all the infinite number of
possible values of the variable. Condition 2 implies that intervals of values have positive
probability. As the width of the interval diminishes, probability of the interval decreases.
In the limit, probability of the interval is zero as the width of the interval reduces to zero.
Example 2.3
The university team is scheduled to visit any minute during a three hour long
examination starting at 9am. We may want to find the probability that the team visits at a
given time or we may be interested in the probability that the visit takes place during a
given time interval. The sample space is from 0 to 180 minutes. The probability that the
team visits during an interval of length c minutes is c/180. This assignment of probabilities applies
only to intervals on the measurement axis from 0 to 180. The probability decreases as the
interval becomes shorter. For an interval of 5 seconds, the probability of a visit
is 5/10800 ≈ 0.000463. As the length of the interval approaches zero, the probability that the
team will visit also approaches zero. That is why we always assign zero probability for a
single point on the number line. This does not mean that the team will not visit. The team
will visit at some point in the interval from 0 to 180 even though each point has zero
probability.
Variables such as time, height, distance, temperature, area, volume, weight, etc that
require measurement are continuous. In practice, however, limitations of measurement
instruments often do not allow measurement on a continuous scale. Yet it is useful to
study models of continuous variables as they often reflect real world situations.
3 PROBABILITY DISTRIBUTIONS FOR DISCRETE RANDOM VARIABLES
A random experiment has different possible outcomes. It is certain that one of the
possible outcomes will be observed as a result of the experiment. Let the experiment
have only two possible outcomes which are mutually exclusive and exhaustive. Thus the
probability that either one or the other outcome will occur is the sum of the probabilities.
As there is no third alternative and the occurrence of one outcome precludes the
occurrence of the other, the sum of the probabilities adds up to one. The probability
distribution of the rv X tells us how the total probability of 1 is distributed among the
various possible values of the rv X. The probability assigned to any value x of the rv will
be denoted by p(x).
Definition 5
The probability distribution or probability mass function (pmf) of a discrete random
variable is defined for every number x by p(x) = P(X = x) for each x within the range of
X.
Based on the postulates of probability, a function can serve as the pmf of X if and only if
p(x) satisfies the following two conditions
1. 0 ≤ p(x) ≤ 1 for each value within its domain
2. Σ p(x) = 1, where the summation is over all values x within its domain.
The first condition states that probability cannot be negative or exceed 1. The second
condition follows from the fact that all possible values of X are mutually exclusive and
collectively exhaustive so that the sum of the probabilities must equal 1. Thus, any
function which satisfies both properties can serve as the pmf of a discrete random
variable. Examples of pmf are Bernoulli Distribution, discrete Uniform Distribution,
Binomial Distribution, Negative Binomial Distribution, Hypergeometric Distribution and
Poisson Distribution.
Note that a function which satisfies the two conditions for one set of values of X may not
do so for another set of values. In the latter case the function cannot serve as a pmf of X.
To test whether a function is a pmf we need to check whether both conditions are
satisfied for the given X values.
Exercise 4
A balanced coin is tossed three times. Let X denote the rv that is defined as the total
number of heads. List the elements of the sample space and obtain the probability
distribution of the total number of heads observed. Find a formula for the pmf of the total
number of heads observed in three tosses of a fair coin.
Solution:
Denoting H = head and T = tail, elements of the sample space are
TTT, TTH, THT, HTT, THH, HTH, HHT, HHH.

Let the rv X = total number of heads observed in 3 tosses of a balanced coin. For a
balanced coin a head and a tail are equally likely outcomes so that P(H) = P(T) = ½. It
can be assumed that the outcome of any toss is independent of the outcomes of the other
two tosses of the coin. Then,
P(TTT) = P(X = 0) = p(0) = (1/2)(1/2)(1/2) = 1/8
P(TTH or THT or HTT) = P(X = 1) = p(1) = 3/8
P(THH or HHT or HTH) = P(X = 2) = p(2) = 3/8
P(HHH) = P(X = 3) = p(3) = 1/8
The probability distribution or pmf of X is given in the following table:
x 0 1 2 3
p(x) 1/8 3/8 3/8 1/8
The pmf can also be described as:
\[
p(x) = \begin{cases} 1/8 & x = 0 \text{ or } 3 \\ 3/8 & x = 1 \text{ or } 2 \\ 0 & \text{otherwise} \end{cases}
\]
Both conditions for a pmf are satisfied since 0 ≤ p(x) ≤ 1 for x = 0, 1, 2 and 3, and
Σ p(x) = 1, where the sum is over x = 0, 1, 2 and 3.
Based on the probabilities we observe that the numerators of the four fractions 1/8, 3/8, 3/8
and 1/8 are the binomial coefficients $\binom{3}{0}$, $\binom{3}{1}$, $\binom{3}{2}$, $\binom{3}{3}$. The formula for the pmf can,
therefore, be written as
\[
p(x) = \frac{\binom{3}{x}}{8} \quad \text{for } x = 0, 1, 2 \text{ and } 3.
\]
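As a cross-check, the pmf obtained by listing the sample space can be compared with the binomial-coefficient formula. This is a minimal sketch; the function names are our own, and the argument n (number of tosses) is a generalization of the three-toss case assumed here for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb


def heads_pmf_by_enumeration(n):
    """pmf of the number of heads in n tosses of a balanced coin, by listing outcomes."""
    outcomes = list(product("HT", repeat=n))        # 2**n equally likely sequences
    pmf = {x: Fraction(0) for x in range(n + 1)}
    for outcome in outcomes:
        pmf[outcome.count("H")] += Fraction(1, len(outcomes))
    return pmf


def heads_pmf_by_formula(n):
    """pmf from the formula p(x) = C(n, x) / 2**n."""
    return {x: Fraction(comb(n, x), 2 ** n) for x in range(n + 1)}


print(heads_pmf_by_enumeration(3))
print(heads_pmf_by_enumeration(3) == heads_pmf_by_formula(3))
```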
Exercise 5
A computer shop sells desktops, laptops, notebooks and tablets. A prospective buyer
enters the shop. The random variable X can take five possible values: X = 0 if no purchase
is made, X = 1 if a tablet is purchased, X = 2 if a notebook is purchased, X = 3 if a laptop
is bought, and X = 4 if a desktop is bought. If 40% of buyers purchase a tablet, 35% of
buyers opt for a notebook, 20% a laptop and 5% a desktop, what is the probability
distribution of X?
Solution:
The pmf is as follows
x 0 1 2 3 4
p(x) 0 0.4 0.35 0.20 0.05
Exercise 6
A balanced coin is tossed four times. Use the formula derived in exercise 4 to obtain the
pmf of X = total number of heads in four tosses of the coin.
Solution:
Total number of possible outcomes is 2^4 = 16 as the result of each toss is independent of
the remaining three tosses. Using the formula
\[
p(x) = \frac{\binom{4}{x}}{16}
\]
the pmf is as follows:
x 0 1 2 3 4
p(x) 1/16 4/16 6/16 4/16 1/16
Exercise 7
Check whether the function given by f(x) = (x + 4)/30 for x = 0, 1, 2, 3, 4 can serve as the
probability distribution of a discrete random variable.
Solution:
For given values of x the value of the function is as follows:
f(0) = 4/30, f(1) = 5/30, f(2) = 6/30, f(3) = 7/30, f(4) = 8/30
Each of the above values is a positive fraction less than 1. Hence the first condition for a
pmf is satisfied. The sum of all the values of f(x) is Σf(x) = (4 + 5 + 6 + 7 + 8)/30 = 1, so
the second condition is also satisfied. Since both the required conditions for a pmf
are satisfied, the given function can serve as a pmf for a rv having the values 0,
1, 2, 3, and 4.
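The two conditions can also be tested mechanically. The sketch below is illustrative only; the helper name `is_valid_pmf` is our own. It also shows that the same function fails to be a pmf on a different set of values.

```python
from fractions import Fraction


def is_valid_pmf(f, support):
    """Check the two pmf conditions for the function f over the listed values."""
    values = [f(x) for x in support]
    in_range = all(0 <= v <= 1 for v in values)    # condition 1
    sums_to_one = sum(values) == 1                 # condition 2
    return in_range and sums_to_one


def f(x):
    return Fraction(x + 4, 30)


print(is_valid_pmf(f, range(5)))       # values 0, 1, 2, 3, 4
print(is_valid_pmf(f, range(1, 6)))    # values 1, 2, 3, 4, 5: the sum is 35/30
```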

4 GRAPHICAL PRESENTATIONS OF PROBABILITY DISTRIBUTIONS

The pmf is positive for the countable number of values of the rv and zero for all other
values. We have Σ p(x) = 1, where the sum is over all x. Thus, the pmf describes how the total probability mass of 1
is distributed at various points on the number line. The pmf can be presented graphically
in probability histograms.
For a probability histogram, above each x with p(x) > 0 construct a rectangle centered at
x. The height of each rectangle is proportional to p(x). The area of the rectangle equals
p(xi) for X = xi. If the base of each rectangle is of unit width then the height will be equal
to p(xi) for X = xi.
Example 4.1
The pmf of exercise 5 is
x 0 1 2 3 4
p(x) 0 0.40 0.35 0.20 0.05
For all x > 4 , p(x) = 0. The probability histogram is drawn by representing 1 with the
interval 0.5 to 1.5, 2 with the interval 1.5 to 2.5, 3 with the interval 2.5 to 3.5, and so on.
Figure 1 Probability histogram

The line graph and bar chart are also referred to as histograms. The line graph is drawn
by drawing lines of height p(x) for corresponding x values. The bar chart is drawn with
each rectangle centered at the x value with a height equal to the probability of the
corresponding value of the rv. The line graph and bar chart for the pmf of ex. 5 are
illustrated in Fig 2 and Fig 3 respectively.
Figure 2 Line graph Figure 3 Bar chart
5 PARAMETERS OF A PROBABILITY DISTRIBUTION

We can use the pmf to specify a mathematical model for a discrete population
distribution. If the population does not exist we can think of it as a mathematical model
for a conceptual population. Let the population consist of X values x1, x2, x3, …, xn
with corresponding probabilities p(xi). From the relative frequency approach to
probability, we know that the limit of the relative frequency is the probability of occurrence
of the X value in the population. As the population size tends to become infinitely large
the relative frequency approaches the probability, ie,
\[
\lim_{n \to \infty} \frac{f_i}{n} = p(x_i),
\]
where fi is the frequency of xi and Σ fi = n.
When all possible values xi of the rv X are considered then Σp(xi) =1 and we have the
probability distribution for the discrete population. The pmf thus provides a model for
the distribution of population values. Once we have such a population model we can use
it to calculate the values of population characteristics, like the mean μ, variance σ², etc,
and make inferences about such characteristics.
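The relative frequency idea can be illustrated by simulation. The sketch below draws n observations from the pmf of exercise 5 and reports the relative frequency of each value; the function name, the sample sizes and the fixed seed are our own assumptions, chosen only to make the run repeatable.

```python
import random


def relative_frequencies(values, probs, n, seed=0):
    """Draw n observations from the pmf and return the relative frequency of each value."""
    rng = random.Random(seed)                       # fixed seed for a repeatable run
    draws = rng.choices(values, weights=probs, k=n)
    return {v: draws.count(v) / n for v in values}


values = [1, 2, 3, 4]                  # pmf of exercise 5 (x = 0 has probability 0)
probs = [0.40, 0.35, 0.20, 0.05]
for n in (100, 10_000, 100_000):
    print(n, relative_frequencies(values, probs, n))
```

As n grows, the printed relative frequencies should settle close to 0.40, 0.35, 0.20 and 0.05.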

Definition 6
Suppose p(x) depends on a quantity that can be assigned any one of a number of possible
values, with each different value determining a different probability distribution. Such a
quantity is called a parameter of the distribution.
The collection of all probability distributions for different values of the parameter is
called a family of probability distributions.
Example 5.1
We consider a random experiment that can give rise to just two possible mutually
exclusive and exhaustive outcomes 0 and 1. Then p(0) + p(1) = 1. Such a rv is called a
Bernoulli random variable. If we select α such that 0 < α < 1, the pmf of the Bernoulli rv
can then be expressed as
\[
p(x) = \begin{cases} 1 - \alpha & x = 0 \\ \alpha & x = 1 \\ 0 & \text{otherwise} \end{cases}
\]
For each of the possible values of α in the interval between 0 and 1, we obtain a different
probability distribution. We thus obtain a family of Bernoulli distributions with each pmf
determined by a particular value of α. Since the pmf depends on the particular value of α
we often write the pmf of the Bernoulli distribution as p(x; α) rather than just p(x). The
quantity α in the Bernoulli pmf is a parameter. The value of the parameter α distinguishes
one Bernoulli distribution from another. If α can take any value in the interval 0 to 1, we
obtain an infinite number of Bernoulli distributions, each for a different value of α.
The value of the parameter may be unknown. If the population size is very large it may
not be possible to examine all the population values to ascertain the value of α. We can
then use sample data to infer about the parameter value α, where the sample is a
representative subset of the population.
Example 5.2
If the discrete rv X can take any value x1, x2, x3, ……… xn with equal probability we
have a discrete Uniform Distribution. We can denote the minimum value x1 = α, and the
maximum value xn = β. Then the pmf of the Uniform Distribution can be expressed as
\[
p(x) = \begin{cases} \dfrac{1}{n} & x = x_1, x_2, \dots, x_n \ (\alpha \le x \le \beta) \\ 0 & \text{otherwise} \end{cases}
\]
We obtain a family of uniform distributions with the pmf of each distribution determined
by a particular set of values for α and β. The pmf can be denoted by p(x; α, β), where α
and β are the parameters of the distribution. For different combinations of values of α
and β we obtain different uniform distributions.
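The idea of a family of distributions indexed by parameters can be sketched in code, with the parameter passed as an argument. The function names below are our own, and the uniform pmf is written in terms of the list of possible values rather than α and β directly.

```python
def bernoulli_pmf(x, alpha):
    """p(x; alpha) for a Bernoulli rv with P(X = 1) = alpha, where 0 < alpha < 1."""
    if x == 1:
        return alpha
    if x == 0:
        return 1 - alpha
    return 0.0


def discrete_uniform_pmf(x, support):
    """p(x) = 1/n on the n listed values and 0 otherwise."""
    return 1 / len(support) if x in support else 0.0


# Each value of alpha picks out one member of the Bernoulli family.
for alpha in (0.2, 0.5, 0.9):
    print(alpha, [bernoulli_pmf(x, alpha) for x in (0, 1)])

# A fair die is the uniform distribution with minimum 1 and maximum 6.
print([discrete_uniform_pmf(x, [1, 2, 3, 4, 5, 6]) for x in (0, 1, 6, 7)])
```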
6 THE CUMULATIVE DISTRIBUTION FUNCTION

It is often the case that we need to know the probability that the observed value of the rv
is less than or equal to a specific value x = a. For this we require the cumulative
distribution function (cdf) or, simply, the distribution function of the discrete rv X. It is
also called the cumulative mass function. The cdf is denoted by F(x).
Definition 7
The cumulative distribution function F(x) of a discrete random variable X with pmf p(x)
is defined for every number x by
\[
F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y)
\]
Thus, the cdf is obtained by summing the pmf p(y) over all possible values y of X
satisfying y ≤ x. We use F(x) to calculate the probability that the observed value of X
does not exceed x. It follows that P(X < x) ≤ P(X ≤ x) since the value x is included in
P(X ≤ x) and not in P(X < x). Only if P(X = x) = 0 is P(X < x) = P(X ≤ x). In all other
cases, where P(X = x) > 0, the strict inequality holds, ie, P(X < x) < P(X ≤ x).
The cumulative distribution function has the following properties:

1. 0 ≤ F(xi) for every value X = xi
2. If a < b then F(a) ≤ F(b) where a and b are two possible values of the rv X.
The first property states that F(x) is non-negative. F(x) = 0 for any value of X that is less
than the smallest permissible X value of the pmf since p(x) = 0 for all such values. It
follows that when all possible values of X have been considered F(x) = 1. For higher
values of X we again have p(x) = 0 so that F(x) remains unchanged at 1. The second
property implies that F(a) = F(b) if no value x with a < x ≤ b has positive probability.
Otherwise F(a) < F(b) when a < b.

The graph of F(x) is a step function. If X is a discrete rv whose set of possible values is
x1, x2, …, where x1 < x2 < x3 < …, the value of F(x) is constant in the interval
between two successive values x_{i-1} and x_i, and then increases by p(x_i) at x_i. F(x) again
remains flat between x_i and x_{i+1}, where it jumps up (takes a step) by p(x_{i+1}). This is
illustrated in Figure 4.
Figure 4 Graph of the cdf
Since F(x_{i-1}) < F(x_i) and F(x_i) < F(x_{i+1}), at all points of discontinuity the cdf takes on the
greater of the two values. This is indicated by heavy dots in Figure 4. It can be seen that
as x increases, the cdf will change values only at those points that can be taken by the rv
with positive probability.
Example 6.1
Using the pmf in exercise 5,
x 0 1 2 3 4
p(x) 0 0.40 0.35 0.20 0.05
F(0) = P(X ≤ 0) = P(X = 0) = 0
F(1) = P(X = 0 or 1) = 0 + 0.4 = 0.4
F(2) = P(X = 0 or 1 or 2) = 0.4 + 0.35 = 0.75
F(3) = P(X = 0 or 1 or 2 or 3) = 0.75 + 0.20 = 0.95
F(4) = P(X = 0 or 1 or 2 or 3 or 4) = 0.95 + 0.05 = 1
Since p(x) = 0 for all x > 4, F(x) = P(X ≤ x) = 1 for all x > 4.
Hence, the cumulative distribution function of the rv X is as follows:
\[
F(x) = \begin{cases} 0 & x < 1 \\ 0.40 & 1 \le x < 2 \\ 0.75 & 2 \le x < 3 \\ 0.95 & 3 \le x < 4 \\ 1 & 4 \le x \end{cases}
\]
The graph for F(x) is then as shown in Fig 5
Figure 5 Graph of the cdf for example 6.1
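The step-function behaviour of the cdf can be sketched in code by accumulating the pmf of exercise 5 and looking up the last possible value that does not exceed x. This is an illustrative sketch; the names `xs`, `ps`, `cum` and `cdf` are our own.

```python
from bisect import bisect_right
from itertools import accumulate

xs = [0, 1, 2, 3, 4]
ps = [0, 0.40, 0.35, 0.20, 0.05]          # pmf of exercise 5
cum = list(accumulate(ps))                # running totals F(0), F(1), ..., F(4)


def cdf(x):
    """F(x) = P(X <= x): a step function that is flat between possible values."""
    k = bisect_right(xs, x)               # number of possible values that are <= x
    return 0.0 if k == 0 else round(cum[k - 1], 10)


for x in (-1, 0.5, 1, 2.7, 3, 4, 10):
    print(x, cdf(x))
```

The rounding only removes floating-point noise from the running totals; the cdf changes value only at x = 1, 2, 3 and 4, the points with positive probability.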
7 DERIVING PROBABILITY MASS FUNCTION FROM CUMULATIVE DISTRIBUTION FUNCTION
Just as the cdf can be derived from the pmf, if the cdf is known then the pmf can be
derived from it. Also the probability that X falls in a specified interval can be obtained
from the cdf. Suppose the range of a rv X consists of the values x1, x2, x3, …, xn
where x1 < x2 < x3 < … < xn, then
p(x1) = F(x1)
p(x2) = F(x2) - F(x1)
p(x3) = F(x3) - F(x2)
and so on. In general, p(x_i) = F(x_i) - F(x_{i-1}) for all i = 2, 3, …, n. In this way we can
derive the pmf from the cdf.

Example 7.1
Given the cdf obtained in example 6.1
\[
F(x) = \begin{cases} 0 & x < 1 \\ 0.40 & 1 \le x < 2 \\ 0.75 & 2 \le x < 3 \\ 0.95 & 3 \le x < 4 \\ 1 & 4 \le x \end{cases}
\]
we get
p(0) = 0
p(1) = 0.4 -0 = 0.4
p(2) = 0.75 – 0.4 = 0.35
p(3) = 0.95 – 0.75 = 0.20
p(4) = 1 – 0.95 = 0.05
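The same differencing can be written as a short routine. This is a sketch under our own naming (`pmf_from_cdf`); exact fractions are used so that the subtractions carry no rounding error.

```python
from fractions import Fraction

xs = [0, 1, 2, 3, 4]
# cdf values F(0), ..., F(4) from example 6.1, written as exact fractions
F = [Fraction(0), Fraction(40, 100), Fraction(75, 100), Fraction(95, 100), Fraction(1)]


def pmf_from_cdf(cdf_values):
    """Difference successive cdf values; the cdf is 0 below the smallest possible value."""
    pmf, previous = [], Fraction(0)
    for value in cdf_values:
        pmf.append(value - previous)      # size of the jump at this value
        previous = value
    return pmf


for x, p in zip(xs, pmf_from_cdf(F)):
    print(x, float(p))
```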
To obtain the probability that the value of X falls in the interval [a, b] such that a ≤ b, where
both a and b are included in the interval, we have to compute P(a ≤ X ≤ b) = F(b) – F(a-)
where a- denotes the largest possible X value that is strictly less than a. If the only
possible values of X are integers and a and b are both integers, then
P(a ≤ X ≤ b) = P(X = a or a + 1 or a + 2 or … or b) = F(b) – F(a – 1)
This principle can be used to find the probability that X takes the value a. By setting
b = a we obtain P(X = a) = p(a) = F(a) – F(a – 1).
This method is used to derive the pmf from the cdf.
We can similarly compute P(a < X < b) = F(b – 1) – F(a) where a and b are not included in
the interval.
Note that F(b) – F(a) gives us P(a < X ≤ b) where b is included in the interval but a is not
included.
Example 7.2
Given the cdf obtained in example 6.1,
P(1 ≤ X ≤ 3) = F(3) – F(0) = 0.95 – 0 = 0.95
P(1 < X ≤ 3) = F(3) – F(1) = 0.95 – 0.4 = 0.55
P(X ≥ 3) = 1 – P(X < 3) = 1 – F(2) = 1 – 0.75 = 0.25
Exercise 8
A study of number of delayed flights in an hour (X) at an airport due to fog in winter
revealed the following probability distribution of the rv X.
x 0 1 2 3 4 5 6
p(x) 0.10 0.15 0.25 0.20 0.16 0.11 0.03
(a) Derive the cdf

(b) What is the probability at least three flights are delayed in an hour?
(c) What is P(1< X < 5)?
Solution
(a) The cdf is derived as follows:
F(0) = p(0) = 0.10
F(1) = F(0) + p(1) = 0.10 + 0.15 = 0.25
F(2) = F(1) + p(2) = 0.25 + 0.25 = 0.50
F(3) = F(2) + p(3) = 0.50 + 0.20 = 0.70
F(4) = F(3) + p(4) = 0.70 + 0.16 = 0.86
F(5) = F(4) + p(5) = 0.86 + 0.11 = 0.97
F(6) = F(5) + p(6) = 0.97 + 0.03 = 1.00
Since X ≥ 0, the cdf can be presented as follows:
\[
F(x) = \begin{cases} 0 & x < 0 \\ 0.10 & 0 \le x < 1 \\ 0.25 & 1 \le x < 2 \\ 0.50 & 2 \le x < 3 \\ 0.70 & 3 \le x < 4 \\ 0.86 & 4 \le x < 5 \\ 0.97 & 5 \le x < 6 \\ 1 & 6 \le x \end{cases}
\]
(b) P(X ≥ 3) = 1 – F(2) = 1 – 0.50 = 0.50
(c) P(1< X < 5) = P( X = 2 or 3 or 4) = F(4) – F(1) = 0.86 – 0.25 = 0.61
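The interval probabilities in parts (b) and (c) can be verified from the cdf. The sketch below is illustrative, with variable names of our own choosing; the probabilities are entered in hundredths so that the arithmetic is exact.

```python
from fractions import Fraction

# pmf of exercise 8, written in hundredths to avoid floating-point rounding
p = {x: Fraction(c, 100) for x, c in zip(range(7), [10, 15, 25, 20, 16, 11, 3])}


def F(x):
    """cdf of the integer-valued rv: sum of p(y) over all y <= x."""
    return sum((prob for y, prob in p.items() if y <= x), Fraction(0))


at_least_three = 1 - F(2)            # P(X >= 3)
strictly_between = F(4) - F(1)       # P(1 < X < 5) = P(X = 2, 3 or 4)
inclusive = F(5) - F(0)              # P(1 <= X <= 5), shown for comparison
print(float(at_least_three), float(strictly_between), float(inclusive))
```

Comparing the last two values shows how much the answer depends on whether the end points are included in the interval.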

PRACTICE QUESTIONS
1. Suppose one die has spots 1, 2, 2, 3, 3, 4 and a second die has spots 1, 3, 4, 5, 6,
8. If both dice are rolled, list the sample space (all possible outcomes). Let the rv
X = total number of spots showing. What is the pmf of X? Show that this pmf is
the same as that for two normal dice, each having 1, 2, 3, 4, 5, 6 spots.
2. At the points x = 0, 1, 2, 3, 4, 5, 6 the cdf for the discrete rv X is
F(x) = x(x+1)/42. Find the pmf for X.
3. Urn 1 and urn 2 each have two red balls and two white balls. Two balls are drawn
simultaneously from each urn. Let
X1 = number of red balls in the sample from first urn, and
X2 = number of red balls in the sample from the second urn.
Find the pmf of X1 + X2
4. An urn contains four balls numbered 1, 2, 3, and 4. If two balls are drawn from
the urn at random and Z is the sum of the numbers on the two balls, find
(a) the probability distribution of Z and draw the histogram
(b) the cdf of Z and draw its graph
5. A coin is biased so that heads is twice as likely as tails. For three independent
tosses of the coin, find
(a) the probability distribution of X, the total number of heads
(b) the probability of getting at most two heads, using the cdf of X
(c) P(1 < X < 3) and P(X > 2), using the cdf
6. The amount of coffee (in grams) in a 230-gm jar filled by a certain machine is a
random variable whose probability density is given by
\[
f(x) = \begin{cases} 0 & x \le 227.5 \\ \dfrac{1}{5} & 227.5 < x < 232.5 \\ 0 & x \ge 232.5 \end{cases}
\]
Find the probabilities that a 230-gram jar filled by this machine will contain
(a) at most 228.65 gm of coffee
(b) anywhere from 229.34 to 231.66 gm of coffee
(c) at least 229.85 gm of coffee
7. A library subscribes to two different weekly news magazines, each of which is
supposed to arrive in Wednesday’s mail. In actuality, each one may arrive on
Wednesday, Thursday, Friday, or Saturday. Suppose the two arrive independently
of one another, and for each one P(Wed) = 0.3, P(Thu) = 0.4, P(Fri) = 0.2, and
P(Sat) = 0.1. Let Y = the number of days beyond Wednesday that it takes for both
magazines to arrive (so possible Y values are 0, 1, 2, or 3). List all the possible
outcomes and compute the pmf of Y.
8. Given the following cdf, derive the pmf of Y and draw the
(a) histogram of the pmf
(b) graph of the cdf
\[
F(y) = \begin{cases} 0 & y < 1 \\ 0.05 & 1 \le y < 2 \\ 0.15 & 2 \le y < 4 \\ 0.50 & 4 \le y < 8 \\ 0.90 & 8 \le y < 16 \\ 1 & 16 \le y \end{cases}
\]
9. A contractor is required by the city planning department to submit one, two,
three, four, or five forms (depending on the nature of the project) in applying for
a building permit. Let Y = the number of forms required of the next applicant.
The probability that y forms are required is known to be proportional to y, ie,
p(y) = ky for y = 1, 2, 3, 4,5
(a) What is the value of k?
(b) What is the probability that at most three forms are required?
(c) What is the probability that between two and four forms (inclusive)
are required?
(d) Could p(y) = y²/50 for y = 1, …, 5 be the pmf of Y?

10. An insurance company offers its policyholders a number of different premium
payment options. For a randomly selected policyholder, let X = the number of
months between successive payments. The probability mass function of X is as
follows:
x 1 3 4 6 12
p(x) 0.30 0.10 0.05 0.15 0.40
(i) Derive the cumulative distribution function (cdf) of X and draw the graph
of this cdf
(ii) Using the cdf, compute P(3 ≤ X < 6), P(3 < X < 6), and P(4 ≤ X).


Continuous random variables and probability distributions
DC-1
Semester-II
Lesson: Continuous random variables and probability distributions
College/Department: Department of Economics,

TABLE OF CONTENTS
Learning Objectives
1. Continuous random variables
2. Probability distributions for continuous random variables
3. Cumulative distribution functions for continuous random variables
4. Deriving probability densities from cumulative distribution functions
5. Percentiles of a continuous distribution
6. Shape of the probability distribution
Practice Questions
Reference
Cengage Learning, 8th edition [Chapter 4]
CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Learning objectives:
In this chapter you will learn what is meant by a continuous random variable. You
will learn how to arrive at the probability distribution of such random
variables and how to represent it graphically as well as by
summary expressions. You will then learn how to derive cumulative distribution
functions from the probability density function. You will also be able to derive
the probability densities from the cumulative distribution function. If either the
probability density function or the cumulative distribution function is known then
you will be able to evaluate the probability that the random variable takes on
specific values or a range of values. You will also learn how to identify the
characteristics of the population distribution like the shape of the distribution.
Chapter Outline
1. Continuous random variables
2. Probability distributions for continuous random variables
3. Cumulative distribution functions for continuous random variables
4. Deriving probability densities from cumulative distribution functions
5. Percentiles of a continuous distribution
6. Shape of the probability distribution
1 CONTINUOUS RANDOM VARIABLES
A random variable is said to be continuous when the outcome of a random
experiment can be any real number in a given interval and the number of
possibilities is uncountably infinite. The outcomes of experiments are denoted by
points on a line or on line segments of the measurement axis.
Example 1.1
Students of a college are given an objective type test. The proportion of correct
answers that a student scores in the test is a continuous variable which can range
from 0 to 1. Measured as a percentage, the outcome varies from 0 to 100
percent.
Example 1.2
A student travels to college by metro. In the morning a train arrives every 4
minutes. If the student reaches the platform as one train is departing she will
have to wait for 4 minutes till the next train enters the station. If she reaches just
as one train enters the station then she will have to wait 0 minutes to board the
train. If she reaches after the earlier train has left and the next train is yet to
arrive, she will have to wait for a time period between 0 and 4 minutes. Waiting
time is a continuous variable with a minimum of 0 minutes and a maximum of 4
minutes.

Example 1.3
The daily consumption of water (in liters) by an individual at home varies from
day to day through any given year. It depends on various factors like amount of
time spent at home, weather conditions, time of year, how much of the time
spent at home is during waking hours, etc. Daily water consumption is a
continuous variable with a minimum value of 0 liters.
Definition 1
A random variable is continuous if both the following conditions apply
1. Its set of possible values consists either of all numbers in an interval on
the number line or all numbers in a disjoint union of such intervals.
2. No possible value of the random variable has a positive probability.
Condition 1 implies that there is no way to create a listing of all the infinite
number of possible values of the variable. Condition 2 implies that intervals of
values have positive probability. As the width of the interval diminishes,
probability of the interval decreases. In the limit, probability of the interval is zero
as the width of the interval reduces to zero.
Example 1.4
The university team is scheduled to visit any minute during a three hour long
examination starting at 9am. We may want to find the probability that the team
visits at a given time or we may be interested in the probability that the visit
takes place during a given time interval. The sample space is from 0 to 180
minutes. The probability that the team visits during an interval of length c minutes is c/180.
This assignment of probabilities applies only to intervals on the measurement axis
from 0 to 180. The probability decreases as the interval becomes shorter. For an
interval of 5 seconds, the probability is computed as 5/10800 ≈ 0.000463. As the
length of the interval approaches zero, the probability that the team will visit also
approaches zero. That is why we always assign zero probability for a single point
on the number line. This does not mean that the team will not visit. The team will
visit at some point in the interval from 0 to 180 minutes even though each point
has zero probability.
Variables such as time, height, distance, temperature, area, volume, weight, etc
that require measurement are continuous. In practice, however, limitations of
measurement instruments often do not allow measurement on a continuous
scale. Yet we study models of continuous variables as they often reflect real world
situations.
2 PROBABILITY DISTRIBUTIONS FOR CONTINUOUS RANDOM VARIABLES
Whereas the set of possible values of a discrete rv is a sequence, the set of
possible values for a continuous rv is an interval. The continuous rv X can take
any one of the infinite number of possible values in that interval. In this case
random variables can take on values on a continuous scale.
To derive the probability distribution for a continuous rv let us first begin with a
discrete rv. Let X be a discrete rv which can take integer values such that x1 ≤ X
≤ xn, where x1 and xn are the minimum and maximum values respectively of the
rv X.
If x = x1, x2, …, xn then we can draw a probability histogram with n rectangles.
The area of the rectangle centered at xj is the proportion of the population that
has the value xj, ie, fj/N, where fj is the frequency of xj and N is the population size.
Summing over the n values of X we obtain
\[
\sum_{i=1}^{n} \frac{f_i}{N} = 1
\]
Now we allow X to take one additional value in each interval so that x1’ is midway
between x1 and x2; x2’ is midway between x2 and x3; and so on. Then total
number of x values will be 2n – 1 (instead of 2n, as there are n - 1 intervals).
With measurements of x taken at smaller intervals, the rectangles become
narrower, though the sum of the areas of all rectangles remains one.
If we continue this process of measuring x at smaller and smaller intervals, the
resulting sequence of probability histograms, of the distributions of corresponding
discrete random variables, will approach a smooth curve. Figure 1 illustrates this
process in the three panels 1.1, 1.2 and 1.3
Figure 1 Deriving histogram of a continuous random variable
Fig 1.1 Histogram of a discrete random variable
Fig 1.2 Histogram of the discrete random variable with measurements taken at smaller intervals
Fig 1.3 Limit of a sequence of discrete histograms

Since for each histogram the total area of all rectangles equals one, the total area
under the continuous curve is also one. The smooth curve represents a
continuous probability distribution. The sum of the areas of the rectangles that
represent the probability that X falls within any specified interval [a, b]
approaches the corresponding area under the curve for the interval from a to b.
Definition 2
Let X be a continuous random variable. Then a probability distribution or
probability density function (pdf) of X is a function f(x) such that for any two
numbers a and b with a ≤ b,
\[P(a \le X \le b) = \int_a^b f(x)\,dx\]
Probability density functions are also referred to as density functions.
The probability that X takes on a value in the interval [a, b] is the area under the
graph of the density function above the interval [a, b] on the number line.

The following two conditions must be satisfied by f(x) to be a pdf:
1. \(f(x) \ge 0\) for \(-\infty < x < \infty\)
2. \(\int_{-\infty}^{\infty} f(x)\,dx = 1\)

The first condition requires non-negative values of pdf for any x value. The
second condition requires that area under the entire curve of f(x) should equal
one, ie X values are collectively exhaustive. If all possible values of X are
considered then the second condition will be satisfied. Examples of pdf are the
continuous Uniform Distribution, the Normal Distribution, the Exponential
Distribution, etc.
Unlike the pmf, where we can obtain P(X = c) as the probability that the discrete
rv X takes the value c, the probabilities for a continuous rv are always associated
with intervals. The pdf yields P(X = c) = 0 for any particular value of the rv X.
This follows from the definition of a continuous rv as specified in condition 2 of
definition 1.
For the discrete rv X, each possible value of X is assigned a positive probability.

In case of the continuous rv X, area under the density curve that lies above any
single value of X is zero. We have:
\[P(X = c) = \int_c^c f(x)\,dx = \lim_{\varepsilon \to 0} \int_{c-\varepsilon}^{c+\varepsilon} f(x)\,dx = 0\]
In view of this property, it does not matter whether we include or exclude the
endpoints of the interval from a to b. Thus, for the continuous rv X, if a < b,
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b).
This is not the case with discrete random variables. If both a and b are possible
values of the discrete rv X then these probabilities will in general be different. If a < b,
then for the discrete rv X,
P(a ≤ X ≤ b) ≠ P(a < X ≤ b) ≠ P(a ≤ X < b) ≠ P(a < X < b).
Example 2.1
A milk vendor has a refrigerated storage tank of 1000 liters capacity, which is
filled each morning for sale during the day. It is not possible to predict the
amount of milk sold on any particular day. The sale of milk on any day can vary
from 0 lt. to 1000 lt. Past experience shows that any demand in the interval of 0

and 1000 is equally likely. The rv X indicates the sale of milk on a particular day.
The pdf of X is given by the continuous Uniform Distribution
\[f(x) = \begin{cases} 0.001 & 0 \le x \le 1000 \\ 0 & \text{otherwise} \end{cases}\]
In general, if α and β are the lower and upper limits of the values that the
continuous rv X can take, then the pdf of X is
\[f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta - \alpha} & \alpha \le x \le \beta \\ 0 & \text{otherwise} \end{cases}\]
The probability of an interval depends only on the width of the interval in case of
the uniform distribution.
The pdf of the uniform distribution is illustrated in Figure 2.
Figure 2 Graph of the continuous Uniform Distribution
In our example, β – α = 1000 so that \(\frac{1}{\beta - \alpha} = 0.001\). We can use this to obtain
the probability that sale of milk on a particular day is between 200 and 500 liters
as follows:
P(200 ≤ X ≤ 500) = (500 – 200)(0.001) = 0.3
Note that α and β are the parameters of a population of the continuous rv X that
is described by a uniform distribution. We have a family of uniform distributions
for different values of the two parameters. Each distribution is specified by a
particular pair of values of α and β.
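The calculation in Example 2.1 can be sketched in Python. This is only an illustrative sketch: the function name `uniform_interval_prob` and its argument names are assumptions made here and do not come from the lesson.

```python
def uniform_interval_prob(a, b, alpha, beta):
    """P(a <= X <= b) for X uniform on [alpha, beta]."""
    if beta <= alpha:
        raise ValueError("beta must exceed alpha")
    # The density is zero outside [alpha, beta], so clip the interval first.
    lo = max(a, alpha)
    hi = min(b, beta)
    width = max(hi - lo, 0.0)
    # For the uniform distribution only the width of the interval matters.
    return width / (beta - alpha)


# Milk vendor example: alpha = 0 and beta = 1000 liters.
p_200_500 = uniform_interval_prob(200, 500, 0, 1000)
p_600_900 = uniform_interval_prob(600, 900, 0, 1000)
```

Both calls use intervals of width 300, so they return the same probability, which illustrates that the position of the interval inside [α, β] is irrelevant.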
Exercise 1
Show that \(f(x) = 3x^2\) for 0 < x < 1 represents a pdf and calculate P(0.1 < X < 0.5).

Solution
f(x) can represent a pdf if both conditions for a pdf are satisfied, ie, \(f(x) \ge 0\) and
\(\int_{-\infty}^{\infty} f(x)\,dx = 1\).
Since \(f(x) = 3x^2\) and \(x^2 \ge 0\) always, \(f(x) \ge 0\) for all x values. Therefore,
for 0 < x < 1, f(x) > 0 and the first condition is satisfied.
\[\int_0^1 3x^2\,dx = \left[\frac{3x^3}{3}\right]_0^1 = 1 - 0 = 1,\]
which satisfies the second condition for a pdf.
Since both conditions are satisfied, \(f(x) = 3x^2\) represents a pdf for 0 < x < 1.
Now,
\[P(0.1 < X < 0.5) = \int_{0.1}^{0.5} 3x^2\,dx = (0.5)^3 - (0.1)^3 = 0.125 - 0.001 = 0.124\]
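The two pdf conditions and the interval probability of Exercise 1 can also be checked numerically. The sketch below uses a simple midpoint rule; the helper name `integrate` and the number of sub-intervals are assumptions chosen for illustration, not part of the lesson.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def pdf(x):
    # f(x) = 3x^2 on (0, 1) and 0 elsewhere
    return 3 * x**2 if 0 < x < 1 else 0.0


total_area = integrate(pdf, 0, 1)    # close to 1 (second pdf condition)
prob = integrate(pdf, 0.1, 0.5)      # close to 0.124
```

The numerical values agree with the exact integrals up to the small error of the midpoint rule.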
Example 2.2
The pdf for a continuous rv is given as
\[f(x) = \begin{cases} e^{-x} & x \ge 0 \\ 0 & x < 0 \end{cases}\]
so that as x increases from x = 0, f(x) decreases exponentially,
as illustrated in Fig 3.
Figure 3 pdf of \(f(x) = e^{-x}\) for x ≥ 0
Now, \(P(a \le X \le b) = \int_a^b e^{-x}\,dx\). This is the shaded area in Figure 3.
If a = 2 and b = 5, then
\[P(2 \le X \le 5) = \int_2^5 e^{-x}\,dx = \left[-e^{-x}\right]_2^5 = -(0.006738 - 0.135335) = 0.128597 \approx 0.13\]
Therefore, 13 percent of the area under the curve of \(f(x) = e^{-x}\) lies above the
measurement axis in the interval [2, 5].
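As a quick check of Example 2.2, the antiderivative of the density can be evaluated directly in Python. The function names below are hypothetical and only serve this illustration.

```python
import math


def exp_interval_prob(a, b):
    """P(a <= X <= b) when f(x) = exp(-x) for x >= 0 and 0 otherwise."""
    if b < a:
        raise ValueError("need a <= b")
    a = max(a, 0.0)  # there is no probability below 0
    b = max(b, 0.0)
    # The integral of exp(-x) from a to b is exp(-a) - exp(-b).
    return math.exp(-a) - math.exp(-b)


def exp_tail_prob(a):
    """P(X > a) for the same density."""
    return math.exp(-max(a, 0.0))


p_2_5 = exp_interval_prob(2, 5)   # about 0.1286, as in Example 2.2
```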

Exercise 2
Show that \(f(x) = e^{-x}\) for 0 < x < ∞ represents a pdf, and compute the probability
that X > 1.
Solution
\(f(x) = e^{-x}\) would represent a pdf if \(f(x) \ge 0\) and \(\int_0^{\infty} f(x)\,dx = 1\) for 0 < x < ∞.
Since e > 0, \(e^{-x} > 0\) for all x values.
f(x) = 1 for x = 0. If x > 0, f(x) < 1. As x → ∞, f(x) → 0.
\[\int_0^{\infty} f(x)\,dx = \int_0^{\infty} e^{-x}\,dx = \left[-e^{-x}\right]_0^{\infty} = -[0 - 1] = 1.\]
Thus both conditions are satisfied and f(x) is a pdf.
\[P(X > 1) = \int_1^{\infty} e^{-x}\,dx = -[0 - e^{-1}] = e^{-1} = 0.368\]
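Exercise 3
The pdf of the rv X is given by
\[f(x) = \begin{cases} \dfrac{k}{\sqrt{x}} & 0 < x < 4 \\ 0 & \text{otherwise} \end{cases}\]
Find (a) the value of k, and (b) P(X > 1)
Solution
(a) Given that f(x) is a pdf we have \(\int_0^4 \frac{k}{\sqrt{x}}\,dx = 1\).
Now
\[\int_0^4 \frac{k}{\sqrt{x}}\,dx = k\left[\frac{x^{1/2}}{1/2}\right]_0^4 = 2k[2 - 0] = 4k.\]
Equating 4k and 1 we get \(k = \frac{1}{4}\), so that \(f(x) = \frac{1}{4\sqrt{x}}\) for 0 < x < 4.
(b)
\[P(X > 1) = \int_1^4 \frac{1}{4\sqrt{x}}\,dx = \left[\frac{2\sqrt{x}}{4}\right]_1^4 = 1 - \frac{1}{2} = \frac{1}{2} = 0.5\]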
Exercise 4
If the continuous random variable X can take only non-negative values and has
the density function \(f(x) = e^{2x}\) for x > 0, and 0 otherwise, what is the maximum
value of X?
Solution
If f(x) is a density function then the total area under it must equal one. Let x denote
the maximum value of X. Then
\[\int_0^x e^{2y}\,dy = \left[\frac{e^{2y}}{2}\right]_0^x = \frac{e^{2x}}{2} - \frac{1}{2} = 1 \;\Rightarrow\; e^{2x} = 3\]
Therefore, 2x = ln 3 = 1.0986, so that x = 0.549.
Hence, f(x) will be a density function for 0 < x < 0.549. The maximum value of X is
0.549.
3 CUMULATIVE DISTRIBUTION FUNCTIONS FOR CONTINUOUS RANDOM VARIABLES
Similar to the case of discrete random variables, there are many problems where
we need to know the probability that a continuous rv X takes a value that does
not exceed a specified value x. For this we need the cumulative distribution
function (cdf) of X.
Definition 3
If X is a continuous random variable then the cumulative distribution function
F(x) for X is defined for every number x by
\[F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy\]
For each x, F(x) is the area under the density curve to the left of x. As x
increases, F(x) also increases smoothly until F(x) = 1 and then it continues as a
flat line parallel to the measurement axis.
The cdf gives the probability P(X ≤ x) obtained by integrating the pdf f(y)
between -∞ and x. As in the case of the discrete rv, here too F(-∞) = 0, F(∞) = 1, and
F(a) ≤ F(b) when a < b.
Also P(a ≤ X ≤ b) = F(b) – F(a) where a < b.
Since X is a continuous rv,
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b) = F(b) – F(a) where a < b.
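Example 3.1
Given the uniform distribution
\[f(x; A, B) = \begin{cases} \dfrac{1}{B - A} & A \le x \le B \\ 0 & \text{otherwise} \end{cases}\]
the cdf is \(F(x) = \int_{-\infty}^{x} f(y)\,dy\).
Since the minimum value of the rv is A, we have, for A ≤ x ≤ B,
\[F(x) = \int_A^x \frac{1}{B - A}\,dy = \left[\frac{y}{B - A}\right]_{y=A}^{y=x} = \frac{x - A}{B - A}\]
Since \(\int_A^B \frac{1}{B - A}\,dx = 1\), therefore F(x) = 0 for x < A and F(x) = 1 for x > B.
Hence, the cdf of the uniform distribution is
\[F(x) = \begin{cases} 0 & x < A \\ \dfrac{x - A}{B - A} & A \le x \le B \\ 1 & x > B \end{cases}\]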
The pdf and cdf of the uniform distribution of a continuous rv are illustrated in Fig
4.
Figure 4 pdf & cdf of a uniform distribution
If the graph of the pdf is bell-shaped as in case of the Normal Distribution [fig 5
(a)], then the cdf will be as in Figure 5 (b)
Figure 5 pdf & cdf of normal distribution

Exercise 5
The density function of the rv X is given by
\[f(x) = \begin{cases} 6x(1 - x) & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}\]
Obtain the cdf and compute P(X < ½).
Solution
For 0 < x < 1,
\[F(x) = \int_{-\infty}^{x} f(y)\,dy = \int_0^x 6y(1 - y)\,dy = 6\int_0^x (y - y^2)\,dy = 6\left[\frac{y^2}{2} - \frac{y^3}{3}\right]_{y=0}^{y=x}\]
If x ≤ 0, F(x) = 0
If 0 < x < 1, \(F(x) = 3x^2 - 2x^3\)
If x = 1, F(x) = 3 – 2 = 1
If x > 1, F(x) = 1 since f(x) = 0 there
Therefore the cdf can be represented as follows
\[F(x) = \begin{cases} 0 & x \le 0 \\ 3x^2 - 2x^3 & 0 < x < 1 \\ 1 & x \ge 1 \end{cases}\]
To compute P(X < ½), we substitute x = ½ in F(x) since P(X < ½) = P(X ≤ ½)
for a continuous rv.
\[F(1/2) = 3(1/4) - 2(1/8) = \frac{3}{4} - \frac{1}{4} = \frac{1}{2} = 0.5\]
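The piecewise cdf obtained in Exercise 5 translates directly into code. The following is a minimal sketch; the names `cdf` and `interval_prob` are illustrative choices, not notation from the lesson.

```python
def cdf(x):
    """F(x) for the density f(x) = 6x(1 - x) on (0, 1)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 3 * x**2 - 2 * x**3


def interval_prob(a, b):
    """P(a <= X <= b) = F(b) - F(a) for a continuous rv with a <= b."""
    return cdf(b) - cdf(a)


p_half = cdf(0.5)                    # P(X < 1/2)
p_middle = interval_prob(0.25, 0.75)
```

Because the density is symmetric about x = ½, `cdf(0.5)` equals one half, and `interval_prob(0.25, 0.75)` gives the probability of the central half of the range.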
Exercise 6
Show that the expression \(g(x) = \frac{x + 1}{2}\) can serve as a cdf for -1 < x < 1.
Solution
If g(x) is to represent a cdf we must show that g(x) = 0 at x = -1, g(x) = 1 at
x = 1, and 0 < g(x) < 1 for the interval -1 < x < 1.
Now, \(g(-1) = \frac{-1 + 1}{2} = 0\), and \(g(1) = \frac{1 + 1}{2} = 1\). Let us select a value x = 0 in the
given interval.
Then \(g(0) = \frac{1}{2}\), where \(0 < \frac{1}{2} < 1\). More generally, g(x) increases steadily from 0 to 1
over the interval since its slope is ½ > 0.
Since all three requirements are satisfied, g(x) can serve as a cdf for -1 < x < 1
4 DERIVING PROBABILITY DENSITIES FROM CUMULATIVE DISTRIBUTION FUNCTIONS
The cdf, if given, can be used to obtain the corresponding pdf. The cdf is also
useful for computing the probabilities of various intervals.
Let X be a continuous random variable with the pdf f(x) and the cdf F(x).
Then for any two numbers a and b such that a < b,
P(X ≤ a) = F(a). Hence,
P(X > a) = 1 – F(a) and P(X ≤ b) = F(b), so that
P(a ≤ X ≤ b) = P(a < X ≤ b)
= P(X ≤ b) – P(X ≤ a)
= F(b) – F(a), as illustrated in Figure 6
Figure 6 Probability of an interval
For a given cdf we can obtain the pdf by taking the derivative of F(x). By definition
3, if X is a continuous rv and the value of its probability density at y is f(y) then
the cdf is
\[F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy \quad \text{where } -\infty < x < \infty\]
Hence, \(f(x) = \frac{dF(x)}{dx} = F'(x)\) at every x at which the derivative F'(x) exists.

Example 4.1
In example 3.1, for the uniform distribution the cdf is
\[F(x) = \begin{cases} 0 & x < A \\ \dfrac{x - A}{B - A} & A \le x \le B \\ 1 & x > B \end{cases}\]
The graph of F(x) is given in Fig 4(b).
It can be seen that F(x) is differentiable for A < x < B.
At x = A and x = B, F(x) cannot be differentiated.
For x < A, F(x) = 0 and for x > B, F(x) = 1
Hence, F'(x) = f(x) = 0 if x < A, or, if x > B.
For A < x < B, \(F'(x) = \frac{d}{dx}\left(\frac{x - A}{B - A}\right) = \frac{1}{B - A} = f(x)\).
Thus we obtain the pdf of the uniform distribution as
\[f(x) = \begin{cases} 0 & x < A \\ \dfrac{1}{B - A} & A < x < B \\ 0 & x > B \end{cases}\]
Since X is continuous, the endpoints carry zero probability, so we may equally write
\(f(x) = \frac{1}{B - A}\) for A < x < B or for A ≤ x ≤ B.
Exercise 7
A continuous rv Y has a cdf given by
\[F(y) = \begin{cases} 0 & y < 0 \\ y^2 & 0 \le y \le 1 \\ 1 & y > 1 \end{cases}\]
Compute P(½ ≤ Y ≤ ¾) in two ways by using (a) the cdf, and (b) the pdf
Solution
(a) \(P(\frac{1}{2} \le Y \le \frac{3}{4}) = F(\frac{3}{4}) - F(\frac{1}{2}) = \frac{9}{16} - \frac{1}{4} = \frac{5}{16} = 0.3125\)
(b) First we obtain the pdf by differentiating F(y)
F'(y) = 0 for y < 0 and for y > 1. If 0 < y < 1, then F'(y) = f(y) = 2y so
that the pdf is as follows:
\[f(y) = \begin{cases} 2y & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}\]
Then
\[P(\tfrac{1}{2} \le Y \le \tfrac{3}{4}) = \int_{1/2}^{3/4} 2y\,dy = \left[y^2\right]_{1/2}^{3/4} = \frac{9}{16} - \frac{1}{4} = \frac{5}{16} = 0.3125\]
5 PERCENTILES OF A CONTINUOUS DISTRIBUTION

We know that the entire area under the graph of the pdf f(x), above the
measurement axis, is 1. Therefore 100 percent of the probability distribution for
all possible values of the continuous rv X lies to the left of the maximum value
that X can take.
We may need to find two possible values a and b of X such that:
(i) a < b, and
(ii) a certain percentage of the area under the graph of f(x) lies between a and b.
For example, given the distribution of marks obtained by students in an
examination, we may need to find the minimum marks scored by the top 5
percent of the students.
Definition 4
Let p be a number between 0 and 1. The (100p)th percentile of the distribution
of a continuous random variable X, denoted by η(p), is defined by
\[p = F[\eta(p)] = \int_{-\infty}^{\eta(p)} f(y)\,dy\]
Then η(p) is that value on the measurement axis such that 100p percent of the
area under the graph of f(x) lies to the left of η(p) and 100(1-p) percent lies to
the right. This is illustrated in Figure 7
Figure 7 Percentiles

If p = 0.3 then 30% of the area under the graph of f(x) lies to the left of η(0.3)
and 70% to the right of η(0.3). The 30th percentile is denoted by η(0.3) since p =
0.3
Example 5.1
Consider the rv X with the following pdf
\[f(x) = \begin{cases} \dfrac{1}{8}(x + 1) & 2 < x < 4 \\ 0 & \text{otherwise} \end{cases}\]
To find the 75th percentile, η(0.75), we need to first obtain the cdf from the given
pdf. For 2 < x < 4,
\[F(x) = \int_2^x \frac{1}{8}(y + 1)\,dy\]
Therefore,
\[F[\eta(p)] = p = \int_2^{\eta(p)} \left(\frac{x}{8} + \frac{1}{8}\right)dx = \left[\frac{x^2}{16} + \frac{x}{8}\right]_2^{\eta(p)}\]
Substituting p = 0.75 = \(\frac{3}{4}\) we obtain
\[\frac{3}{4} = \frac{[\eta(p)]^2}{16} + \frac{\eta(p)}{8} - \left(\frac{4}{16} + \frac{2}{8}\right) = \frac{1}{16}\left\{[\eta(p)]^2 + 2\eta(p) - 8\right\}\]
Rearranging the terms we get
\([\eta(p)]^2 + 2\eta(p) - 20 = 0\)
Solving with the quadratic formula,
\[\eta(p) = \frac{-2 \pm \sqrt{4 + 80}}{2} = -1 \pm 4.58 = 3.58 \text{ or } -5.58\]
Since the minimum value of X is 2 and the maximum is 4, the 75th percentile is 3.58
because that is the only root that X can take. The alternative value -5.58
does not fall in the range of possible values.
Hence, η(0.75) = 3.58
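When the equation F[η(p)] = p cannot be solved by hand, the percentile can be located numerically because the cdf is increasing. The sketch below applies bisection to the cdf of Example 5.1; bisection is a technique chosen here for illustration (the lesson itself solves the quadratic), and the function names and tolerance are assumptions.

```python
def cdf(x):
    """cdf of f(x) = (x + 1)/8 on (2, 4), as derived in Example 5.1."""
    if x <= 2:
        return 0.0
    if x >= 4:
        return 1.0
    return x**2 / 16 + x / 8 - 0.5


def percentile(p, lo=2.0, hi=4.0, tol=1e-10):
    """Solve F(eta) = p by bisection; F is increasing on [lo, hi]."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid   # the percentile lies to the right of mid
        else:
            hi = mid   # the percentile lies at or to the left of mid
    return (lo + hi) / 2


eta_75 = percentile(0.75)   # about 3.58
eta_50 = percentile(0.50)   # the 50th percentile, about 3.12
```

Restricting the search to [2, 4] plays the same role as discarding the root -5.58 in the worked solution.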
6 SHAPE OF THE PROBABILITY DISTRIBUTION

One of the applications of percentiles is to find the median of the distribution of a
continuous random variable. The median is that value of the random variable
where half of the distribution lies to the left of that value and the remaining 50
percent of the distribution is to the right of the value.
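Definition 5
The median of a continuous distribution, denoted by \(\tilde{\mu}\), is the 50th percentile so
that \(\tilde{\mu}\) satisfies the condition \(F(\tilde{\mu}) = 0.5\)
Example 6.1
The median of the pdf given in example 5.1 is computed by letting p = ½ so that
\[F(\tilde{\mu}) = \frac{1}{2} = \int_2^{\tilde{\mu}} \left(\frac{x}{8} + \frac{1}{8}\right)dx = \left[\frac{x^2}{16} + \frac{x}{8}\right]_2^{\tilde{\mu}} = \frac{\tilde{\mu}^2}{16} + \frac{\tilde{\mu}}{8} - \frac{1}{2}\]
Therefore,
\[\frac{\tilde{\mu}^2}{16} + \frac{\tilde{\mu}}{8} - 1 = 0 \;\Rightarrow\; \tilde{\mu}^2 + 2\tilde{\mu} - 16 = 0\]
so that
\[\tilde{\mu} = \frac{-2 \pm \sqrt{4 + 64}}{2} = -1 \pm 4.123\]
Since 2 < x < 4, \(\tilde{\mu}\) = 3.123.
Half the area of the density curve is to the left of 3.123 and the other half is to
the right.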
If a random variable has a symmetric pdf then the median will coincide with the
point of symmetry since half the area under the density curve lies on either side
of the point.
A positively skewed distribution has a long right-hand tail. Similarly, a negatively
skewed distribution has a long left-hand tail. Figure 8 illustrates the three kinds of
distributions.
Figure 8 Examples of symmetric and asymmetric distributions

Example 6.2
The incomes of employees of a company will usually be positively skewed as
there are a large number of low income workers and fewer employees with high
income.
Example 6.3
A well known manufacturing company assures that its product will last a
minimum period of three years. However, due to a defective component sourced
from one of the suppliers, the lifetime of a batch of the product is likely to be
drastically reduced. The distribution will then be negatively skewed.
It can be shown that for a symmetric pdf the median coincides with the mean of
the distribution. If the mean and median have different values then the
distribution is asymmetric, ie, skewed. If mean is less than median the
distribution is skewed to the left or negatively skewed. On the other hand a
distribution is positively skewed or skewed to the right when the mean is greater
than the median.
The mode of the distribution is that value of the random variable at which the
graph of the probability distribution reaches its highest point. If there is only one
peak or “high point” it is a unimodal distribution. If there are two modes it is
called a bimodal distribution. A distribution having more than two modes is said
to be multimodal.
The mode of a unimodal distribution of a random variable is obtained by
differentiating the probability density function. For the rv X with pdf f(x), the
mode is the value of X at which f'(x) = 0 and f''(x) < 0.
Example 6.4
Suppose that the rv X has pdf
\[f(x) = \begin{cases} \dfrac{1}{9}(4 - x^2) & -1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}\]
Differentiating f(x) with respect to x, we get
\[f'(x) = 0 - \frac{2x}{9}\]
Setting f'(x) = 0 we get x = 0
Taking the second derivative,
\[f''(x) = -\frac{2}{9}, \text{ so that } f''(x) < 0\]
Therefore, the mode of this pdf is at x = 0
Comparison of the mode and median can also be used to indicate the shape of
the distribution. For a symmetric unimodal distribution mode = median. In case of a
positively skewed distribution, median > mode, whereas median < mode for a
negatively skewed distribution.
The other characteristics of the distribution like mean and variance can be
computed with the help of mathematical expectations.
PRACTICE QUESTIONS
1. Suppose the rv Y has the pdf \(f(y) = 4y^3\) for 0 < y < 1 and 0 otherwise.
Find P(0 < Y < ½).
2. If Y is an exponential rv with \(f(y) = \lambda e^{-\lambda y}\) for y > 0 and 0 otherwise, find the
cdf F(y).
3. Suppose the cdf of the rv Y is \(F(y) = \frac{1}{12}(y^2 + y^3)\) for 0 < y < 2 and 0
otherwise. Find the pdf f(y)
4. The amount of coffee (in grams) in a 230-gm jar filled by a certain
machine is a random variable whose probability density is given by
\[f(x) = \begin{cases} 0 & x \le 227.5 \\ \dfrac{1}{5} & 227.5 < x < 232.5 \\ 0 & x \ge 232.5 \end{cases}\]
Find the probabilities that a 230-gram jar filled by this machine will
contain
(a) at most 228.65 gm of coffee
(b) anywhere from 229.34 to 231.66 gm of coffee
(c) at least 229.85 gm of coffee
5. Suppose the cdf for the continuous rv X is
\[F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^2}{4} & 0 \le x < 2 \\ 1 & 2 \le x \end{cases}\]
Use the cdf to obtain the following:
(a) P(X ≤ 1)
(b) P(0.5 ≤ X ≤ 1)
(c) P(X > 1.5)
(d) Median
(e) pdf of X
6. The time taken by employees of a company to complete a task is a rv that
has a uniform distribution. Let X = time taken in minutes. The minimum
and maximum times are 10 minutes and 50 minutes respectively. For a
new task, those taking less time will be considered efficient and given a
bonus while those taking too much time are inefficient and will be sent for
additional training. To qualify for the bonus an employee must belong to the
best 20 percent of all employees. To require training the employee must
belong to the worst 30 percent category. What is the range of time for
which an employee would neither get a bonus nor be required to go
for additional training?
7. In certain experiments, the error made in determining the velocity of a
projectile is a random variable having a uniform density with minimum
value α = –0.015 and maximum value β = 0.015. Find the probabilities that
such an error will
(i) be between –0.002 and 0.003
(ii) exceed 0.005 in absolute value
8. If the continuous random variable X can take only non-negative values
and has the density function \(f(x) = 2e^{-2x}\) for x > 0, what is the maximum
value of X?
(Hint: Use conditions required to be satisfied by a pdf)

Mathematical expectation discrete
DC-1
Semester-II
Lesson: Mathematical expectation discrete

TABLE OF CONTENTS
1. Expected value of a discrete random variable
2. Expectation of a function of a discrete random variable
3. Rules of mathematical expectation
4. Variance of a discrete random variable
5. Variance of a function of a discrete random variable
6. Covariance and variance of sums of random variables
7. Parameters of the probability mass function
Content Developer
Reference
MATHEMATICAL EXPECTATION: DISCRETE RANDOM VARIABLES

In this chapter you will learn how to obtain two main characteristics of the probability
distribution of a discrete random variable. The mean of the distribution is the point on the
number line where the distribution is centered and the variance is a measure of the spread of
the distribution. You will learn how to derive these characteristics of distributions of discrete
random variables. You will also learn how to apply the rules of mathematical expectation to
functions of random variables as well as to sums of random variables.
Chapter Outline
1. Expected value of a discrete random variable
2. Expectation of a function of a discrete random variable
3. Rules of mathematical expectation
4. Variance of a discrete random variable
5. Variance of a function of a discrete random variable
6. Covariance and variance of sums of random variables
7. Parameters of the probability mass function
1 EXPECTED VALUE OF A DISCRETE RANDOM VARIABLE

Mathematical expectation of a random variable (rv) is a very important concept in probability
theory. Graphical presentation of the probability distribution of a rv is valuable in reaching
conclusions about the form of the distribution. However, mathematical expectation helps us
to obtain summary measures of the characteristics of the probability distribution.
Mathematical expectation of a rv is referred to simply as its expected value.
If X is a discrete rv with a set of possible values D and pmf p(x), then we can define the
expected value of X, denoted by E(X) or \(\mu_X\), as follows
Definition 1
\[E(X) = \mu_X = \sum_{x \in D} x \cdot p(x)\]
If D = {x1, x2, x3, ..., xn}, then \(E(X) = \mu_X = x_1 p(x_1) + x_2 p(x_2) + x_3 p(x_3) + \cdots + x_n p(x_n)\)
If it is clear to which X the expected value refers, μ may be used instead of \(\mu_X\).
The expected value of a rv is called its mean value. We can interpret the expected value as
the long-run average value that the rv takes over a large number of repeated trials of an
experiment performed in identical and independent fashion. When the trials are conducted in
this fashion then the outcome of any trial is independent of outcomes of the other trials.
Suppose a random experiment is repeated N times and the outcome X = x is observed in \(N_x\)
of these trials. For each possible value x = x1, x2, x3, ... in the set D, we can
obtain the relative frequency \(\frac{N_x}{N}\). The sum of \(x \cdot \frac{N_x}{N}\) over all possible values of x is then
the average of the values taken by the rv over all N trials, ie, \(\sum_{x \in D} \frac{x N_x}{N}\). As the number of
trials increases and N becomes infinitely large, the ratio \(\frac{N_x}{N}\) tends to the probability of
occurrence of x. In other words, \(\frac{N_x}{N} \to p(x)\) as N → ∞. Thus, E(X) is the mean of the
probability distribution of the random variable X.
To compute the population average value of X we need only the possible values of X along
with their respective probabilities. The size of the population is immaterial as long as the pmf
is given. The mean value of X is a weighted average of the possible values of X, where the
weights are the probabilities of these values. The expected value μ may not coincide with any
of the possible values of X. Note that the mean will coincide with the median if the
distribution is symmetric.
The expected value of a rv X is also referred to as the first moment of X about the origin or
simply the first moment. The quantity \(E(X^n)\) is similarly the nth moment of X where n > 1.
Example 1.1
If X is a Bernoulli rv with pmf
\[p(x) = \begin{cases} 1 - p & x = 0 \\ p & x = 1 \\ 0 & \text{otherwise} \end{cases}\]
Then, E(X) = 0·p(0) + 1·p(1) = 0(1 – p) + 1(p) = p

Hence, expected value of the Bernoulli distribution is the probability that X takes the value 1.
If a population consists of only 0’s and 1’s in the proportions (1-p) and p respectively, then
the population mean is μ = p.
Example 1.2
If X has a pmf as follows
x 1 2 3 4
p(x) 0.002 0.146 0.588 0.264
Then μ = E(X) = 1(0.002) + 2(0.146) + 3(0.588) + 4(0.264)

= 0.002 + 0.292 + 1.764 + 1.056
= 3.114
Note that 3.114 is not one of the possible values of X since x = 1, 2, 3, 4. Also population
size is not given nor is it required.
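Definition 1 amounts to a probability-weighted sum, which is easy to reproduce in code. The sketch below stores the pmf of Example 1.2 in a dictionary; the function name `expected_value` and the dictionary representation are assumptions made for illustration.

```python
def expected_value(pmf):
    """E(X) = sum of x * p(x); pmf maps each possible value x to p(x)."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())


# pmf of Example 1.2
pmf = {1: 0.002, 2: 0.146, 3: 0.588, 4: 0.264}
mu = expected_value(pmf)   # approximately 3.114
```

Only the possible values and their probabilities enter the computation, which matches the remark that the population size is not required.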
Example 1.3
Let X = number of trials till the first success is observed, and p = the probability of success.
The pmf of X is
\[p(x) = \begin{cases} p(1 - p)^{x - 1} & x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}\]
The mean of X is E(X) obtained as follows
\[E(X) = \sum_{x \in D} x \cdot p(x) = \sum_{x=1}^{\infty} x\,p(1 - p)^{x - 1} = p \sum_{x=1}^{\infty} x(1 - p)^{x - 1}\]
Now
\[\frac{d}{dp}(1 - p)^x = -x(1 - p)^{x - 1}\]
Substituting we get
\[E(X) = p \sum_{x=1}^{\infty} \left[-\frac{d}{dp}(1 - p)^x\right] = p(-1)\frac{d}{dp}\left[\sum_{x=1}^{\infty} (1 - p)^x\right]\]
Since \(\sum_{x=0}^{\infty} (1 - p)^x = \frac{1}{1 - (1 - p)} = \frac{1}{p}\) and \((1 - p)^0 = 1\), therefore
\[\sum_{x=1}^{\infty} (1 - p)^x = \frac{1}{p} - 1\]
[This is a convergent geometric series as 0 < p < 1 and so 0 < (1 – p) < 1]
\[\therefore E(X) = p(-1)\frac{d}{dp}\left[\frac{1}{p} - 1\right] = -p\left[-\frac{1}{p^2} - 0\right] = \frac{1}{p}\]
Alternately, writing q = 1 – p,
\[\sum_{x=1}^{\infty} x\,p(1 - p)^{x - 1} = \sum_{x=1}^{\infty} x\,p\,q^{x - 1} = pq^0 + 2pq^1 + 3pq^2 + 4pq^3 + 5pq^4 + 6pq^5 + \cdots\]
\[= p(1 + 2q + 3q^2 + 4q^3 + 5q^4 + 6q^5 + \cdots)\]
Using the series expansion \(\frac{1}{(1 - x)^2} = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + \cdots\), where x = q,
we get
\[E(X) = p \cdot \frac{1}{(1 - q)^2} = p \cdot \frac{1}{p^2} = \frac{1}{p}, \text{ since } (1 - q) = p\]
If p = 0.5 then E(X) = \(\frac{1}{0.5}\) = 2, ie, a success will be observed after 2 trials on the average.
If p = 0.2 then E(X) = 5, ie, on the average a success will be observed after 5 trials.
As p approaches 1, there will be few failures before a success is observed. As p approaches 0,
we expect many failures before a success is observed.
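The result E(X) = 1/p from Example 1.3 can be checked numerically by truncating the infinite sum. This is a rough sketch: the function name and the number of terms are arbitrary choices made here, and the truncated sum only approximates the series.

```python
def geometric_mean_partial(p, terms=10_000):
    """Partial sum of x * p * (1 - p)**(x - 1) for x = 1, ..., terms."""
    q = 1.0 - p
    return sum(x * p * q ** (x - 1) for x in range(1, terms + 1))


approx_half = geometric_mean_partial(0.5)    # close to 1 / 0.5 = 2
approx_fifth = geometric_mean_partial(0.2)   # close to 1 / 0.2 = 5
```

The terms shrink geometrically, so a few thousand terms are more than enough for these values of p; a value of p very close to 0 would need many more terms.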

It is possible to have a probability distribution in which the probabilities of large values of
the rv X decrease only slowly. Such distributions with “heavy tails” may result in a mean
value that is not finite.
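Example 1.4
\[p(x) = \begin{cases} \dfrac{k}{x^2} & x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}\]
where k is chosen so that \(\sum_{x=1}^{\infty} \frac{k}{x^2} = 1\).
\[E(X) = \mu = \sum_{x=1}^{\infty} x \cdot \frac{k}{x^2} = k \sum_{x=1}^{\infty} \frac{1}{x}\]
E(X) is not finite as the harmonic series \(\sum_{x=1}^{\infty} \frac{1}{x}\) diverges to infinity: p(x) does not decrease
sufficiently fast as x increases.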
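The divergence in Example 1.4 can be made visible by computing partial sums of the mean. The sketch below uses the standard result that the sum of 1/x² over x = 1, 2, 3, ... equals π²/6, so that k = 6/π²; this value of k is not stated in the lesson and is brought in here as an assumption for the illustration.

```python
import math

# Normalising constant: the sum of 1/x^2 over x = 1, 2, ... is pi^2 / 6.
K = 6 / math.pi**2


def partial_mean(n_terms):
    """Sum of x * K / x**2 = K / x for x = 1, ..., n_terms."""
    return sum(K / x for x in range(1, n_terms + 1))


# The partial sums keep growing, roughly like K * ln(n), instead of settling.
for n in (10, 1_000, 100_000):
    print(n, round(partial_mean(n), 3))
```

Each hundredfold increase in the number of terms adds about K·ln(100) to the partial sum, so no finite limit is approached.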
Exercise 1
Find the expected value of the rv X having the pmf
\[p(x) = \begin{cases} \dfrac{|x - 2|}{7} & x = -1, 0, 1, 3 \\ 0 & \text{otherwise} \end{cases}\]
Solution
\[E(X) = \sum x \cdot \frac{|x - 2|}{7} = (-1)\left(\frac{3}{7}\right) + (0)\left(\frac{2}{7}\right) + (1)\left(\frac{1}{7}\right) + (3)\left(\frac{1}{7}\right) = \frac{-3 + 0 + 1 + 3}{7} = \frac{1}{7}\]
Exercise 2
Find E(X) where X is the outcome when we roll a fair die.
Solution
For a fair die each face of the die is an equally likely outcome.
∴ p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \(\frac{1}{6}\)
where x = 1, 2, 3, 4, 5, 6 denotes the number of dots on the six faces of the die, ie, the outcome
when we roll the die.
\[E(X) = \frac{1}{6}(1) + \frac{1}{6}(2) + \frac{1}{6}(3) + \frac{1}{6}(4) + \frac{1}{6}(5) + \frac{1}{6}(6) = \frac{21}{6} = \frac{7}{2} = 3.5\]
Note that we can never observe the outcome to be 3.5 as X can only take the integer values
1, 2, 3, 4, 5, 6.
E(X) is simply the average value of X if we roll a fair die a large number of times.
Exercise 3
An investor is considering three strategies for a $1,000 investment. The estimated probable
returns are:
Strategy 1: A profit of $10,000 with probability 0.15 and a loss of $1,000 with probability
0.85
Strategy 2: A profit of $1,000 with probability 0.50, a profit of $500 with probability 0.30
and a loss of $500 with probability 0.20
Strategy 3: A certain profit of $400.
Which strategy has the highest expected profit?
Solution
Let Xj = returns from investment in jth strategy, where j = 1, 2, 3
Strategy 1: E(X1) = (0.15)(10000) + (0.85)(-1000) = 1500 – 850 = $650
Strategy 2: E(X2) = (0.50)(1000) + (0.30)(500) + (0.20)(-500) = 500 + 150 – 100 = $550
Strategy 3: E(X3) = (1)(400) = $400
Since E(X1) > E(X2) > E(X3) therefore strategy 1 is most profitable.
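The comparison of the three strategies can be reproduced with a few lines of Python. The dictionary layout and variable names below are illustrative assumptions; each inner dictionary maps a profit (negative for a loss) to its probability.

```python
strategies = {
    "Strategy 1": {10_000: 0.15, -1_000: 0.85},
    "Strategy 2": {1_000: 0.50, 500: 0.30, -500: 0.20},
    "Strategy 3": {400: 1.0},
}


def expected_value(pmf):
    """Probability-weighted average of the possible profits."""
    return sum(x * p for x, p in pmf.items())


expected = {name: expected_value(pmf) for name, pmf in strategies.items()}
best = max(expected, key=expected.get)   # strategy with the largest E(X)
```

The ranking uses expected profit only; it says nothing about risk, since Strategy 1 loses money with probability 0.85.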
Exercise 4
A group of 500 persons participate in a lottery with a first prize of 1000, two second prizes
of 500 each and five third prizes of 100 each. If the lottery is equitable, so that each
player’s expectation is zero, then what is the fair price of the lottery ticket?
Solution
Since there are 500 players, the probability of winning the first prize is \(\frac{1}{500}\), the probability
of winning a second prize is \(\frac{2}{500}\), and that of winning a third prize is \(\frac{5}{500}\). Then,
\[E(X) = \frac{1}{500}(1000) + \frac{2}{500}(500) + \frac{5}{500}(100) + \frac{492}{500}(0) = 2 + 2 + 1 = 5\]
The lottery will be equitable if the ticket price is 5, in which case expected earnings are 5 and
the cost to the player is 5 so that each player’s expectation is E(X) – 5 = 0
2 EXPECTATION OF A FUNCTION OF A DISCRETE RANDOM VARIABLE

Let h(X) be a function of a discrete rv X. Since X is a rv, h(X) is also a rv. The pmf of h(X) is
the same as the pmf of X. Let Y denote the rv h(X). Let D* denote all possible values of Y
and D denote all possible values of X.
Proposition 1
\[E(Y) = E[h(X)] = \sum_{y \in D^*} y \cdot p(y) = \sum_{x \in D} h(x) \cdot p(x), \text{ provided that } \sum_{x \in D} |h(x)| \cdot p(x) < \infty\]
The expected value of Y or \(\mu_{h(X)}\) is thus a weighted average of possible values of h(x) and the
weights are the corresponding probabilities. E[h(X)] is computed in the same way as E(X),
except that we substitute h(x) in place of x. Examples of h(X) are aX + b, \(e^X\), ln X, etc.
Example 2.1
Let X be the damage incurred (in $) in a certain type of accident during a given year. Possible
X values are 0, 1000, 5000 and 10000, with probabilities 0.8, 0.1, 0.08 and 0.02 respectively.
A particular company offers a $500 deductible policy. The company wishes to fix the
premium amount to be charged so that its expected profit is $100.
Since the company offers $500 deductible policies, the amount to be paid in case of accident
claim will be 0, 500, 4500 and 9500 with respective probabilities 0.8, 0.1, 0.08 and 0.02.
Expected payment for accident claim = (0)(0.8) + (500)(0.1) + (4500)(0.08) + (9500)(0.02)
= 50 + 360 + 190 = $600
Expected profit is the difference between the premium charged and the expected expenditure
on accident insurance claims.

Since expected profit is $100, the company should charge a premium of $700 so that
100 = 700 – 600.
Example 2.2
The pmf for the rv X is as follows
x 4 6 8
p(x) 0.5 0.3 0.2
and
\(Y = h(X) = 20 + 3X + 0.5X^2\)
The possible Y values are 40, 56 and 76, obtained by substituting x = 4, 6, 8 in h(x)
E(Y) = E[h(X)] = (40)(0.5) + (56)(0.3) + (76)(0.2) = 20 + 16.8 + 15.2 = 52
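Proposition 1 can be illustrated with the numbers of Example 2.2. In the sketch below the function `expectation` and the lambda `h` are names introduced only for this illustration.

```python
def expectation(h, pmf):
    """E[h(X)] = sum of h(x) * p(x) over the possible values of X."""
    return sum(h(x) * p for x, p in pmf.items())


pmf = {4: 0.5, 6: 0.3, 8: 0.2}
h = lambda x: 20 + 3 * x + 0.5 * x**2

e_h = expectation(h, pmf)             # E[h(X)], approximately 52
e_x = expectation(lambda x: x, pmf)   # E(X), approximately 5.4
h_of_mean = h(e_x)                    # h(E(X)), approximately 50.78
```

Since h is not linear here, E[h(X)] differs from h(E(X)); the two coincide for linear functions of X.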
3 RULES OF MATHEMATICAL EXPECTATION

Let h(X) be a linear function of X such that h(X) = aX + b. Then,
Proposition 2
E[h(X)] = E(aX + b) = a·E(X) + b
ie, \(\mu_{aX+b} = a\mu_X + b\)
Proof
\[E(aX + b) = \sum_{x \in D} (ax + b) \cdot p(x) = a\sum_{x \in D} x \cdot p(x) + b\sum_{x \in D} p(x) = aE(X) + b, \text{ since } \sum_{x \in D} p(x) = 1\]
This proposition yields three rules of expected values:
Rule 1
If b = 0, then for any constant a, E(aX) = aE(X)
Multiplication of X by the constant a changes the unit of measurement. The rule says that
expected value in the new units equals the expected value in the old units multiplied by the
factor a.

Rule 2
If a = 1, then for any constant b, E(X + b) = E(X) + b
If a constant is added to each possible value of X, there is a change in origin. Then the
expected value will be shifted by the same amount b.
Rule 3
If a = 0, then for any constant b, E(b) = Σb.p(x) = b Σp(x) = b
That is, the expected value of a constant is just its value. This is only logical. As a constant
value is a certainty, there is no probability associated with it. The expected value is the value
of the constant itself.
Proposition 3
If n > 1, \(E(X^n) = \sum_{x \in D} x^n \cdot p(x)\)
This follows from Proposition 1 with \(h(X) = X^n\). Thus, the second moment (about the origin) is
\[E(X^2) = \sum_{x \in D} x^2 \cdot p(x)\]
Proposition 2 can be extended to more than one rv. Let X and Y be two discrete random
variables. If X has a set of possible values D with pmf p(x), Y has a set of possible values
D* with pmf p(y), and p(x, y) denotes their joint pmf, then for the function of the two random
variables g(X,Y) = X + Y we have
Proposition 4
E(X + Y) = E(X) + E(Y)
Proof
\[E(X + Y) = E[g(X,Y)] = \sum_{y \in D^*} \sum_{x \in D} (x + y) \cdot p(x, y) = \sum_{x \in D} x \cdot p(x) + \sum_{y \in D^*} y \cdot p(y) = E(X) + E(Y)\]
where we use \(\sum_{y \in D^*} p(x, y) = p(x)\) and \(\sum_{x \in D} p(x, y) = p(y)\).
We can similarly show that the expected value of the sum of any number of random variables
equals the sum of their individual expectations.
Example 3.1
E(X + Y + Z) = E[(X + Y) + Z] = E(X + Y) + E(Z) = E(X) + E(Y) + E(Z)

Exercise 5
An individual who has automobile insurance from a certain company is randomly selected.
Let Y be the number of traffic rule violations for which the individual was booked during the
last three years. The pmf of Y is
y 0 1 2 3
p(y) 0.60 0.25 0.10 0.05
(a) Compute E(Y)
(b) Suppose an individual with Y violations incurs a surcharge of \(200Y^2\). Calculate the
expected amount of the surcharge.
Solution
(a) E(Y) = Σy·p(y) = (0)(0.60) + (1)(0.25) + (2)(0.10) + (3)(0.05)
= 0 + 0.25 + 0.20 + 0.15 = 0.60
(b) Let \(Z = 200Y^2\)
\(E(Z) = 200E(Y^2) = 200 \sum y^2 \cdot p(y)\)
= 200[(0)(0.60) + (1)(0.25) + (4)(0.10) + (9)(0.05)]
= 200[0 + 0.25 + 0.40 + 0.45] = 200(1.10) = 220
Exercise 6
An appliance dealer sells three different models of upright freezers having 13.5, 15.9, and
19.1 cubic feet of storage space, respectively. Let X = the amount of storage space of the
freezer purchased by the next customer.
Suppose X has the following pmf
x 13.5 15.9 19.1
p(x) 0.2 0.5 0.3
(a) Compute (i) E(X), and (ii) E(X^2)
(b) If the price of a freezer having capacity X cu. ft. is 25X – 8.5, what is the expected price
paid by the next customer to buy a freezer?
(c) Suppose that although the rated capacity of a freezer is X, the actual capacity is
h(X) = X – 0.01X^2. What is the expected actual capacity of the freezer purchased by
the next customer?

Solution
(a) (i) E(X) = (13.5)(0.2) + (15.9)(0.5) + (19.1)(0.3) = 2.7 + 7.95 + 5.73 = 16.38 cu ft
(ii) E(X^2) = (13.5)^2(0.2) + (15.9)^2(0.5) + (19.1)^2(0.3)
= 36.45 + 126.405 + 109.443 = 272.298
(b) Let Y = price of freezer, where Y = 25X – 8.5
E(Y) = 25E(X) – 8.5 = (25)(16.38) – 8.5 = $401
(c) Actual capacity = h(X) = X – 0.01X^2
E[h(X)] = E(X) – (0.01)E(X^2) = 16.38 – (0.01)(272.298) = 16.38 – 2.72 = 13.66 cu ft
Exercise 7
Let X = the outcome when a fair die is rolled once. If before the die is rolled you are offered
either 1/3.5 dollars or h(X) = 1/X dollars, would you accept the guaranteed amount or would
you gamble?
Solution
Guaranteed amount = $(1/3.5) = $0.2857, or about $0.29
Otherwise, h(X) = 1/X, where x = 1, 2, 3, 4, 5, 6
Since this is a fair die, P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
E[h(X)] = (1/1)(1/6) + (1/2)(1/6) + (1/3)(1/6) + (1/4)(1/6) + (1/5)(1/6) + (1/6)(1/6)
= \frac{1}{6} (1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}) = \frac{1}{6} \cdot \frac{60 + 30 + 20 + 15 + 12 + 10}{60} = \frac{147}{360} = 0.4083
Therefore E[h(X)] = $0.41 > guaranteed amount $0.29. It is a better option to gamble.
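The comparison in Exercise 7 can be checked with exact fractions. The following is a minimal sketch; the helper name `expected_value` and the dictionary representation of the pmf are illustrative choices, not part of the lesson.

```python
from fractions import Fraction

def expected_value(h, pmf):
    """E[h(X)] = sum of h(x) * p(x) over all x in the support (hypothetical helper)."""
    return sum(h(x) * p for x, p in pmf.items())

# Fair die: each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

gamble = expected_value(lambda x: Fraction(1, x), die)  # E(1/X)
guaranteed = 1 / Fraction(7, 2)                         # 1/3.5 dollars

print(gamble, float(gamble))          # exact and decimal value of E(1/X)
print(guaranteed, float(guaranteed))  # exact and decimal guaranteed amount
print(gamble > guaranteed)            # True when gambling has the higher expectation
```

Note that the guaranteed amount 1/3.5 equals 1/E(X), so the exercise also shows that E[h(X)] is in general not the same as h(E(X)) when h is not linear.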
One useful application of Proposition 1 is to obtain E(X^2), ie, the second moment of X about
the origin. Here h(X) = X^2. Let D be the set of all possible values that X can take. Since
h(X) takes the value x^2 with probability p(x), E(X^2) = \sum_{x \in D} x^2 p(x)
4 VARIANCE OF A DISCRETE RANDOM VARIABLE

Expected value of the rv X is the mean of the probability distribution or pmf of X. It tells us
what will be the value of X on the average when the experiment is repeated a very large
number of times in an identical and independent fashion (ie, replicated a very large number
of times). We need to also obtain the variance of X to examine the amount of variability in
the probability distribution of X. The mean and variance are useful measures for summarizing
the essential properties of the pmf.
Example 4.1
Let the rv X have the pmf p(x) = ½ for x = -1, 1 and let the rv Y have the pmf p(y) = ½ for
y = -100, 100.
E(X) = E(Y) = 0 but the pmf for Y is more spread out than that for X
Definition 2
Let X have pmf p(x) and expected value μ. Then the variance of X is
V(X) = \sigma_X^2 = \sum_{x \in D} (x - \mu)^2 p(x) = E[(X - \mu)^2]
and the standard deviation of X is \sigma_X = \sqrt{E[(X - \mu)^2]}
Variance is thus the expected value of the function h(X) = (X – μ)^2, ie, the squared deviation
of X from its mean. Hence, variance is the expected squared deviation. Variance is thus the
weighted average of squared deviations, where the weights are the probabilities.
An alternative formula for Var(X) can be derived as follows, where x ∈ D and D is the set of
all possible values of the rv X:
Var(X) = E[(X – μ)^2]
= E[X^2 - 2μX + μ^2]
= E(X^2) – E(2μX) + E(μ^2)
= E(X^2) – Σ2μx.p(x) + Σμ^2.p(x) where summation is over all x ∈ D
= E(X^2) - 2μΣx.p(x) + μ^2 Σp(x) since μ, being a parameter of the distribution, is a constant
= E(X^2) - 2μE(X) + μ^2 since Σp(x) = 1
= E(X^2) – μ^2 since E(X) = μ
= E(X^2) – [E(X)]^2
Variance of X is, therefore, equal to the expected value of the square of X minus the square of
the expected value of X. It is often easier to compute V(X) using this formula than the
definitional formula E[(X – μ)^2].
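The agreement between the definitional and the shortcut formula can be illustrated with the two pmfs of Example 4.1. This is a small sketch; the function names and the dictionary form of the pmf are assumptions made for illustration.

```python
def variance_definitional(pmf):
    """V(X) = sum of (x - mu)^2 * p(x), with pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    """V(X) = E(X^2) - [E(X)]^2."""
    mu = sum(x * p for x, p in pmf.items())
    second_moment = sum(x ** 2 * p for x, p in pmf.items())
    return second_moment - mu ** 2

# pmfs from Example 4.1: same mean (0), very different spread.
pmf_x = {-1: 0.5, 1: 0.5}
pmf_y = {-100: 0.5, 100: 0.5}

print(variance_definitional(pmf_x), variance_shortcut(pmf_x))
print(variance_definitional(pmf_y), variance_shortcut(pmf_y))
```

Both functions return the same value for each pmf, and the much larger variance of Y reflects its wider spread around the common mean.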
Example 4.2
Given the pmf of X in example 2.2, we can compute the mean, variance and standard
deviation of the probability distribution of the random variable X.
x    p(x)   x.p(x)   x^2   x^2.p(x)
4    0.5    2        16    8
6    0.3    1.8      36    10.8
8    0.2    1.6      64    12.8
E(X) = Σx.p(x) = 2 + 1.8 + 1.6 = 5.4

V(X) = E(X^2) – [E(X)]^2 where E(X^2) = Σx^2.p(x) = 8 + 10.8 + 12.8 = 31.6
Therefore V(X) = 31.6 – (5.4)^2 = 31.6 – 29.16 = 2.44 and \sigma_X = \sqrt{2.44} = 1.562
5 VARIANCE OF A FUNCTION OF A DISCRETE RANDOM VARIABLE

Let h(X) denote a function of the random variable X. By definition 2,
V[h(X)] = E[h(X) – E(h(X))]^2 = \sum_{x \in D} [h(x) - E(h(X))]^2 p(x)
If we have a linear function h(X) = aX + b, then
V[h(X)] = \sum_{x \in D} [(ax + b) - E(aX + b)]^2 p(x)
Now, E(aX + b) = a.E(X) + b

If we denote E(X) = μ, then E(aX + b) = aμ + b, so that
V[h(X)] = \sum_{x \in D} [(ax + b) - (a\mu + b)]^2 p(x)
= \sum_{x \in D} [a(x - \mu)]^2 p(x)
= a^2 \sum_{x \in D} (x - \mu)^2 p(x)
= a^2 E[(X - \mu)^2]
= a^2 V(X)
Thus we get a simple relationship between V[h(X)] and V(X) for the linear function
h(X) = aX + b
Proposition 5
V(aX + b) = \sigma_{aX+b}^2 = a^2 \sigma_X^2 and \sigma_{aX+b} = |a| \sigma_X
We need to take the absolute value |a| since \sqrt{a^2} = |a|, and standard deviation can never be
negative.
Proposition 5 yields the following two rules of variance and standard deviation for the
function of a random variable.
Rule 4
V(aX) = \sigma_{aX}^2 = a^2 V(X) and \sigma_{aX} = |a| \sigma_X when b = 0 in h(X) = aX + b
Rule 5
V(X + b) = \sigma_{X+b}^2 = V(X) and \sigma_{X+b} = \sigma_X when a = 1 in h(X) = aX + b
Thus change in origin by adding or subtracting a constant b does not affect the variability of
the distribution. It just shifts the distribution to the left for b < 0, and to the right for b > 0.
However, change in the unit of measurement by multiplication or division by a constant a
impacts the variability. The new standard deviation is a product of the old standard deviation
and the absolute value of the conversion factor a. If 0 < |a| < 1 the distribution becomes
narrower. If |a| > 1, the new distribution is more spread out than before.

Exercise 8
The total cost for the production process is equal to $1000 plus two times the number of units
produced. The mean and variance for the number of units produced are 500 and 900
respectively. Find the mean and standard deviation of the total cost.
Solution
Let X denote the number of units produced where X is a rv. Then the cost function is
h(X) = 1000 + 2X.
Given that E(X) = 500 and V(X) = 900,
E[h(X)] = E[1000 + 2X] = 1000 + 2E(X) = 1000 + (2)(500) = $2000
and
V[h(X)] = V(1000 + 2X) = 2^2 V(X) = (4)(900) = 3600
so that \sigma_{h(X)} = \sqrt{3600} = $60
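The rules for a linear function h(X) = aX + b can be sketched in a few lines, using the figures of Exercise 8. The function name `linear_transform_moments` is hypothetical and chosen only for this illustration.

```python
import math

def linear_transform_moments(mean_x, var_x, a, b):
    """Mean, variance and sd of aX + b, given the mean and variance of X."""
    mean_h = a * mean_x + b    # E(aX + b) = aE(X) + b
    var_h = a ** 2 * var_x     # V(aX + b) = a^2 V(X); b drops out
    sd_h = math.sqrt(var_h)    # equals |a| times the sd of X
    return mean_h, var_h, sd_h

# Exercise 8: cost = 1000 + 2X with E(X) = 500 and V(X) = 900.
mean_cost, var_cost, sd_cost = linear_transform_moments(500, 900, a=2, b=1000)
print(mean_cost, var_cost, sd_cost)
```

Changing b shifts only the mean, while changing a rescales both the mean and the standard deviation.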
6 COVARIANCE AND VARIANCE OF SUMS OF RANDOM VARIABLES

We have seen in example 3.1 that, using proposition 4, the expected value of a sum of
random variables is the sum of their expected values, ie,
E(X + Y + Z) = E(X) + E(Y) + E(Z).
However, the variance of a sum of random variables is generally not equal to the sum of the
individual variances.
Example 6.1
V(X + X) = V(2X) = 4 V(X) whereas V(X) + V(X) = 2 V(X)
so that [V(X + X)] ≠ [V(X) + V(X)]
If the random variables are, however, independent then variance of a sum of the rv’s will
equal the sum of the respective variances. Recall that if X and Y are two independent rv’s
then the probability of occurrence of one variable is not affected by the probability of
occurrence of the other. To prove that V(X+Y) = V(X) + V(Y) when X and Y are
independent we need to first define the concept of covariance of two random variables.
Definition 3

The covariance of two random variables X and Y is defined by

Cov(X,Y) = E[(X - μX)(Y - μY)], where μX and μY are the means of X and Y respectively.
From the definition of covariance we can derive an alternative formula.

Cov(X,Y) = E[(X - μX)(Y - μY)]
= E[XY – μX Y - μY X + μX μY]
= E(XY) – μX E(Y) - μY E(X) + μX μY
Since E(X) = μX and E(Y) = μY are constants of the probability distributions of X and Y
respectively,
Cov(X,Y) = E(XY) - μX μY - μX μY + μX μY
= E(XY) - μX μY
= E(XY) – E(X).E(Y)
If X and Y are independent rv’s then Cov(X,Y) = 0. This is because variability in X is

unrelated to variability in Y. To show that Cov(X,Y) = 0 for independent rv’s X and Y, we
need to prove that E(XY) = E(X).E(Y)
Proof
Let the rv X have a pmf p(x). Let the domain of X be denoted by D = {x_1, x_2, ..., x_m}
Let the rv Y have a pmf p(y). Let the domain of Y be denoted by D* = {y_1, y_2, ..., y_n}
E(XY) = \sum_{x_i \in D} \sum_{y_j \in D^*} x_i y_j p(x_i, y_j)
Since X and Y are independent, p(x_i, y_j) = p(x_i).p(y_j), therefore

E(XY) = \sum_{x_i \in D} \sum_{y_j \in D^*} x_i y_j p(x_i) p(y_j) = \sum_{x_i \in D} x_i p(x_i) \cdot \sum_{y_j \in D^*} y_j p(y_j)
= \sum_{x_i \in D} x_i p(x_i) \cdot E(Y) = E(X).E(Y)
Thus, Cov(X,Y) = E(XY) – E(X).E(Y)

= E(X).E(Y) – E(X).E(Y) = 0, if X and Y are independent.
Now let X and Y be two independent random variables. Then,

V(X + Y) = E[(X + Y) – E(X + Y)]^2
= E[(X + Y) – {E(X) + E(Y)}]^2
= E[{X – E(X)} + {Y – E(Y)}]^2
= E[X – E(X)]^2 + E[Y – E(Y)]^2 + 2 E[{X – E(X)}{Y – E(Y)}]
= V(X) + V(Y) + 2 Cov(X,Y)
Since Cov(X,Y) = 0 when X and Y are independent random variables, therefore,
V(X + Y) = V(X) + V(Y)
From this it follows that if X_1, X_2, ... are independent random variables, then
V(X_1 + X_2 + ...) = ΣV(X_i), provided each pair X_i, X_j (i ≠ j) is independent.
Exercise 9
A company produces and sells security devices in two countries which do not permit
international trade in this item. Let X and Y denote the number of devices sold weekly in the
first country and the second country respectively. The profit function in the two countries are
h(x) = 200X - 100 and h(y) = 500Y- 250
Compute the mean and standard deviation of weekly total profits (measured in $) of the
company if the pmf’s of X and Y are as follows:
x 3 4 5 6 y 1 2 3 4
p(x) 0.1 0.2 0.3 0.4 and p(y) 0.2 0.4 0.3 0.1
Solution
E(X) = 3(0.1) + 4(0.2) + 5(0.3) + 6(0.4) = 0.3 + 0.8 + 1.5 + 2.4 = 5
E(X^2) = 9(0.1) + 16(0.2) + 25(0.3) + 36(0.4) = 0.9 + 3.2 + 7.5 + 14.4 = 26
V(X) = 26 – 25 = 1
E(Y) = 1(0.2) + 2(0.4) + 3(0.3) + 4(0.1) = 0.2 + 0.8 + 0.9 + 0.4 = 2.3
E(Y^2) = 1(0.2) + 4(0.4) + 9(0.3) + 16(0.1) = 0.2 + 1.6 + 2.7 + 1.6 = 6.1
V(Y) = 6.1 – 5.29 = 0.81
Expected profits in the first country = E[h(x)] = E[200X - 100]
= 200.E(X) -100
= 200(5) - 100 = $900
Variance of profits in first country = V[200X -100] = 40000 V(X) = 40000(1) = 40,000
Expected profits in the second country = E[h(y)] = E[500Y – 250]
= 500.E(Y) – 250

= 500(2.3) – 250 = $900

Variance of profits in second country = V[500Y – 250] = 250000 V(Y)
= 250000(0.81) = 202,500
Expected total profits = E[h(x)] + E[h(y)] = 900 + 900 = $1,800
Variance of total profits = V[h(x)] + V[h(y)] since profits in the two countries are
independent as there is no trade in this device in either country so that Cov(X,Y) = 0
V[h(x)] + V[h(y)] = 40000 + 202500 = 242500
Therefore standard deviation of total profits = \sqrt{242500} = $492.44
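The steps of Exercise 9 can be reproduced numerically. The sketch below assumes, as the solution does, that sales in the two countries are independent; the helper name `mean_var` is illustrative.

```python
import math

def mean_var(pmf):
    """Return (E(X), V(X)) for a pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
    return mean, var

pmf_x = {3: 0.1, 4: 0.2, 5: 0.3, 6: 0.4}
pmf_y = {1: 0.2, 2: 0.4, 3: 0.3, 4: 0.1}
mean_x, var_x = mean_var(pmf_x)
mean_y, var_y = mean_var(pmf_y)

# Profit functions 200X - 100 and 500Y - 250.
total_mean = (200 * mean_x - 100) + (500 * mean_y - 250)
# Variances add only because X and Y are assumed independent (Cov = 0).
total_var = 200 ** 2 * var_x + 500 ** 2 * var_y
total_sd = math.sqrt(total_var)

print(round(total_mean, 2), round(total_var, 2), round(total_sd, 2))
```

If the two markets were correlated, a term 2(200)(500)Cov(X,Y) would have to be added to `total_var`.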
Exercise 10
Show that Cov(aX+b,cY+d) = acCov(X,Y)
Solution
Cov(aX+b,cY+d) = E[{aX + b – E(aX + b)}{cY + d – E(cY + d)}]
= E[{aX + b – aE(X) – b}{cY + d – cE(Y) – d}]
= E[(a{X – E(X)})(c{Y – E(Y)})]
= acE[{X – E(X)}{Y – E(Y)}] = acCov(X,Y)
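The result of Exercise 10 can be checked on a small joint distribution. The joint pmf and the constants a, b, c, d below are made up purely for illustration, and `covariance` is a hypothetical helper.

```python
from fractions import Fraction as F

def covariance(joint, u=lambda x: x, v=lambda y: y):
    """Cov(u(X), v(Y)) = E[u(X) v(Y)] - E[u(X)] E[v(Y)] for a joint pmf."""
    e_u = sum(u(x) * p for (x, y), p in joint.items())
    e_v = sum(v(y) * p for (x, y), p in joint.items())
    e_uv = sum(u(x) * v(y) * p for (x, y), p in joint.items())
    return e_uv - e_u * e_v

# Illustrative joint pmf of (X, Y); the probabilities sum to 1.
joint = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(3, 8)}
a, b, c, d = 2, 3, -5, 1

cov_xy = covariance(joint)
cov_lin = covariance(joint, lambda x: a * x + b, lambda y: c * y + d)

print(cov_xy, cov_lin, a * c * cov_xy)
```

The additive constants b and d have no effect, and the covariance is scaled by the product ac.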
7 PARAMETERS OF THE PROBABILITY MASS FUNCTION

When the pmf specifies a mathematical model for the distribution of population values, the
expected value or mean μ measures the value of the rv at which the distribution is centered.
Both σ^2 and σ measure the spread of the population distribution where σ^2 is the population
variance and σ is the population standard deviation.
If most of the population values are close to μ, the spread of the distribution is small and σ^2 is
relatively small. If, however, there are x values that are far from μ that have large p(x), then
σ^2 will be quite large.
Example 7.1
In example 4.1, p(x) = ½ for x = -1, 1 and p(y) = ½ for y = -100, 100
E(X) = μX = ½(-1) + ½(1) = 0,
and E(Y) = μY = ½(-100) + ½(100) = 0,

so that μX = μY = 0
V(X) = E(X^2) – [E(X)]^2 = [½(1) + ½(1)] – 0 = 1, and
V(Y) = E(Y^2) – [E(Y)]^2 = [½(10000) + ½(10000)] – 0 = 10,000
Therefore, V(Y) > V(X)
The characteristics of the distribution can now be specified. We can obtain the calculated
values of the mean and variance. The histogram of the distribution will show whether the
distribution is symmetric or asymmetric. It will also show whether the distribution is
unimodal, bimodal or multimodal.
Practice Questions
1. If X takes on the values 0, 1, 2, and 3 with probabilities 1/125, 12/125, 48/125, and 64/125
respectively, find E(X) and V(X). Use these results to find the mean and variance of
Y = 3X + 2
2. The pmf of the amount of memory X(GB) in a flash drive is given as follows:
x 1 2 4 8 16
p(x) 0.05 0.10 0.35 0.40 0.10
Compute the following

(a) E(X) (b) V(X) using the definitional formula
(c) Standard deviation of X (d) V(X) using the shortcut formula
3. Use the proposition involving V(aX +b) to establish a general relationship between
V(X) and V(-X).
4. If X denotes a temperature recorded in degrees Fahrenheit, then (5/9)(X – 32) is the
corresponding temperature in degrees Celsius. If the standard deviation for a set of
temperatures is 15.7°F, what is the standard deviation of the equivalent Celsius
temperatures?

5. If E(W) = μ and V(W) = σ^2, show that
E[(W - μ)/σ] = 0 and V[(W - μ)/σ] = 1
6. Suppose that X_i is a rv for which E(X_i) = μ, i = 1, 2, ..., n. Under what conditions
will the following be true?
E(\sum_{i=1}^{n} a_i X_i) = μ
7. A stationery shop orders copies of a certain magazine each week. Let X = demand
for the magazine, with pmf
x 1 2 3 4 5 6
p(x) 1/15 2/15 3/15 4/15 3/15 2/15
Suppose the shop actually pays 5 for each copy of the magazine and the price to
customers is 10. If magazines left at the end of the week have no salvage value, is it
better to order three or four copies of the magazine?
8. An industrialist has a choice of two alternative proposals to start a new project.

Proposal A will yield a profit of 10 lakhs with a probability of 0.4 or a loss of
2 lakhs with a probability of 0.6. Proposal B will yield a profit of 4.5 lakhs with
probability 0.8 or a loss of 50,000 with probability 0.2. Which proposal should he
prefer?
9. Given that variables X and Y are independent and Z = aX – bY, prove that
Var(Z) = a^2 Var(X) + b^2 Var(Y)
10. Arun and Barun play a game in which they toss a fair coin three times. The one
obtaining heads first wins the game. If Arun tosses the coin first and if the total value
of the stakes is 20, how much should be contributed by each in order that the game
be considered fair?

11. A bakery sells bread for Rs. 15 each. Daily sales X is a random variable and has a
distribution with mean 530 and standard deviation 69
(i) Find the mean daily total revenues from the sale of bread
(ii) Find the standard deviation of total revenues from the sale of bread
(iii) If daily costs (in Rs) for making bread are given by C=1000+0.95X, find the
mean and variance of daily profits from sales of bread
12. A chemical supply company currently has in stock 100 kg of a certain compound,
which it sells to customers in 5-kg batches. Let X = the number of batches ordered by
a randomly chosen customer, and suppose that X has pmf
x 1 2 3 4
p(x) 0.2 0.4 0.3 0.1
Compute E(X) and V(X). Then compute the expected number of kgs left after the
customer’s order is shipped and the variance of the number of kgs left.
13. A sports promoter is contemplating buying a rain insurance for an event he is

sponsoring. If it does not rain he expects to earn 10,000 but only 2,000 if it does.
If the probability of rain is 3/7, what is his expected earnings? If the insurance policy
costs him 3,000 and assures him 7,000 if it rains, is it profitable to purchase the
insurance?

Mathematical expectation continuous
DC-1
Semester-II
Lesson: Mathematical expectation continuous

TABLE OF CONTENTS

1. Expected value of a continuous random variable
2. Expectation of a function of a continuous random variable
3. Variance of a continuous random variable
4. Variance of a function of a continuous random variable
5. Rules of mathematical expectation
6. Expectation and variance of sums of continuous random variables
7. Characteristics of the probability density function
Content Developer
Reference
MATHEMATICAL EXPECTATION: CONTINUOUS RANDOM VARIABLES

In this chapter you will learn how to obtain two main characteristics of the
probability distribution of a continuous random variable. You will learn how to derive
the mean and variance of distributions of continuous random variables. You will also
learn how to apply the rules of mathematical expectation to functions of random
variables as well as to sums of random variables. The mean, variance, median and
mode, and coefficients of skewness and kurtosis will help you to identify the
characteristics and shape of the distribution.
Chapter Outline
1. Expected value of a continuous random variable
2. Expectation of a function of a continuous random variable
3. Variance of a continuous random variable
4. Variance of a function of a continuous random variable
5. Rules of mathematical expectation
6. Expectation and variance of sums of continuous random variables
7. Characteristics of the probability density function
1 EXPECTED VALUE OF A CONTINUOUS RANDOM VARIABLE

The mean of a distribution is the point on the number line where the distribution is
centered. The mean of the distribution of a continuous random variable (probability
density function or pdf) is its expected value. Expected value of a continuous random
variable is obtained as a weighted average of the values of the rv where the
probability densities are the weights. For discrete random variables method of
summation was used. In case of continuous random variables, expected value is
obtained by method of integration.
Definition 1
The expected value or mean value of a continuous random variable X with probability
density function f(x) is
\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x) dx

When the pdf f(x) specifies a model for the distribution of X values in a numerical
population, then μX is the population mean.
Example 1.1
If a contractor’s profits on a construction job can be looked upon as a continuous rv
having the pdf
f(x) = \frac{1}{18}(x + 1) for -1 < x < 5, and f(x) = 0 otherwise
where the units are $1000, her expected profit is
E(X) = \int_{-1}^{5} x \cdot \frac{1}{18}(x + 1) dx
= \frac{1}{18} \int_{-1}^{5} (x^2 + x) dx
= \frac{1}{18} [\frac{x^3}{3} + \frac{x^2}{2}]_{-1}^{5}
= \frac{1}{18} [(\frac{125}{3} + \frac{25}{2}) - (-\frac{1}{3} + \frac{1}{2})]
= \frac{1}{18} [\frac{126}{3} + \frac{24}{2}]
= \frac{42 + 12}{18} = \frac{54}{18} = 3
Therefore, expected profit is $3,000
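The integral in Example 1.1 can also be approximated numerically. The sketch below uses a simple midpoint rule, which is a stand-in for exact integration, and assumes the support of the pdf is -1 < x < 5 (the range over which (x + 1)/18 integrates to one). The function names are illustrative.

```python
def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Assumed pdf of Example 1.1: (x + 1)/18 on -1 < x < 5, zero elsewhere."""
    return (x + 1) / 18 if -1 < x < 5 else 0.0

total_prob = integrate(f, -1, 5)             # should be close to 1
mean = integrate(lambda x: x * f(x), -1, 5)  # E(X), in units of $1000

print(round(total_prob, 6), round(mean, 6))
```

Checking that the density integrates to one is a useful first step before computing any moment.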
Exercise 1
The tread wear (in thousands of kilometers) that car owners get with a certain kind of
tyre is a rv X whose pdf is given by
f(x) = \frac{1}{30} e^{-x/30} for x > 0, and f(x) = 0 for x \le 0
What tread wear can a car owner expect to get with one of the tyres?
Solution
E(X) = \int_{-\infty}^{\infty} x f(x) dx = \frac{1}{30} \int_{0}^{\infty} x e^{-x/30} dx
Integrating by parts,
E(X) = \frac{1}{30} \{ [x (-30 e^{-x/30})]_{0}^{\infty} + 30 \int_{0}^{\infty} e^{-x/30} dx \}
= \frac{1}{30} \{ 0 + 30 [-30 e^{-x/30}]_{0}^{\infty} \}
= \frac{1}{30} \{ 0 + 30 (0 + 30) \} = 30
Therefore, average tread wear a car owner can expect to get is 30,000 km.
2 EXPECTED VALUE OF A FUNCTION OF A CONTINUOUS RANDOM

VARIABLE
If X is a continuous rv with probability density function f(x), then the expected value of
any function of X, h(X), can be computed directly from f(x), without first deriving the pdf of h(X).
Definition 2
If X is a continuous random variable with pdf f(x) and h(X) is any function of X, then
E[h(X)] = \mu_{h(X)} = \int_{-\infty}^{\infty} h(x) f(x) dx, provided that \int_{-\infty}^{\infty} h(x) f(x) dx < \infty

Proposition 1
If h(X) is a linear function such as h(X) = aX + b, then E[h(X)] = aE(X) + b
Proof:
E[h(X)] = E[aX + b] = \int_{-\infty}^{\infty} (ax + b) f(x) dx
= \int_{-\infty}^{\infty} a x f(x) dx + \int_{-\infty}^{\infty} b f(x) dx
= a \int_{-\infty}^{\infty} x f(x) dx + b \int_{-\infty}^{\infty} f(x) dx
= aE(X) + b
Example 2.1
If the pdf of X is given by
f(x) = 2(1 - x) for 0 < x < 1, and f(x) = 0 otherwise
and h(X) = 2X + 1
Then,
E[h(X)] = \int_{0}^{1} (2x + 1) \cdot 2(1 - x) dx
= 2 \int_{0}^{1} (2x + 1 - 2x^2 - x) dx
= 2 \int_{0}^{1} (x + 1 - 2x^2) dx
= 2 [\frac{x^2}{2} + x - \frac{2x^3}{3}]_{0}^{1}
= 2 [(\frac{1}{2} + 1 - \frac{2}{3}) - 0]
= 2 [\frac{3}{2} - \frac{2}{3}]
= 3 - \frac{4}{3} = \frac{5}{3} = 1.67
Exercise 2
An ecologist wishes to mark off a circular sampling region having radius 10 m.
However, the radius of the resulting region is actually a random variable R with pdf
f(r) = \frac{3}{4}[1 - (10 - r)^2] for 9 \le r \le 11, and f(r) = 0 otherwise
What is the expected area of the resulting circular region?
Solution
Since area of a circle is h(R) = πR^2, therefore expected area of the resulting circle is
E[h(R)] = E(\frac{22}{7} R^2) = \frac{22}{7} E(R^2), where E(R^2) = \int_{9}^{11} r^2 f(r) dr
Now,
E(R^2) = \frac{3}{4} \int_{9}^{11} r^2 [1 - (100 - 20r + r^2)] dr
= \frac{3}{4} \int_{9}^{11} (-99r^2 + 20r^3 - r^4) dr
= \frac{3}{4} [-\frac{99r^3}{3} + \frac{20r^4}{4} - \frac{r^5}{5}]_{9}^{11}
= \frac{3}{4} [-33(11^3 - 9^3) + 5(11^4 - 9^4) - \frac{1}{5}(11^5 - 9^5)]
= \frac{3}{4} [-33(1331 - 729) + 5(14641 - 6561) - \frac{1}{5}(161051 - 59049)]
= \frac{3}{4} [-33(602) + 5(8080) - \frac{1}{5}(102002)]
= \frac{3}{4} [-19866 + 40400 - 20400.4]
= \frac{3}{4} (133.6) = 100.2
Therefore,
E[h(R)] = \frac{22}{7} E(R^2) = \frac{22}{7} (100.2) = 314.9143 sq. m.
Exercise 3
The weekly demand for propane gas (in 1000s of gallons) from a particular facility is
an rv X with pdf
f(x) = 2(1 - \frac{1}{x^2}) for 1 \le x \le 2, and f(x) = 0 otherwise
(a) Compute E(X)
(b) If 1.5 thousand gallons are in stock at the beginning of the week, how much of
the 1.5 thousand gallons is expected to be left at the end of the week?
Solution
(a)
E(X) = \int_{1}^{2} x \cdot 2(1 - \frac{1}{x^2}) dx
= 2 \int_{1}^{2} (x - \frac{1}{x}) dx = 2 [\frac{x^2}{2} - \ln x]_{1}^{2}
= 2 [(2 - 0.693) - (\frac{1}{2} - 0)] = 1.614
= 1,614 gallons
(b) Amount in stock is 1.5 thousand gallons out of which the demand is a random
variable X thousand gallons.
Amount left = h(x) = max{(1.5 - x), 0} thousand gallons
E[h(X)] = \int_{1}^{2} max{(1.5 - x), 0} f(x) dx
= \int_{1}^{1.5} (1.5 - x) \cdot 2(1 - \frac{1}{x^2}) dx
= 2 \int_{1}^{1.5} (1.5 - x - \frac{1.5}{x^2} + \frac{1}{x}) dx
= 2 [1.5x - \frac{x^2}{2} + \frac{1.5}{x} + \ln x]_{1}^{1.5}
= 2 [((1.5)^2 - \frac{(1.5)^2}{2} + \frac{1.5}{1.5} + 0.4055) - (1.5 - \frac{1}{2} + 1.5 + 0)]
= 2 [2.5305 - 2.5] = 0.061
Therefore, the expected amount left in stock at the end of the week is 61 gallons.

(Since weekly demand for propane can vary between 1000 and 2000 gallons for
1 < x < 2, and amount in stock is 1.5 thousand gallons, amount left at the end of the
week can vary between the minimum of 0 if demand x = 1.5 or more, and maximum
of 500 gallons if demand is x = 1.)
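Both parts of Exercise 3 can be cross-checked numerically. This is only a sketch: it uses a midpoint rule in place of exact integration, and the function names are chosen for illustration.

```python
import math

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """pdf of weekly demand (in 1000s of gallons): 2(1 - 1/x^2) on [1, 2]."""
    return 2 * (1 - 1 / x ** 2) if 1 <= x <= 2 else 0.0

def leftover(x, stock=1.5):
    """Amount left in stock when demand is x: max(stock - x, 0)."""
    return max(stock - x, 0.0)

expected_demand = integrate(lambda x: x * f(x), 1, 2)
expected_leftover = integrate(lambda x: leftover(x) * f(x), 1, 2)

print(round(expected_demand, 3), round(expected_leftover, 3))
```

Because `leftover` is zero for demand above 1.5, integrating over the whole range [1, 2] gives the same result as integrating from 1 to 1.5, as done in the worked solution.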
3 VARIANCE OF A CONTINUOUS RANDOM VARIABLE

The variance and standard deviation reflect the spread or dispersion of the probability
distribution or the population of x values. Two distributions with the same expected
value can be distinguished as two distinct distributions if they have different values of
variance.
Definition 3
The variance of a continuous random variable X with pdf f(x) and mean value μ is
V(X) = \sigma_X^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) dx
and standard deviation of X is \sigma_X = \sqrt{V(X)}
It can be shown that for continuous random variables, just like in the case of discrete
random variables,
V(X) = E(X^2) – [E(X)]^2
Proof
\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) dx
= \int_{-\infty}^{\infty} (x^2 - 2\mu x + \mu^2) f(x) dx
= \int_{-\infty}^{\infty} x^2 f(x) dx - 2\mu \int_{-\infty}^{\infty} x f(x) dx + \mu^2 \int_{-\infty}^{\infty} f(x) dx
= E(X^2) - 2\mu E(X) + \mu^2, since \int_{-\infty}^{\infty} f(x) dx = 1
= E(X^2) - \mu^2, since E(X) = \mu
= E(X^2) - [E(X)]^2

Example 3.1
In exercise 3, the pdf of X is given as
f(x) = 2(1 - \frac{1}{x^2}) for 1 \le x \le 2, and f(x) = 0 otherwise
V(X) = E(X^2) – [E(X)]^2, where E(X) = 1.614
E(X^2) = \int_{1}^{2} x^2 \cdot 2(1 - \frac{1}{x^2}) dx
= 2 \int_{1}^{2} (x^2 - 1) dx
= 2 [\frac{x^3}{3} - x]_{1}^{2}
= 2 [(\frac{8}{3} - 2) - (\frac{1}{3} - 1)]
= 2 [\frac{2}{3} + \frac{2}{3}] = \frac{8}{3} = 2.667
V(X) = 2.667 – (1.614)^2 = 2.667 – 2.605 = 0.062
and \sigma_X = \sqrt{0.062} = 0.249
= 249 gallons.
Exercise 4
For what value of k does V(Y) = 2 when pdf of Y is given as
f(y) = \frac{2y}{k^2} for 0 < y < k, and f(y) = 0 otherwise
Solution
E(Y) = \int_{0}^{k} y \cdot \frac{2y}{k^2} dy = \frac{2}{k^2} [\frac{y^3}{3}]_{0}^{k} = \frac{2}{3} k
E(Y^2) = \int_{0}^{k} y^2 \cdot \frac{2y}{k^2} dy = \frac{2}{k^2} [\frac{y^4}{4}]_{0}^{k} = \frac{1}{2} k^2
V(Y) = \frac{1}{2} k^2 - \frac{4}{9} k^2 = \frac{k^2}{18}
Given that V(Y) = \frac{k^2}{18} = 2,
k^2 = 36.
Therefore, k = 6
The variance is the second moment about the mean, ie the second central moment. It
is obtained by taking the difference of the second moment about the origin {ie, E(X^2)}
and the square of the first moment about the origin {ie, [E(X)]^2}
4 VARIANCE OF A FUNCTION OF A CONTINUOUS RANDOM

VARIABLE
Variance of a function of the random variable, h(X), is obtained by substituting h(X)
in place of X in the definitional formula for variance of X. Thus,
V[h(X)] = E[h(X) – E{h(X)}]^2
= E{[h(X)]^2} – [E{h(X)}]^2
where
E[h(X)] = \int_{-\infty}^{\infty} h(x) f(x) dx
and E{[h(X)]^2} = \int_{-\infty}^{\infty} [h(x)]^2 f(x) dx
Proposition 2
If h(X) is a linear function such as h(X) = aX + b, then V[h(X)] = a^2 V(X)
Proof:
When h(X) = aX + b, ie, a linear function of the rv,
E[h(X)] = E[aX + b]
= a E(X) + b
= aμ + b
and
V[h(X)] = \int_{-\infty}^{\infty} (ax + b)^2 f(x) dx - (a\mu + b)^2
= \int_{-\infty}^{\infty} (a^2 x^2 + 2abx + b^2) f(x) dx - (a\mu + b)^2
= a^2 \int_{-\infty}^{\infty} x^2 f(x) dx + 2ab \int_{-\infty}^{\infty} x f(x) dx + b^2 \int_{-\infty}^{\infty} f(x) dx - (a\mu + b)^2
= a^2 E(X^2) + 2ab\mu + b^2 - a^2\mu^2 - 2ab\mu - b^2
= a^2 [E(X^2) - \mu^2]
= a^2 [E(X^2) - [E(X)]^2]
= a^2 V(X)
Example 4.1
In example 1.1 the contractor’s profits X (in thousand dollars) was a continuous rv
with pdf
f(x) = \frac{1}{18}(x + 1) for -1 < x < 5, and f(x) = 0 otherwise
Expected profits = E(X) = 3 = $3000
V(X) = E(X^2) - [E(X)]^2
= \int_{-1}^{5} x^2 \cdot \frac{1}{18}(x + 1) dx - 9 = \frac{1}{18} \int_{-1}^{5} (x^3 + x^2) dx - 9
= \frac{1}{18} [\frac{x^4}{4} + \frac{x^3}{3}]_{-1}^{5} - 9
= \frac{1}{18} [(\frac{625}{4} + \frac{125}{3}) - (\frac{1}{4} - \frac{1}{3})] - 9 = \frac{1}{18}(156 + 42) - 9 = 2
and \sigma_X = \sqrt{2} = 1.41421 = $1414.21
Exercise 5
Let Y have the pdf
f(y) = 3(1 - y)^2 for 0 < y < 1, and f(y) = 0 otherwise
Find the variance of W, where W = 12 – 5Y

Solution
V(W) = 25V(Y)
E(Y) = \int_{0}^{1} y \cdot 3(1 - y)^2 dy
= 3 \int_{0}^{1} (y - 2y^2 + y^3) dy
= 3 [\frac{y^2}{2} - \frac{2y^3}{3} + \frac{y^4}{4}]_{0}^{1}
= 3 [\frac{1}{2} - \frac{2}{3} + \frac{1}{4}] = \frac{1}{4}
E(Y^2) = \int_{0}^{1} y^2 \cdot 3(1 - y)^2 dy
= 3 \int_{0}^{1} (y^2 - 2y^3 + y^4) dy
= 3 [\frac{y^3}{3} - \frac{2y^4}{4} + \frac{y^5}{5}]_{0}^{1}
= 3 [\frac{1}{3} - \frac{2}{4} + \frac{1}{5}] = \frac{1}{10}
V(Y) = \frac{1}{10} - \frac{1}{16} = \frac{3}{80}
V(W) = (25) \frac{3}{80} = \frac{15}{16}
5 RULES OF MATHEMATICAL EXPECTATION

Let us consider a linear function of the continuous variable X, h(X) = aX + b,
where the pdf of X is f(x), so that expectations of h(X) can be computed using f(x).
In section 2 it has been proved that E[h(X)] = aE(X) + b.

This gives us the following three rules of mathematical expectation

Rule 1
If b = 0, then for any constant a, E(aX) = aE(X)
The rule says that expected value in the new units equals the expected value in the old
units multiplied by the factor a. Multiplication of X by the constant a changes the unit
of measurement
Rule 2
If a = 1, then for any constant b, E(X + b) = E(X) + b
If a constant is added to each possible value of X, there is a change in origin. Then the
expected value will be shifted by the same amount b to the right or left depending on
whether b is greater than or less than zero respectively.
Rule 3
If a = 0, then for any constant b, E(b) = \int_{-\infty}^{\infty} b f(x) dx = b \int_{-\infty}^{\infty} f(x) dx = b
That is, the expected value of a constant is just its value.
In section 4 we have seen that for the linear function h(X) = aX + b,

V[h(X)] = a^2 V(X)
This gives us the following three rules for variance
Rule 4
V(aX) = \sigma_{aX}^2 = a^2 V(X) and \sigma_{aX} = |a| \sigma_X when b = 0 in h(X) = aX + b
A change in the unit of measurement by multiplication or division by a constant a
impacts the variability. The new standard deviation is a product of the old standard
deviation and the absolute value of the conversion factor a
Rule 5
V(X + b) = \sigma_{X+b}^2 = V(X) and \sigma_{X+b} = \sigma_X when a = 1 in h(X) = aX + b
A change in origin by adding or subtracting a constant b does not affect the variability
of the distribution

Rule 6
V(b) = 0 and \sigma_b = 0 when a = 0 in h(X) = aX + b
Thus the same rules of mathematical expectation for calculating the mean and
variance apply for distributions of both discrete and continuous random variables.
6 EXPECTATION AND VARIANCE OF SUMS OF CONTINUOUS

RANDOM VARIABLES
Let us consider the sum of two continuous random variables X and Y, where f(x) and
f(y) are the respective probability density functions. Then, g(X,Y) = X + Y
Proposition 3
Whether or not the random variables X and Y are independent, E(X+Y) = E(X) + E(Y)
Proposition 1 extends this result to sums of linear functions of random variables.

Let g(X) = aX + b, and h(Y) = cY + d, then
E[g(X) + h(Y)] = E[aX + b + cY + d]
= [aE(X) + b] + [cE(Y) + d]
= E[g(X)] + E[h(Y)]
This result can be extended to more than two linear functions of continuous random
variables. For example, if there are linear functions of X, Y and Z, the expected value
of the sum of the functions is the sum of the expected value of the functions.
Proposition 4
If the random variables X and Y are independent, V(X+Y) = V(X) + V(Y)
Proposition 2 extends this result to sums of linear functions of independent random

variables.
If g(X) = aX + b, and h(Y) = cY + d, then
V[g(X) + h(Y)] = V[aX + b + cY + d]
= [a^2 V(X) + 0] + [c^2 V(Y) + 0]
= a^2 V(X) + c^2 V(Y)
= V[g(X)] + V[h(Y)]
and
\sigma_{g(X)+h(Y)} = \sqrt{V[g(X)] + V[h(Y)]}
This result can be similarly extended to sums of linear functions of three or more than
three independent continuous random variables. We see that the methods for
obtaining the expectation and variance of sums of linear functions of continuous
random variables are the same as that for discrete variables.
Example 6.1
The independent random variables X and Y have the following density functions:
f(x) = \frac{x}{2} for 0 < x < 2, and f(x) = 0 otherwise
and f(y) = 2(1 - y) for 0 < y < 1, and f(y) = 0 otherwise
Let g(X) = 3X - 1 and h(Y) = 2Y + 1.
Then,
E[g(X) + h(Y)] = E[g(X)] + E[h(Y)]
= [3E(X) – 1] + [2E(Y) + 1]
E(X) = \int_{0}^{2} x \cdot \frac{x}{2} dx = \frac{1}{2} \int_{0}^{2} x^2 dx = \frac{1}{2} [\frac{x^3}{3}]_{0}^{2} = \frac{4}{3}
E[g(X)] = 3(\frac{4}{3}) - 1 = 3
E(Y) = \int_{0}^{1} y \cdot 2(1 - y) dy = 2 \int_{0}^{1} (y - y^2) dy = 2 [\frac{y^2}{2} - \frac{y^3}{3}]_{0}^{1} = 2 (\frac{1}{2} - \frac{1}{3}) = \frac{1}{3}
E[h(Y)] = 2(\frac{1}{3}) + 1 = \frac{5}{3}
E[g(X) + h(Y)] = 3 + \frac{5}{3} = \frac{14}{3} = 4.667
Also
V[g(X) + h(Y)] = V[g(X)] + V[h(Y)]
Now,
V[g(X)] = 9V(X) = 9[E(X^2) – (4/3)^2]

and
V[h(Y)] = 4V(Y) = 4[E(Y^2) – (1/3)^2]
E(X^2) = \int_{0}^{2} x^2 \cdot \frac{x}{2} dx = \frac{1}{2} \int_{0}^{2} x^3 dx = \frac{1}{2} [\frac{x^4}{4}]_{0}^{2} = 2
V[g(X)] = 9 (2 - \frac{16}{9}) = 2
E(Y^2) = \int_{0}^{1} y^2 \cdot 2(1 - y) dy = 2 \int_{0}^{1} (y^2 - y^3) dy = 2 [\frac{y^3}{3} - \frac{y^4}{4}]_{0}^{1} = \frac{1}{6}
V[h(Y)] = 4 (\frac{1}{6} - \frac{1}{9}) = \frac{2}{9}
V[g(X) + h(Y)] = 2 + \frac{2}{9} = \frac{20}{9} = 2.222
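Because both densities in Example 6.1 are polynomials, the moments can be computed exactly by integrating term by term. The sketch below represents each pdf as a list of polynomial coefficients; this representation and the helper name `poly_moment` are assumptions made for illustration.

```python
from fractions import Fraction as F

def poly_moment(coeffs, lo, hi, k):
    """Exact integral of x**k * (c0 + c1*x + c2*x**2 + ...) over [lo, hi]."""
    total = F(0)
    for i, c in enumerate(coeffs):
        power = i + k + 1
        total += F(c) * (F(hi) ** power - F(lo) ** power) / power
    return total

fx = [0, F(1, 2)]  # f(x) = x/2 on (0, 2)
fy = [2, -2]       # f(y) = 2(1 - y) on (0, 1)

ex = poly_moment(fx, 0, 2, 1)
vx = poly_moment(fx, 0, 2, 2) - ex ** 2
ey = poly_moment(fy, 0, 1, 1)
vy = poly_moment(fy, 0, 1, 2) - ey ** 2

# g(X) = 3X - 1 and h(Y) = 2Y + 1, with X and Y independent.
mean_sum = (3 * ex - 1) + (2 * ey + 1)
var_sum = 3 ** 2 * vx + 2 ** 2 * vy

print(mean_sum, var_sum)
```

Using fractions avoids rounding, so the results can be compared directly with the hand calculation.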
7 CHARACTERISTICS OF THE PROBABILITY DENSITY FUNCTION

The probability distribution of a continuous rv is its pdf. Any distribution can be
characterized by its mean and variance. The expected value of a rv is called its mean
value. To compute the population average value of X we need only the possible values
of X along with their respective probabilities. The size of the population is immaterial
as long as the pdf is given. The mean of the distribution is the point on the number
line where the graph of the distribution is centered.
Variance or standard deviation of a distribution measures the spread of the

distribution. A smaller value of variance indicates that the distribution is clustered
closer to its mean value, whereas a larger value of variance means that the distribution
is more spread out. Two distributions having the same mean value but different
variances will be two distinct distributions. Similarly, graphs of two distributions with
the same variance but different means will have the same spread but centered at
different points on the number line.
For a unimodal distribution, the modal value is found by equating the first derivative of the
pdf to zero and checking that the second derivative is negative at that point. The mode is the value
of the rv at which the graph of the distribution reaches its highest point. The median
is obtained by computing the 50th percentile, so that half the distribution is on either
side of the median.
Note that the mean, median, mode and standard deviation are all expressed in units of
measurement of the rv. If the units are changed by a multiplication factor a then the
values of all these measures will also be affected accordingly.
A third characteristic of the distribution is its skewness. The distribution may be
symmetric or it may be skewed. Just as the first moment E(X) gives us the mean of
the distribution and the second moment E(X²) is used to find the variance, the third
moment E(X³) is used to calculate the measure of skewness. However, a simple
inspection of any two of the measures mean, median and mode helps us to identify
the shape of the distribution. For a symmetric distribution, mean = median = mode.
If the distribution is positively skewed, then mean > median > mode. For a
negatively skewed distribution, mean < median < mode.
Skewness can be measured by the formula
$$\text{Skewness} = \frac{\text{mean} - \text{median}}{\text{standard deviation}}$$
This measure will be zero if the distribution is symmetric, positive for a positively
skewed distribution and negative for a negatively skewed distribution.
The moment measure of skewness requires the third moment E(X³). The moment
coefficient of skewness is independent of units of measurement. The formula for the
moment coefficient of skewness is
$$\frac{E[(X - \mu)^3]}{\sigma^3}$$
where the numerator is the third central moment. This may be expressed in terms of
the moments about the origin, with μ = E(X):
$$E[(X - \mu)^3] = E(X^3) - 3\mu E(X^2) + 2\mu^3$$
Here too the coefficient will be greater than, less than, or equal to zero if the
distribution is positively skewed, negatively skewed, or symmetric, respectively.
The normal distribution is symmetric.
A fourth characteristic of the distribution is a measure of its peakedness or the degree
of flatness near its center. This is measured by the coefficient of kurtosis and is based
on the fourth moment of the distribution. The coefficient of kurtosis is
$$\beta_2 = \frac{E[X - E(X)]^4}{\left\{E[X - E(X)]^2\right\}^2} = \frac{E(X^4) - 4\mu E(X^3) + 6\mu^2 E(X^2) - 3\mu^4}{\sigma^4} = \frac{\text{4th central moment}}{(\text{2nd central moment})^2}$$
[since deviations are taken about E(X) = μ]
The coefficient of kurtosis is 3 for the normal distribution. It is less than 3 for a
density function that is flatter than the normal distribution, with short fat tails. If the
density function is more peaked than a normal distribution, with long tails, then the
coefficient is greater than 3.
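The moment formulas for skewness and kurtosis can be checked numerically. The sketch below is only an illustration: it assumes the pdf f(x) = 2x on [0, 1] (the density used in practice question 8) and estimates the raw moments with a simple midpoint rule. The helper names `moment` and `shape_measures` are made up for this example.

```python
def moment(pdf, k, a, b, n=100_000):
    """Midpoint-rule estimate of E(X^k) for a pdf supported on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += (x ** k) * pdf(x)
    return total * h


def shape_measures(pdf, a, b):
    """Mean, variance, moment skewness and kurtosis from raw moments."""
    m1, m2, m3, m4 = (moment(pdf, k, a, b) for k in (1, 2, 3, 4))
    variance = m2 - m1 ** 2
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    fourth_central = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return {
        "mean": m1,
        "variance": variance,
        "skewness": third_central / variance ** 1.5,
        "kurtosis": fourth_central / variance ** 2,
    }


# Assumed example density: f(x) = 2x on [0, 1]
measures = shape_measures(lambda x: 2 * x, 0.0, 1.0)
print(measures)
```

For this density the skewness comes out negative (the long tail is on the left) and the kurtosis is below 3, so the distribution is flatter than the normal.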
The following figure illustrates the three types of distributions. For the purpose of
comparison of kurtosis, only symmetric distributions are shown.
Figure 1: Peakedness of a probability density function: Kurtosis
PRACTICE QUESTIONS
1. Scores earned by students in an economics test (X) is a rv with the pdf
$$f(x) = \begin{cases} \frac{1}{5000}(100 - x), & 0 \le x \le 100 \\ 0, & \text{otherwise} \end{cases}$$
The professor announces that to improve the results he will replace each
student’s marks, X, with new scores, Y= X . Has the professor’s strategy been
successful in raising the class average above 60?

2. A box is to be constructed so that its height is 5 inches and its base is Y inches
by Y inches, where Y is a random variable described by the pdf given below.
Find the expected volume of the box.
$$f(y) = \begin{cases} 6y(1 - y), & 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
3. A tool manufacturing company makes steel gauges. Their annual profit, Q, in
hundreds of thousands of rupees, can be expressed as a function of product
demand, y:
$$Q(y) = 2(1 - e^{-2y})$$
Suppose that the demand (in thousands) for their product follows an
exponential pdf
$$f(y) = \begin{cases} 6e^{-6y}, & y > 0 \\ 0, & \text{otherwise} \end{cases}$$
Find the company's expected profit.
4. Suppose that X is an exponential rv whose pdf is given by
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$
Show that the variance of X is 1/λ².
5. The density function of X is given by
$$f(x) = \begin{cases} a + bx^2, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
If E(X) = 3/5, find a and b.
6. The median of the distribution of the rv X, described by the following pdf, is
3.123.
$$f(x) = \begin{cases} \frac{1}{8}(x + 1), & 2 \le x \le 4 \\ 0, & \text{otherwise} \end{cases}$$
Is the distribution symmetric? Give reasons for your answer.

7. Suppose the amount of paint, Y, in a can of spray paint is a rv with pdf
$$f(y) = \begin{cases} 3y^2, & 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Experience has shown that the largest surface area that can be painted by a can
having Y amount of paint is twenty times the area that can be generated by a
radius of Y ft. Can a randomly selected can of spray paint be expected to cover
a wall of dimensions 5’ x 8’?
8. A rv X is described by the pdf
$$f(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
What is the standard deviation of 3X + 2?
9. If the pdf of the rv Y is
$$f(y) = \begin{cases} \frac{2y}{k^2}, & 0 \le y \le k \\ 0, & \text{otherwise} \end{cases}$$
for what value of k does V(Y) = 2?
10. The coefficient of variation (σ/μ) is a measure of the spread of the distribution
that is independent of the unit of measurement of the rv. If the rv Y is
described by the following pdf,
$$f(y) = \begin{cases} 3(1 - y)^2, & 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
what is the coefficient of variation of Y?

Theoretical Distributions: Discrete And Continuous
DC-1
Semester-II
Lesson: Theoretical Distributions: Discrete and Continuous
Lesson Developer: Ankur Bhatnagar
College/Department: Satyawati College, University of Delhi

Contents
1. LEARNING OBJECTIVES
2 INTRODUCTION
2.1 DISCRETE DISTRIBUTIONS
2.1.1 BINOMIAL DISTRIBUTION
2.1.2 POISSON DISTRIBUTION
2.1.3 BINOMIAL APPROXIMATION TO POISSON
2.2 CONTINUOUS DISTRIBUTIONS

2.2.1 NORMAL DISTRIBUTION
2.2.2 STANDARD NORMAL VARIATE (z) IN TERMS OF PERCENTILES

2.2.3 BINOMIAL APPROXIMATION TO NORMAL
2.2.4 UNIFORM DISTRIBUTION
3. USEFUL LINKS
4. EXERCISES
1. LEARNING OBJECTIVES
We discuss 4 distributions here.
• Discrete distributions: Binomial, Poisson
• Continuous distributions: Uniform, Normal
For each we need to be familiar with
• Probability mass function (for discrete distributions), probability density
function (for continuous distributions)
• Cumulative distribution function
• Conditions for a distribution to hold
• Approximations (as applicable)
• Mean, variance and standard deviation
2. INTRODUCTION
What are theoretical distributions?

A theoretical distribution is based on mathematical formulae. Such distributions are derived
from a model or estimated from data, rather than obtained by physically conducting experiments
or enumerating a sample space. If certain conditions are fulfilled, we can say that a variable
follows a particular distribution.
2.1 DISCRETE DISTRIBUTIONS
2.1.1 BINOMIAL DISTRIBUTION
For a variable X to follow a binomial distribution, the following conditions must be met.
1. There is a fixed number n of identical trials.
2. Each trial is independent of the other trials, so that the outcome of one trial does not
affect the outcome of any other trial.
3. Each trial has ONLY two possible outcomes, S (Success) and F (Failure).
4. P(Success) = P(S) is denoted as p and is constant in each trial. P(Failure) = q = 1 − p.
The pmf gives the probability of r successes in n trials:
P(X = r) = nCr * p^r * q^(n−r) = [n!/(r!(n − r)!)] * p^r * q^(n−r)
Mean = E(X) = n*p
Variance = V(X) = n*p*q
TIP: Success and Failure are labelled in an arbitrary fashion. We can label either of the
two events in the sample space as success. For example, a girl/boy child can be labelled
success or failure without affecting the answer. However, we need to be careful with
the value of r.
Q Assume that a die is tossed 5 times. What is the probability of getting exactly 2 fours?
We use this example to show that either of the two events in a binomial experiment can be
labelled as success.
Option 1:
Define getting a four to be a success so that p = 1/6. We want P(r = 2). Use n = 5, p = 1/6 and
r = 2 to get b(2; 5, 0.167) = 5C2 * (0.167)^2 * (0.833)^3 = 0.161.
Option 2:
Now define any number except four to be a success so that p = 5/6. We want 2 fours, so
the number of successes = 5 − 2 = 3. We now want P(r = 3). Use n = 5, p = 5/6 and r = 3 to get
P(r = 3) = 5C3 * (0.833)^3 * (0.167)^2 = 0.161.
We have illustrated that the 'labelling' of success and failure has no impact on the answer as
long as the number of successes is chosen correctly.
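The die example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `binomial_pmf` is chosen for this illustration and exact fractions 1/6 and 5/6 are used instead of the rounded 0.167 and 0.833.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for a binomial variable with n trials and success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


# Option 1: "four" is the success, so we need 2 successes in 5 tosses
option_1 = binomial_pmf(2, 5, 1 / 6)

# Option 2: "not a four" is the success, so we need 3 successes in 5 tosses
option_2 = binomial_pmf(3, 5, 5 / 6)

print(round(option_1, 3), round(option_2, 3))
```

Both labellings give the same probability, which is the point of the tip above.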

2.1.2 POISSON DISTRIBUTION
The Poisson distribution is a discrete probability distribution for the number of events that
occur randomly in a given interval of time/period. This is unlike the binomial, hypergeometric
or negative binomial distributions, which are based on an experiment that uses trials/draws to
get the probability of various outcomes.
Let X = the number of events in a given interval
λ = mean number of events per interval
The probability of observing r events in a given interval is given by the pmf
P(X = r) = e^(−λ) * λ^r / r!, where r takes values 0, 1, 2, 3, 4, ... and e = 2.718282
Mean = variance = λ
NOTE: the rate/number of events is always in terms of a specified interval, such as per hour,
per minute or per day.
Q Historical data shows that there are 1.8 births per hour in a village. What is the
probability that 4 babies will be born in any given hour here?

Let X = number of births. We need P(X = 4) = e^(−1.8) * 1.8^4 / 4! = 0.0723

What is the probability of having 2 or more births in an hour?
We need P(X ≥ 2). Since the number of births has no upper limit, we use the complement
and get the probability as
P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − e^(−1.8) * 1.8^0 / 0! − e^(−1.8) * 1.8^1 / 1! = 0.537
Q Let X equal the number of typos on a printed page with a mean of 3 typos per page.
a. What is the probability that a randomly selected page has at least one typo on it? We
can find the requested probability directly from the pmf. The probability that X is at
least one is: P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−3) * 3^0 / 0! = 1 − 0.0498 = 0.9502. That is,
there is just over a 95% chance of finding at least one typo on a randomly selected page
when the average number of typos per page is 3.
b. What is the probability that a randomly selected page has at most one typo on it?
The probability required is P(X ≤ 1) = P(X = 0) + P(X = 1)
= e^(−3) * 3^0 / 0! + e^(−3) * 3^1 / 1! = 0.1991.
That is, there is just under a 20% chance of finding at most one typo on a randomly
selected page when the average number of typos per page is 3.
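The Poisson answers in the births and typos examples can be verified with a short script. This is only a sketch; `poisson_pmf` is a helper name introduced here for illustration.

```python
from math import exp, factorial


def poisson_pmf(r, lam):
    """P(X = r) for a Poisson variable with mean lam."""
    return exp(-lam) * lam ** r / factorial(r)


# Births example: lambda = 1.8 per hour
births_exactly_4 = poisson_pmf(4, 1.8)
births_2_or_more = 1 - poisson_pmf(0, 1.8) - poisson_pmf(1, 1.8)

# Typos example: lambda = 3 per page
typos_at_least_1 = 1 - poisson_pmf(0, 3)
typos_at_most_1 = poisson_pmf(0, 3) + poisson_pmf(1, 3)

print(round(births_exactly_4, 4), round(births_2_or_more, 3))
print(round(typos_at_least_1, 4), round(typos_at_most_1, 4))
```

Using the complement for "at least" events avoids summing an infinite number of terms.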
Q. An office receives 20 faxed orders every two hours.

This is a Poisson distribution with average number of faxes = 10 faxes per hour.
a. What is the probability that it will receive 8 orders in the next hour?
P(X = 8) = e^(−10) * 10^8 / 8! = 0.1126
b. What is the probability that an order will be faxed within the next 9 minutes?
NOTE that the rate was in terms of hours whereas this question talks in minutes. So
we convert minutes to hours. 9 minutes = 9/60 hours.
Let T be the waiting time (in hours) until the next order. T has an exponential
distribution with rate 10 per hour, and we need P(T < 9/60)
= 1 − e^(−(9/60)*10) = 1 − e^(−3/2) = 0.77687
c. What is the probability that more than 12 minutes will elapse between faxed orders?
12 minutes = 12/60 hours = 0.2 hours
We need P(T > 0.2), where T again has an exponential distribution
= e^(−(0.2)*10) = e^(−2) = 0.135335
2.1.3 BINOMIAL APPROXIMATION TO POISSON
A binomial distribution can be approximated by a Poisson distribution with λ = np when n
is large (n approaches infinity) and p is small (p approaches 0). A simple thumb rule is
that n > 50 and np < 5. Let us see the approximation below:
Q The publisher of a medical journal claims that the probability of an error on a page is
.005. The errors on each page are independent of each other. If a journal has 400 pages,
a. what is the probability that only 1 page has an error?
n = 400, p = .005 so that np = 2.
Using a Poisson distribution, P(X = 1) = e^(−2) * 2^1 / 1! = .270671
If we use a binomial then the answer is .270669. The answers are close to each other,
illustrating that the approximation holds.
b. what is the probability that at most 2 pages have an error?
P(X = 0) + P(X = 1) + P(X = 2) = e^(−2) * 2^0 / 0! + e^(−2) * 2^1 / 1! + e^(−2) * 2^2 / 2!
= .135335 + .270671 + .270671 = .676676
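To see how close the two distributions are in the journal example, the exact binomial and the Poisson probabilities can be computed side by side. This is a minimal sketch with helper names chosen for this illustration.

```python
from math import comb, exp, factorial

n, p = 400, 0.005
lam = n * p  # Poisson mean used for the approximation


def binomial_pmf(r, n, p):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def poisson_pmf(r, lam):
    return exp(-lam) * lam ** r / factorial(r)


# Compare the two pmfs for 0, 1 and 2 pages with an error
for r in range(3):
    print(r, round(binomial_pmf(r, n, p), 6), round(poisson_pmf(r, lam), 6))

binomial_at_most_2 = sum(binomial_pmf(r, n, p) for r in range(3))
poisson_at_most_2 = sum(poisson_pmf(r, lam) for r in range(3))
print(round(binomial_at_most_2, 6), round(poisson_at_most_2, 6))
```

With n = 400 and np = 2 the thumb rule is satisfied, and the two sets of probabilities agree to several decimal places.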
2.2 CONTINUOUS DISTRIBUTIONS

2.2.1 NORMAL DISTRIBUTION:
The probability density function of a normal variable x is given as:
f(x) = {1/[σ√(2π)]} * e^(−(x − μ)^2/(2σ^2))
μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately
2.71828. Each normal distribution, with its own values of μ and σ, would need its own
calculation of the areas under the curve.
Working with this function directly is cumbersome because it involves an exponential term.
We therefore transform the x variable to another variable, named ‘z’, which is easier to use.
The transformation changes the origin and scale of x as follows:
z = (x − mean of x)/(sd of x) = (x − μ)/σ
z is also called the standard normal variable.
The mean of z is 0 and its sd is 1 (sd stands for standard deviation). Let us see how.

Note that E(z) = E((x − μ)/σ).

Using the rules of expectations,
E(z) = (E(x) − μ)/σ = (μ − μ)/σ = 0 (μ and σ are constants)
V(z) = V((x − μ)/σ) = V(x)/σ^2 = σ^2/σ^2 = 1 (since V(x) = σ^2)
The z distribution is also bell shaped. The area under this curve to the left of a given z
value provides the cumulative probability of that z value. Since we are dealing with a
continuous variable, the probability of z taking a single value is zero. The area under the
curve to the left of a given value of z gives us the probability that the variable X takes a
value less than the x value that corresponds to the chosen z.
In short, calculating a probability for a normal distribution involves the following steps:
• Choose a value of X (say x1) for which we need P(X < x1).
• Find the value of z that corresponds to x1, and call it z1 = (x1 − mean)/sd.
• From the tables find the row that corresponds to z1 and get the associated value given
in the table. This value is the probability P(z < z1), which equals P(X < x1).
• Now suppose we want P(X > x1). We find this as 1 − P(z < z1).
• Next suppose we want P(x1 < X < x2). We get z1 = (x1 − mean)/sd and z2 = (x2 − mean)/sd.
• P(x1 < X < x2) = P(z1 < z < z2) = P(z < z2) − P(z < z1)
= table entry that corresponds to z2 − table entry that corresponds to z1
This is shown in the following diagram:
EMPIRICAL RULE:

The Empirical Rule is based on the above concept of the z distribution. It states that if a
data set is normally distributed with population mean µ and standard deviation σ, then the
following are true:
• About 68% of the values lie within 1 standard deviation of the mean. In statistical
notation, this is represented as μ ± σ.
• About 95% of the values lie within 2 standard deviations of the mean. The
statistical notation for this is μ ± 2σ.
• About 99.7% of the values lie within 3 standard deviations of the mean, that is,
within μ ± 3σ.
Consider an example:
Q The Bulb Co, Ltd finds that its average CFL lasts 1000 hours with a standard
deviation of 100 hours. Assume that CFL life is normally distributed.
a. What is the probability that a randomly selected CFL will burn out in 1200 hours or
less?
Let x be the life of CFL in hours:
E(x) = 1000. standard deviation(x)=100.We want P( x <1200)

Let us transform x to z: Z=( 1200-1000)/100= 2
So P( x <1200)=P( z <2). Looking at normal tables we find the answer to be 0.977.

Thus, there is a 97.7% probability that a CFL will burn out within 1200 hours. The
diagram below shows the probability as rounded off figure. The blue area is the
required probability.
b. What is the probability that a randomly selected CFL will last more than 1200 hours?
We want P( x >1200).Let us transform x to z:
Z=( 1200-1000)/100= 2
So P( x >1200)=P( z >2 ). Looking at normal tables we find that P( z<2) =0.977.
Since area under the curve is 1 we find P(z>2) = 1-.977= 0.023. Thus, there is a 2.3%
probability that a CFL will last more than 1200 hours.
c. What is the probability that a randomly selected CFL will last between 1100 and 1200
hours?
P(1100 < x < 1200) = P(1 < z < 2)
When x = 1100, z = (1100 − 1000)/100 = 1, and when x = 1200, z = 2.
Since the table gives us cumulative probabilities, we find that P(x < 1100) = P(z < 1) = 0.841
and P(x < 1200) = P(z < 2) = 0.977.
So P(1 < z < 2) = 0.977 − 0.841 = 0.136.
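The table look-ups in the CFL example can be cross-checked with Python's standard library, which provides a `NormalDist` class in the `statistics` module. The sketch below assumes the same mean (1000 hours) and standard deviation (100 hours) as the example; the variable names are chosen for illustration.

```python
from statistics import NormalDist

# CFL life in hours: normal with mean 1000 and standard deviation 100
life = NormalDist(mu=1000, sigma=100)

p_at_most_1200 = life.cdf(1200)                      # part a: P(x < 1200)
p_more_than_1200 = 1 - life.cdf(1200)                # part b: P(x > 1200)
p_between_1100_1200 = life.cdf(1200) - life.cdf(1100)  # part c

# The same probabilities via the standard normal variable z
z = NormalDist()  # mean 0, sd 1
z1 = (1100 - 1000) / 100
z2 = (1200 - 1000) / 100
p_between_via_z = z.cdf(z2) - z.cdf(z1)

print(round(p_at_most_1200, 3), round(p_more_than_1200, 3),
      round(p_between_1100_1200, 3), round(p_between_via_z, 3))
```

Working with x directly or with the transformed z gives the same area, which is why a single z table is enough for every normal distribution.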
Q The mean of a normal probability distribution is 60; the standard deviation is 5.
a. About what percent of the observations lie between 55 and 65?

P( 55<x <65) = p[ ( 55-60)/5 < z <(65-60)/5] = P(-1<z <1) = .6826 or 68.26%

b. About what percent of the observations lie between 50 and 70?
P(50 < x < 70) = P[(50 − 60)/5 < z < (70 − 60)/5] = P(−2 < z < 2) = .9545 or 95.45%
c. About what percent of the observations lie between 45 and 75?
P( 45<x<75) = p[ ( 45-60)/5 < z <(75-60)/5] = P(-3<z <3) = .9973 or 99.73%
Q The GMAT is used to enter USA for education. Assume that scores are based on a
normal distribution with a mean of 1500 and a standard deviation of 300. New York College
would like to offer a scholarship to students who score in the top 10 percent of this test. What
is the minimum score that qualifies for the scholarship?
We want x1 such that P(x > x1) = 0.1
P(z > z1) = 0.1
From the tables we find z1 = 1.282
1.282 = (x1 − 1500)/300
x1 = 1884.6 is the minimum score that qualifies for the scholarship.
Q Chemical Company claims that its chemical X contains on the average 4.0 fluid ml of
caustic materials per liter. It further states that the distribution of caustic materials per liter is
normal and has a standard deviation of 1.3 fluid ml. What proportion of the individual liter
containers for this product will contain more than 5.0 fluid ml of X?
P(x > 5) = P(z > (5 − 4)/1.3) = P(z > 0.769) = 0.2209, so about 22.09% of the individual
liter containers of this product will contain more than 5.0 fluid ml of caustic materials.
Q The Federal Government is stepping up efforts to reduce average response times of
fire departments to fire calls. The distribution of mean response times to fire calls follows a
normal distribution with a mean of 12.8 minutes and a standard deviation of 3.7 minutes.
a. Find the probability that a randomly selected response time is less than 15 minutes.
P(x < 15) = P(z < (15 − 12.8)/3.7) = P(z < 0.5946) = 0.72394
b. Find the probability that a randomly selected response time is between 13 minutes and
15 minutes.
P(13 < x < 15) = P((13 − 12.8)/3.7 < z < (15 − 12.8)/3.7) = P(0.054 < z < 0.5946)
= 0.72394 − 0.52153 = 0.2024
c. The fastest 20% of fire departments will be singled out for a special safety award.
How fast must a fire department be in order to qualify for the special safety award?
The fastest departments have the shortest response times, so we need z1 such that
P(z < z1) = 0.2
z1 = −0.842
x1 = −0.842*3.7 + 12.8 = 9.6846
A department must therefore respond in about 9.68 minutes or less.
TIP: we need to understand the relation between x, z and the area under the normal
curve. Given any one of them we must be able to find the other two.
X ↔ Z ↔ AREA
The z table can be understood like this. The first column gives the value of z, while the
other columns contain the area to the left of a given z value. From the values shown
below:
i. P( z < .4) = .6554
ii. P( z <.46)= .6772
iii. P( z > .4) = 1-P( z <.4) = 1-.6554= .3446
iv. P( .4< z< 1.1) = .8643 -.6554= .2089

2.2.2 STANDARD NORMAL VARIATE (z) IN TERMS OF PERCENTILES

We can interpret zα as the 100(1−α)th percentile of the standard normal variable. For
example, choosing α = 0.05 gives the 95th percentile, which is the value of z (say z1) such
that P(z < z1) = 0.95. From the tables this value of z is 1.645.
So the area to the left of 1.645 is 0.95 (or 95%) while 5% of the area lies to the right of 1.645.
Take some examples:

α       percentile   zα value    area to left of zα   area to right of zα
0.25    75           0.67449     75%                  25%
0.45    55           0.125661    55%                  45%
0.36    64           0.358459    64%                  36%
0.85    15           -1.03643    15%                  85%
0.9     10           -1.28155    10%                  90%

Note that due to the symmetry of the bell-shaped normal curve, z(1−α) = −z(α). For
example, the value for α = 0.9 (−1.28155) is the negative of the value for α = 0.1 (1.281552).

Note that for some special and more frequently used values of α, we refer to zα as a
‘critical’ value of z. These are provided below:

α       percentile   zα value    area to left of zα   area to right of zα
0.1     90           1.281552    90%                  10%
0.05    95           1.644854    95%                  5%
0.025   97.5         1.959964    97.50%               2.50%
0.01    99           2.326348    99.00%               1.00%
0.005   99.5         2.575829    99.50%               0.50%
0.001   99.9         3.090232    99.90%               0.10%
0.0005  99.95        3.290527    99.95%               0.05%

We can now move on to calculating percentiles for non-standard normal distributions (those
that do not have mean 0 and sd 1). If a data set X is normally distributed with population
mean µ and standard deviation σ, then the
100(1−α)th percentile for X is given as μ + (100(1−α)th percentile for the z distribution) * σ.
Take an example:
Q. The CAT exam is used to enter the prestigious IIMs in India. Assume that scores are
based on a normal distribution with a mean of 1500 and a standard deviation of 300. IIM
Bangalore offers an interview to those who are in the top 3%, while IIM Guwahati offers an
interview to the top 8%. How much do you need to score to guarantee an interview in both
places?
Let scores obtained in CAT be denoted by X.
For Bangalore: We want a minimum of the 97th percentile here to qualify.
Using α = .03 we get zα as 1.88 according to the standard normal distribution (z
distribution).
Converting this to the X distribution implies that the 97th percentile for X is 1500
+ 1.88*300 = 2064. So we need a score of 2064 to get into IIM Bangalore.
For Guwahati: We want a minimum of the 92nd percentile here to qualify.
Using α = .08 we get zα as 1.405 according to the standard normal distribution (z
distribution).
Converting this to the X distribution implies that the 92nd percentile for X is 1500
+ 1.405*300 = 1921.5. So we need a score of 1921.5 to get into IIM Guwahati.
To be called by both, we need the higher of the two cut-offs, that is, a score of 2064.
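The percentile calculation μ + zα * σ can be sketched with `NormalDist.inv_cdf` from Python's `statistics` module, which returns the value below which a given fraction of the distribution lies. The helper name `cutoff_for_top` is an assumption made for this illustration.

```python
from statistics import NormalDist

scores = NormalDist(mu=1500, sigma=300)   # CAT scores as assumed in the example
standard = NormalDist()                   # z distribution


def cutoff_for_top(dist, top_fraction):
    """Score needed to be in the top `top_fraction` of the distribution."""
    return dist.inv_cdf(1 - top_fraction)


z_bangalore = cutoff_for_top(standard, 0.03)   # z value for the 97th percentile
z_guwahati = cutoff_for_top(standard, 0.08)    # z value for the 92nd percentile

bangalore = cutoff_for_top(scores, 0.03)
guwahati = cutoff_for_top(scores, 0.08)
both = max(bangalore, guwahati)

print(round(z_bangalore, 3), round(bangalore, 1))
print(round(z_guwahati, 3), round(guwahati, 1))
print(round(both, 1))
```

The small differences from the hand calculation come only from rounding z to 1.88 and 1.405 when reading the tables.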
2.2.3 BINOMIAL APPROXIMATION TO NORMAL

When the mean of a binomial distribution exceeds 5, we can approximate the binomial
distribution with a normal distribution where μ = n*p and σ = √(npq).
Let X be a binomial random variable, based on n trials with p as the probability of success.
Assuming that np > 5,
P(X = x) ≈ area under the normal curve that lies between x − 0.5 and x + 0.5, and
P(X ≤ x) ≈ area under the normal curve to the left of x + 0.5. Consider an example:
Q Assume a binomial probability distribution with n = 40 and p = 0.55. Find:
a. The mean and standard deviation of the random variable.
Mean = np = 40*0.55 = 22

Std dev = √(40*0.55*0.45) = √9.9 = 3.1464

Since np > 5 we can approximate binomial probabilities with normal probabilities. For
simplicity, the calculations below do not apply the 0.5 continuity correction.
b. The probability that X is 25 or greater
P(X ≥ 25) ≈ P(z > (25 − 22)/3.146) = P(z > 0.95) = 0.17106
c. The probability that X is 15 or less
P(X ≤ 15) ≈ P(z < (15 − 22)/3.146) = P(z < −2.22) = 0.01321
d. The probability that X is between 15 and 25, inclusive
P(15 ≤ X ≤ 25) ≈ P(−2.22 < z < 0.95) = 0.82894 − 0.01321 = 0.81573
Q. The probability of recovery from a rare TB virus is known to be 0.4. If 100 people
contract this virus, what is the probability that less than 30 will survive?
Let the number of patients that survive be X. We want P(X < 30). We can use the binomial
distribution with n = 100 and p = 0.4. We can also approximate this with a
normal distribution where mean = 100*0.4 = 40 and variance = 100*0.4*0.6 = 24.
Since X takes whole-number values, P(X < 30) = P(X ≤ 29), which with the continuity
correction is the area under the normal curve to the left of 29.5:
z1 = (29.5 − 40)/√24 = −2.14
P(X < 30) ≈ P(z < −2.14) = 0.0162.
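For the recovery example, the exact binomial probability P(X < 30) can be compared with its normal approximation using the continuity correction. This is a sketch only; the variable names are chosen for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.4
mean = n * p
sd = sqrt(n * p * (1 - p))

# Exact binomial probability: P(X < 30) = P(X <= 29)
exact = sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(30))

# Normal approximation with continuity correction: area to the left of 29.5
approx = NormalDist(mu=mean, sigma=sd).cdf(29.5)

print(round(exact, 4), round(approx, 4))
```

The two values are close because np = 40 is well above 5, so the binomial distribution here is nearly bell shaped.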
2.2.4 UNIFORM DISTRIBUTION
A uniform distribution (or rectangular distribution) is a distribution whose probability
density is constant over an interval from a to b.
Pdf: f(x) = 0 for x < a
= 1/(b − a) for a < x < b
= 0 for x > b
cdf: F(x) = 0 for x < a
= (x − a)/(b − a) for a < x < b
= 1 for x > b
Mean = E(x) = (a + b)/2

Variance = V(x) = (b − a)^2/12
Take an example:
Q Assume that the amount of petrol sold every day at a petrol pump is uniformly
distributed. The least amount sold is 1000 litres and maximum sold is 3000 litres.
a. What is the probability of selling more than 2500 litres on a day?
= (3000 − 2500)/(3000 − 1000) = 500/2000 = 0.25 or 25%
b. What is the probability of selling a maximum of 1200 litres on a day?
= (1200 − 1000)/(3000 − 1000) = 200/2000 = 0.10 or 10%
c. What is the probability that the pump will sell between 1500 and 2500 litres?
=( 2500-1500)/(3000-1000) = 1000/2000= 0.5 or 50%
d. What is the average amount sold?
= (3000+1000)/2= 2000 litres.
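The petrol pump answers all follow from the uniform cdf F(x) = (x − a)/(b − a). The sketch below codes that cdf directly; `uniform_cdf` is a helper name chosen for this illustration.

```python
def uniform_cdf(x, a, b):
    """Cumulative probability P(X <= x) for a uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 1000, 3000  # least and maximum litres sold per day

p_more_than_2500 = 1 - uniform_cdf(2500, a, b)
p_at_most_1200 = uniform_cdf(1200, a, b)
p_between_1500_2500 = uniform_cdf(2500, a, b) - uniform_cdf(1500, a, b)
mean = (a + b) / 2
variance = (b - a) ** 2 / 12

print(p_more_than_2500, p_at_most_1200, p_between_1500_2500, mean, variance)
```

Because the density is flat, every probability is just the length of the interval of interest divided by the total length b − a.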
Q: According to the Insurance Institute, a family of four spends between Rs 400 and Rs
3,800 per year on its insurance. Assume that this money spent is uniformly distributed.
What is the probability that a family spends more than Rs 3,000 per year?
P(X > 3000) = P(3000 < x < 3800) = (3800 − 3000)/(3800 − 400) = 800/3400 = .2353 or 23.53%
Q. The random variable X is continuous and uniform in [−1, 1]. Answer the following
questions:
a. What is the probability density function of X?
f(x) = 1/(1 − (−1)) = 1/2 for −1 ≤ x ≤ 1
b. Consider the variable Y such that Y = 2X^2 − X. Determine the sample space of Y.
When X = 1, Y = (2*1^2 − 1) = 1 and when X = −1 then Y = (2*(−1)*(−1) − (−1)) = 3. Between
these end points Y first falls and then rises: setting dY/dX = 4X − 1 = 0 gives X = 1/4,
where Y = −1/8. So Y ranges from −1/8 to 3.
X      Y
-1     3
-0.5   1
0      0
0.5    0
1      1
c. Compute the mean and variance of X.
Mean = (−1 + 1)/2 = 0
Variance = (1 − (−1))^2/12 = 4/12 = 1/3
Q: The closing price of Sport Goods Ltd is uniformly distributed between Rs15 and
Rs33 per share.
What is the probability that the stock price will be:
a. More than Rs 28? =.27778
b. Less than or equal to Rs20? =.27778
3. USEFUL LINKS
http://mathworld.wolfram.com
http://www.stat.purdue.edu/~zhanghao/STAT511/handout/Stt511%20Sec3.5.pdf
http://www.stattrek.com
4. EXERCISES
Here is list of old questions with the answers provided.
Q1. In a normal distribution 31% of the observations are under 45 and 8% are above 64.
What are the mean and variance of X?
We are given P(X > 64) = .08 and P(X < 45) = .31

P(X > 64) = P(z > z1) = 0.08. From the normal tables z1 = +1.405
z1 = (64 − mean)/sd = 1.405 (eq 1)
P(X < 45) = P(z < z2) = 0.31. From the normal tables z2 = −.496
z2 = (45 − mean)/sd = −0.496 (eq 2)
Solve eq 1 and eq 2 together to get mean = 49.95739 and sd = 9.99474 (sd is standard
deviation), so that variance = sd^2 ≈ 99.89.
Q2. The average time taken to finish a project by L&T is 11 months with standard
deviation = 2.4 months. If the firm has 19 projects in the pipeline, how many can be
expected to be completed in less than 1 year?
P(X < 12) = P(z < (12 − 11)/2.4) = P(z < 0.42) = .6628
Expected number completed = 19*.6628 = 12.59, so about 12 projects.
Q3. The waiting time for a playing ground in a local tennis club ranges uniformly
between 23.5 and 40.5 minutes. If the probability that Harsh has to wait for more than 30
minutes is at least 60%, he would rather play badminton. Which game will he choose?
From the uniform distribution, P(X > 30) = (40.5 − 30)/(40.5 − 23.5) = 10.5/17 = 0.618 or 61.8%.

Since we get a value > 60%, he must choose badminton.
Q4. JK tyres claims an average life of 45000 km for its tyres with standard deviation of
2000kms. Bharat buys 4 tyres for his old car. What is the chance that all 4 tyres will last at
least 46000 kms, assuming life of each tyre as independent of all other tyres in the car?
Probability of 1 tyre lasting at least 46000 kms is P(X ≥ 46000) = P(z ≥ .5) = .3085
P(all 4 tyres have a life of at least 46000 kms) = (.3085)^4 = 0.0091.
Q5. Let X be normally distributed with mean=30 and variance=49. Find C such that P( (X
-30) <C) =.9545.
P ( (X -30) <C)=P( z< (C/7))=.9545
From the normal tables P (z < 1.69) =.9545
So C/7 = 1.69

C= 7*1.69= 11.83

DC-1
Sem-II
Chapter: Covariance and Correlation
Content Developers: Vaishali Kapoor & Rakhi Arora
College / University: Rajdhani College

Table of Contents
2. Introduction
3. Covariance
a. Discrete Random Variable
b. Continuous Random Variable
c. Special cases
4. Correlation
5. Appendix
6. Summary
7. Exercises
8. Glossary
9. References

Learning outcomes
1. Define Covariance.
2. Calculate the covariance for the discrete and Continuous Random Variables.
3. Consider the special cases of covariance.
4. Compute correlation.

Introduction
So far we have studied the joint probability mass / density function and the
expected value of some function of random variables. It is also of interest, at
times, to know whether two random variables have some sort of relationship or
not. For example, someone may be interested in knowing whether the marks
obtained by a student are positively affected by the number of hours devoted to
studying by that student, or negatively affected by the hours devoted to watching
T.V. If X is marks obtained and Y is the number of hours spent daily on studying,
then one is interested in knowing whether X and Y are related. If yes, are they
positively or negatively related (the answer we expect is positive), and how
strongly are they related?
For answering the above questions, we need to learn the statistical techniques of
covariance and correlation. This chapter aids in understanding these techniques
and in solving problems with them. The first section of this chapter covers
covariance for discrete and continuous random variables, along with various
theorems, their proofs and corollaries. The second section of this chapter focuses
on measuring the strength of the relationship between X and Y, called correlation.
1. Covariance
When X and Y are two random variables that are not independent, the
covariance between X and Y is
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
where $\mu_X$ is the mean of variable X and
$\mu_Y$ is the mean of variable Y.
$(X - \mu_X)$ is the deviation of the X variable from its mean value.
$(Y - \mu_Y)$ is the deviation of the Y variable from its mean value.
So covariance is the expected value of the product of the deviations of X and Y from
their respective mean values.
Suppose X and Y are positively related to each other. This means that
when X attains a large value the corresponding Y value also tends to be large, and
small values of X correspond to small values of Y. Then most of the probability
mass or density will be associated with $(X - \mu_X)$ and $(Y - \mu_Y)$ either both
positive or both negative, so the product tends to be positive. Thus for a strong
positive relationship Cov(X, Y) should be positive. If there exists a strong
negative relationship, the signs of $(X - \mu_X)$ and $(Y - \mu_Y)$ would be opposite,
yielding a negative Cov(X, Y). If they are not related at all, then positive
product values would tend to be cancelled out by negative product values,
yielding Cov(X, Y) near zero.
Figure 1: Different possible relationships between random variables X and Y.
In the above figure, the '+' and '−' signs mark the areas where the plotted X and Y
values give 'positive' or 'negative' products respectively. In panel (a) X
and Y have a positive relationship, in panel (b) X and Y have a negative relationship
and in panel (c) X and Y have no relationship, so covariance would be positive in
case (a), negative in case (b) and around zero in case (c).
a) Covariance for Discrete Random Variables
If two discrete random variables X and Y are not independent then the
covariance between X and Y is given by
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{\text{all } x}\sum_{\text{all } y} (x - \mu_X)(y - \mu_Y)\, P_{XY}(x, y)$$
$$= \left[\sum_{\text{all } x}\sum_{\text{all } y} xy\, P_{XY}(x, y)\right] - \mu_X \mu_Y$$

Example 1: The joint PMF for X = automobile policy deductible amount and Y =
homeowner policy deductible amount is given in the following table:

                 Y
X         0      100    200    PX(x)
100       0.20   0.10   0.20   0.5
250       0.05   0.15   0.30   0.5
PY(y)     0.25   0.25   0.5    1

$$\mu_X = E(X) = \sum x P_X(x) = 0.5 \times 100 + 0.5 \times 250 = 175$$
$$\mu_Y = E(Y) = \sum y P_Y(y) = 0.25 \times 0 + 0.25 \times 100 + 0.5 \times 200 = 125$$
$$\mathrm{Cov}(X, Y) = \sum_{\text{all } x}\sum_{\text{all } y} (x - 175)(y - 125) P_{XY}(x, y)$$
$$= (100 - 175)(0 - 125)(0.20) + (250 - 175)(0 - 125)(0.05)$$
$$+ (100 - 175)(100 - 125)(0.10) + (250 - 175)(100 - 125)(0.15)$$
$$+ (100 - 175)(200 - 125)(0.20) + (250 - 175)(200 - 125)(0.30)$$
$$= 1875$$
The positive value of covariance suggests a positive relationship between the automobile
policy deductible amount and the homeowner policy deductible amount.
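The covariance in Example 1 can be reproduced directly from the joint PMF table. The sketch below stores the table in a dictionary (a representation chosen for this illustration) and computes the covariance both from the definition and as E(XY) − μX μY.

```python
# Joint PMF of Example 1: keys are (x, y) pairs, values are probabilities
joint_pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

mu_x = sum(x * p for (x, y), p in joint_pmf.items())
mu_y = sum(y * p for (x, y), p in joint_pmf.items())

# Definition: expected product of deviations from the means
cov_definition = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint_pmf.items())

# Equivalent form: E(XY) minus the product of the means
e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
cov_shortcut = e_xy - mu_x * mu_y

print(mu_x, mu_y, cov_definition, cov_shortcut)
```

Both routes give the same number; the second usually needs less arithmetic when working by hand.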
(b) An alternative formula (a shortcut)
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
$$= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y]$$
$$= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y$$
$$= E(XY) - \mu_X \mu_Y - \mu_Y \mu_X + \mu_X \mu_Y$$
$$= E(XY) - \mu_X \mu_Y$$
c) Covariance for continuous random variables
For continuous random variables X and Y that are not independent, the
covariance between X and Y is given by
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{XY}(x, y)\,dx\,dy$$
$$= \left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{XY}(x, y)\,dx\,dy\right] - \mu_X \mu_Y = E(XY) - \mu_X \mu_Y$$
Example 2: If the joint PDF for X and Y is given by
$$f_{XY}(x, y) = \begin{cases} 24xy, & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
and the marginal PDF is
$$f_X(x) = \begin{cases} 12x(1 - x)^2, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
with $f_Y(y)$ obtained by replacing x by y in $f_X(x)$, then
$$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{XY}(x, y)\,dx\,dy = \int_0^1\int_0^{1-x} xy \cdot 24xy\,dy\,dx$$
$$= \int_0^1\left[\int_0^{1-x} 24x^2y^2\,dy\right]dx = \int_0^1 \frac{24}{3}x^2\left[y^3\right]_0^{1-x}dx = \int_0^1 8x^2(1 - x)^3\,dx$$
$$= 8\int_0^1 (x^2 - 3x^3 + 3x^4 - x^5)\,dx = 8\left[\frac{x^3}{3} - \frac{3x^4}{4} + \frac{3x^5}{5} - \frac{x^6}{6}\right]_0^1 = 8 \cdot \frac{20 - 45 + 36 - 10}{60} = \frac{2}{15}$$
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^1 x \cdot 12x(1 - x)^2\,dx = \int_0^1 (12x^2 + 12x^4 - 24x^3)\,dx$$
$$= \left[12\frac{x^3}{3} + 12\frac{x^5}{5} - 24\frac{x^4}{4}\right]_0^1 = 4 + \frac{12}{5} - 6 = \frac{12}{5} - 2 = \frac{2}{5}$$
Also, $\mu_Y = \frac{2}{5}$ since the marginal PDF is the same.
$$\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y = \frac{2}{15} - \frac{2}{5} \cdot \frac{2}{5} = \frac{10 - 12}{75} = -\frac{2}{75}$$
This minus sign shows a negative relationship between X and Y (since x + y ≤ 1,
more of X would mean less of Y).
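The integrals in Example 2 can be checked numerically. The sketch below is only an approximation: it uses a simple midpoint rule over the triangular region 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 − x, and the helper name `integrate_over_triangle` is introduced purely for illustration.

```python
def integrate_over_triangle(g, n=200):
    """Midpoint-rule estimate of the double integral of g(x, y)
    over the region 0 <= x <= 1, 0 <= y <= 1 - x."""
    total = 0.0
    hx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * hx
        hy = (1.0 - x) / n          # inner step shrinks with the upper limit 1 - x
        for j in range(n):
            y = (j + 0.5) * hy
            total += g(x, y) * hx * hy
    return total


def joint_pdf(x, y):
    return 24 * x * y               # valid inside the triangle only


total_probability = integrate_over_triangle(joint_pdf)
e_xy = integrate_over_triangle(lambda x, y: x * y * joint_pdf(x, y))
mu_x = integrate_over_triangle(lambda x, y: x * joint_pdf(x, y))
mu_y = integrate_over_triangle(lambda x, y: y * joint_pdf(x, y))
covariance = e_xy - mu_x * mu_y

print(round(total_probability, 4), round(e_xy, 4), round(mu_x, 4), round(covariance, 4))
```

The numerical covariance is negative and close to the exact value of −2/75, in line with the hand calculation.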
d) Special Cases

(i) Until now we have not assumed that X and Y are independent. What if they are?
Then we expect no relationship between X and Y, so the covariance
between X and Y should be zero.
\[
\begin{aligned}
\mathrm{Cov}(X, Y) &= E(XY) - E(X)E(Y) \\
&= E(X)E(Y) - E(X)E(Y) \quad [\text{since X and Y are independent}] \\
&= 0
\end{aligned}
\]
So the covariance is zero whenever X and Y are independent.
This result must be used with caution, because its converse is not true. Consider
the sample space {(−2, 4), (−1, 1), (0, 0), (1, 1), (2, 4)}, where each point is
equally likely. Let the random variable X be the first component of the sample point
and Y the second. Then
\[
P(X = 1, Y = 1) = \frac{1}{5}, \qquad P(X = 1) = \frac{1}{5}, \qquad P(Y = 1) = \frac{2}{5}
\]
so $P(X = 1, Y = 1) \ne P(X = 1)\,P(Y = 1) = \frac{2}{25}$, and X and Y are dependent. However,
\[
\begin{aligned}
E(XY) &= \tfrac{1}{5}\,[(-8) + (-1) + 0 + 1 + 8] = 0 \\
E(X) &= \tfrac{1}{5}\,[(-2) + (-1) + 0 + 1 + 2] = 0 \\
E(Y) &= \tfrac{1}{5}\,[4 + 1 + 0 + 1 + 4] = 2 \\
\mathrm{Cov}(X, Y) &= 0 - 0 \times 2 = 0
\end{aligned}
\]
Therefore, if Cov(X, Y) is zero, it cannot be concluded that X and Y are independent
(they may or may not be).
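The counterexample can be checked mechanically. The following sketch, which assumes the five sample points as listed above, computes the covariance with exact fractions and compares the joint probability P(X = 1, Y = 1) with the product of the marginals.

```python
from fractions import Fraction

# The five equally likely sample points (x, y) from the text
points = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]
p = Fraction(1, len(points))

e_x = sum(p * x for x, _ in points)
e_y = sum(p * y for _, y in points)
e_xy = sum(p * x * y for x, y in points)
cov_xy = e_xy - e_x * e_y

# Independence would require P(X=1, Y=1) == P(X=1) * P(Y=1)
p_joint = sum(p for x, y in points if x == 1 and y == 1)
p_x1 = sum(p for x, _ in points if x == 1)
p_y1 = sum(p for _, y in points if y == 1)

print(cov_xy)                # 0
print(p_joint, p_x1 * p_y1)  # 1/5 2/25
```

The covariance vanishes even though the joint probability differs from the product of the marginals, which is exactly the situation described in the text.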

(ii) Cov(X, X)
What if one wishes to calculate the covariance between X and X?
\[
\begin{aligned}
\mathrm{Cov}(X, X) &= E(X^2) - E(X)E(X) \\
&= E(X^2) - [E(X)]^2 \\
&= \mathrm{Var}(X)
\end{aligned}
\]
Cov(X, X) is the same thing as the variance of X, which gives a quantitative
measure of how much spread there is in the distribution or population of x values.
(iii) Suppose X and Y are random variables and a and b are constants. Then
\[
\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)
\]
The proof is as follows:
\[
\begin{aligned}
\mathrm{Var}(aX + bY) &= E[(aX + bY)^2] - (a\mu_X + b\mu_Y)^2 \\
&= E[a^2 X^2 + b^2 Y^2 + 2abXY] - [a^2 \mu_X^2 + b^2 \mu_Y^2 + 2ab\,\mu_X \mu_Y] \\
&= a^2 E(X^2) + b^2 E(Y^2) + 2ab\,E(XY) - a^2 \mu_X^2 - b^2 \mu_Y^2 - 2ab\,\mu_X \mu_Y \\
&= a^2 [E(X^2) - \mu_X^2] + b^2 [E(Y^2) - \mu_Y^2] + 2ab\,[E(XY) - \mu_X \mu_Y] \\
&= a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)
\end{aligned}
\]
If X and Y are independent, then Cov(X, Y) = 0 and $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$.
Suppose there are n variables $X_1, X_2, \ldots, X_n$; then
\[
\mathrm{Var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) + 2 \sum_{i < j} a_i a_j\, \mathrm{Cov}(X_i, X_j)
\]
If $X_1, \ldots, X_n$ are independent random variables and all the $a_i$ are equal to 1, then
\[
\mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)
\]
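The identity for Var(aX + bY) can be checked numerically on the joint pmf of Example 1, read off from the six covariance terms at the start of this section. In this sketch the constants `a` and `b` and the helper names `expect` and `var` are illustrative choices, not part of the text.

```python
# Joint pmf of (X, Y) taken from Example 1 (deductible amounts)
pmf = {
    (100, 0): 0.20, (250, 0): 0.05,
    (100, 100): 0.10, (250, 100): 0.15,
    (100, 200): 0.20, (250, 200): 0.30,
}


def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def var(g):
    """Var[g(X, Y)] = E[g^2] - (E[g])^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2


a, b = 2.0, -3.0  # illustrative constants, not from the text

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

lhs = var(lambda x, y: a * x + b * y)                    # direct computation
rhs = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy   # formula from the text
print(lhs, rhs)
```

Both sides agree up to floating-point rounding, and the intermediate value `cov_xy` reproduces the covariance of 1875 obtained in Example 1.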
e) Scaling of Variables

In the two examples solved in this chapter, the covariance was 1875 in the first
example and $-\frac{2}{75}$ in the second.
Could we conclude that the X and Y variables in the first example have a strong
(positive) relationship and that a weaker (negative) relationship emerges in the second
example? The answer is no.
To see this, let us scale our random variables by a, and compare the covariance
of the two new variables aX and aY with the covariance of X and Y.
\[
\begin{aligned}
\mathrm{Cov}(aX, aY) &= E(aX \cdot aY) - E(aX)E(aY) \\
&= E(a^2 XY) - aE(X)\,aE(Y) \\
&= a^2 [E(XY) - E(X)E(Y)] \\
&= a^2\, \mathrm{Cov}(X, Y)
\end{aligned}
\]
If Cov(X, Y) is 1800 and a = 10, then Cov(aX, aY) is 180,000. Cov(X, Y) is
positive and so is Cov(aX, aY), but Cov(aX, aY) > Cov(X, Y) whenever |a| > 1.
Thus covariance accurately tells the direction of the
relationship between X and Y (positive or negative) but does not tell the strength
of the relationship, because it is affected by the scaling of the variables.
2. Correlation
As discussed in the last section, covariance suffers from the defect
that scaling the variables alters its value, so it does not serve
as a measure of the strength of a relationship. A better measure, called the
correlation coefficient, is therefore studied.
The covariance of X and Y necessarily reflects the units of both random
variables, which can make it difficult to interpret. A measure of the strength of a
relationship should be a dimensionless measure of dependency, so that one
relationship can be compared with another. Dividing Cov(X, Y) by $\sigma_X \sigma_Y$
accomplishes this task. It also scales the quotient to be a number between −1
and 1.
For two random variables X and Y, the correlation coefficient of X and Y is
given by
\[
\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \mathrm{Cov}(X^*, Y^*)
\]
where $X^* = (X - \mu_X)/\sigma_X$ and $Y^* = (Y - \mu_Y)/\sigma_Y$, since
\[
\begin{aligned}
\mathrm{Cov}(X^*, Y^*) &= E\left[ \left( \frac{X - \mu_X}{\sigma_X} \right) \left( \frac{Y - \mu_Y}{\sigma_Y} \right) \right] - E\left[ \frac{X - \mu_X}{\sigma_X} \right] E\left[ \frac{Y - \mu_Y}{\sigma_Y} \right] \\
&= \frac{1}{\sigma_X \sigma_Y} E[(X - \mu_X)(Y - \mu_Y)] - \frac{1}{\sigma_X \sigma_Y} E(X - \mu_X)\, E(Y - \mu_Y) \\
&= \frac{1}{\sigma_X \sigma_Y} \mathrm{Cov}(X, Y) - 0
\end{aligned}
\]
because $E(X - \mu_X) = E(X) - \mu_X = \mu_X - \mu_X = 0$ and $E(Y - \mu_Y) = E(Y) - \mu_Y = \mu_Y - \mu_Y = 0$.
X* and Y* are called standardised variables. So the correlation coefficient is the
covariance between these standardised variables.
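Some Useful Results for $\rho(X, Y)$

a) $|\rho(X, Y)| \le 1$
Since a variance is never negative, $\mathrm{Var}(X^* \pm Y^*) \ge 0$. Now
\[
\begin{aligned}
\mathrm{Var}(X^* \pm Y^*) &= \mathrm{Var}(X^*) \pm 2\,\mathrm{Cov}(X^*, Y^*) + \mathrm{Var}(Y^*) \\
&= \mathrm{Var}\left( \frac{X - \mu_X}{\sigma_X} \right) \pm 2\,\mathrm{Cov}(X^*, Y^*) + \mathrm{Var}\left( \frac{Y - \mu_Y}{\sigma_Y} \right) \\
&= \frac{1}{\sigma_X^2} [\mathrm{Var}(X) - 0] \pm 2\,\mathrm{Cov}(X^*, Y^*) + \frac{1}{\sigma_Y^2} [\mathrm{Var}(Y) - 0] \quad [\text{since the variance of a constant is zero}] \\
&= \frac{\mathrm{Var}(X)}{\mathrm{Var}(X)} \pm 2\rho(X, Y) + \frac{\mathrm{Var}(Y)}{\mathrm{Var}(Y)} \\
&= 1 \pm 2\rho(X, Y) + 1
\end{aligned}
\]
\[
\mathrm{Var}(X^* \pm Y^*) = 2[1 \pm \rho(X, Y)] \ge 0
\ \Rightarrow\ 1 \pm \rho(X, Y) \ge 0
\ \Rightarrow\ |\rho(X, Y)| \le 1
\]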
b) $|\rho(X, Y)| = 1$ iff $Y = aX + b$ for some constants a and b (with $a \ne 0$).
Suppose $\rho(X, Y) = 1$; then $\mathrm{Var}(X^* - Y^*) = 2[1 - \rho(X, Y)] = 0$. A random variable with zero
variance is constant, so
\[
X^* - Y^* = \alpha \ (\text{say}), \qquad X^* = Y^* + \alpha \quad (\text{a linear relationship})
\]
and it readily follows that Y is a linear function of X.
Since the statement is "if and only if", only the first part has been proved so far. For the second part,
let $Y = aX + b$ with $a > 0$. Then $E(Y) = aE(X) + b$ and $V(Y) = a^2 V(X)$, so $\sigma_Y = a\sigma_X$. By definition,
\[
\rho(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sigma_X \sigma_Y}
\]
Putting $Y = aX + b$, $E(Y) = aE(X) + b$ and $\sigma_Y = a\sigma_X$,
\[
\begin{aligned}
\rho(X, Y) &= \frac{E(X(aX + b)) - E(X)\,(aE(X) + b)}{\sigma_X \cdot a\sigma_X} \\
&= \frac{E(aX^2 + bX) - a(E(X))^2 - bE(X)}{a\sigma_X^2} \\
&= \frac{aE(X^2) - a(E(X))^2 + bE(X) - bE(X)}{a\sigma_X^2} \\
&= \frac{a\,\sigma_X^2}{a\,\sigma_X^2} = 1
\end{aligned}
\]
Hence the second part of the statement is also proved. Similar results can be obtained for
$\rho(X, Y) = -1$ (with $a < 0$).
c) For constants a and c that are either both negative or both positive,
\[
\rho(aX + b, cY + d) = \rho(X, Y)
\]
\[
\begin{aligned}
\rho(aX + b, cY + d) &= \frac{\mathrm{Cov}(aX + b, cY + d)}{\sqrt{V(aX + b)\,V(cY + d)}} \\
&= \frac{ac\,\mathrm{Cov}(X, Y)}{\sqrt{a^2 V(X)\, c^2 V(Y)}} \\
&= \frac{ac\,\mathrm{Cov}(X, Y)}{|ac|\,\sigma_X \sigma_Y} \\
&= \rho(X, Y)
\end{aligned}
\]
This amounts to saying that scaling the variables up or down in the same direction
does not affect the correlation coefficient.
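To contrast the two measures, the sketch below applies a linear change of scale to the joint pmf of Example 1 and recomputes both the covariance and the correlation coefficient. The constants `a`, `b`, `c`, `d` and the helper `cov_and_corr` are illustrative assumptions, chosen so that a·c > 0.

```python
from math import sqrt

# Joint pmf of Example 1 as (x, y, probability) triples
pmf = [(100, 0, 0.20), (250, 0, 0.05), (100, 100, 0.10),
       (250, 100, 0.15), (100, 200, 0.20), (250, 200, 0.30)]


def cov_and_corr(triples):
    """Return (covariance, correlation) for a list of (x, y, p) triples."""
    mx = sum(p * x for x, y, p in triples)
    my = sum(p * y for x, y, p in triples)
    cov = sum(p * (x - mx) * (y - my) for x, y, p in triples)
    vx = sum(p * (x - mx) ** 2 for x, y, p in triples)
    vy = sum(p * (y - my) ** 2 for x, y, p in triples)
    return cov, cov / sqrt(vx * vy)


a, b, c, d = 10.0, 5.0, 0.5, -3.0  # illustrative constants with a*c > 0
scaled = [(a * x + b, c * y + d, p) for x, y, p in pmf]

cov0, rho0 = cov_and_corr(pmf)
cov1, rho1 = cov_and_corr(scaled)
print(cov0, cov1)  # covariance is multiplied by a*c
print(rho0, rho1)  # correlation is unchanged
```

The covariance changes with the units of measurement, while the correlation coefficient stays at roughly 0.30 for both versions of the data.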
d) The correlation coefficient measures only the linear relationship
between random variables X and Y.
$\rho = 0$ implies that Cov(X, Y) is zero.
As discussed earlier, a zero covariance need not imply that X and
Y are independent, and the same holds for $\rho$: $\rho = 0$ does not imply that there is no
relationship between X and Y.
If X and Y are linearly related as $Y = aX + b$ (with $a \ne 0$), then $|\rho(X, Y)| = 1$, which is the
strongest possible linear relationship between X and Y.
$|\rho| < 1$ indicates that the relationship is not completely linear, but there could
still be a very strong non-linear relationship. So $\rho = 0$ implies that X and Y
are (linearly) uncorrelated, but there could be high dependence between X
and Y given by some non-linear relationship.

Example 3: Let X and Y be discrete random variables with joint pmf
\[
p(x, y) = \frac{1}{4} \quad \text{for } (x, y) = (-4, 1),\ (4, -1),\ (2, 2),\ (-2, -2)
\]
\[
\begin{aligned}
\mu_X &= \tfrac{1}{4}(-4) + \tfrac{1}{4}(4) + \tfrac{1}{4}(2) + \tfrac{1}{4}(-2) = 0 \\
\mu_Y &= \tfrac{1}{4}(1) + \tfrac{1}{4}(-1) + \tfrac{1}{4}(2) + \tfrac{1}{4}(-2) = 0 \\
E(XY) &= \tfrac{1}{4}(-4) + \tfrac{1}{4}(-4) + \tfrac{1}{4}(4) + \tfrac{1}{4}(4) = 0 \\
\mathrm{Cov}(X, Y) &= E(XY) - \mu_X \mu_Y = 0
\end{aligned}
\]
so $\rho_{XY} = 0$.
Plotting all pairs of X and Y on a graph shows that the two variables are dependent,
but $\rho(X, Y) = 0$ reflects the absence of any linear relationship.
[Figure: plot of the four (x, y) points]
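Example 4: Continuing from Example 1, we know that
$\mathrm{Cov}(X, Y) = 1875$, $E(X) = 175$ and $E(Y) = 125$.
\[
\begin{aligned}
\sigma_X^2 &= E(X^2) - (E(X))^2 \\
&= [0.5 \times (100)^2 + 0.5 \times (250)^2] - (175)^2 \\
&= 36250 - (175)^2 \\
&= 5625 \\
\sigma_X &= \sqrt{5625} = 75
\end{aligned}
\]
\[
\begin{aligned}
\sigma_Y^2 &= E(Y^2) - (E(Y))^2 \\
&= [0.25\,(0)^2 + 0.25\,(100)^2 + 0.5\,(200)^2] - (125)^2 \\
&= 6875 \\
\sigma_Y &= \sqrt{6875} = 82.92
\end{aligned}
\]
\[
\rho(X, Y) = \frac{1875}{75 \times 82.92} = 0.301
\]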
Example 5:
Risk on Securities
In the last chapter, we calculated the expected returns of the two securities but
did not mention risk. Risk is measured by the standard deviation, which
indicates the likely divergence of the actual return from the
expected return. The standard deviation is a useful measure of risk because it weighs each
deviation by the probability of that outcome.
Let's revisit the earlier example of securities A and B. The returns in the
different states are as follows.
State                      1       2       3       4       5      Total
Returns on A, RA (%)       10      12      8       14      19
Returns on B, RB (%)       20      25      33      27      22
Probability, Pi            0.10    0.25    0.35    0.20    0.10   1
RA-E(RA)                  -1.5     0.5    -3.5     2.5     7.5
RB-E(RB)                  -7.4    -2.4     5.6    -0.4    -5.4
(RA-E(RA))*(RB-E(RB))      11.1   -1.2    -19.6   -1.0    -40.5
where E(RA) = 11.5% and E(RB) = 27.4% (as calculated earlier).
V(RA): Variance of returns on security A = Σ Pi (RA-E(RA))^2
= (0.10 x (-1.5)^2) + (0.25 x (0.5)^2) + (0.35 x (-3.5)^2) + (0.20 x (2.5)^2) + (0.10 x (7.5)^2)
= 0.225 + 0.0625 + 4.2875 + 1.25 + 5.625 = 11.45
Standard deviation of returns on security A = √11.45 = 3.38%
Similarly,
V(RB): Variance of returns on security B = Σ Pi (RB-E(RB))^2
= (0.10 x (-7.4)^2) + (0.25 x (-2.4)^2) + (0.35 x (5.6)^2) + (0.20 x (-0.4)^2) + (0.10 x (-5.4)^2)
= 5.476 + 1.44 + 10.976 + 0.032 + 2.916 = 20.84
Standard deviation of returns on security B = √20.84 = 4.57%
Although the expected return on security B is higher, its risk, measured by the
standard deviation, is also higher.
Portfolio
Let us study the same portfolio of A and B with the same weights of 0.25 and 0.75
respectively.
V(RP): Variance of portfolio = V(0.25RA + 0.75RB)
= (0.25)^2 V(RA) + (0.75)^2 V(RB) + 2(0.25)(0.75) Cov(RA, RB)
Cov(RA, RB) = Σ Pi [(RA-E(RA))*(RB-E(RB))]
= (0.10 x 11.1) + (0.25 x (-1.2)) + (0.35 x (-19.6)) + (0.20 x (-1.0)) + (0.10 x (-40.5))
= 1.11 + (-0.30) + (-6.86) + (-0.20) + (-4.05)
= -10.30
V(RP) = [(0.25)^2 x 11.45] + [(0.75)^2 x 20.84] + [2 x (0.25) x (0.75) x (-10.30)]
= 0.7156 + 11.7225 + (-3.8625)
= 8.58
Standard deviation of returns on portfolio = √8.58 = 2.93%
The risk on the portfolio is lower than the standard deviation of either security A or B. This is
because the returns on the two securities are negatively related (shown by the negative
value of the covariance). Hence the portfolio is diversified.
Correlation between the returns of A and B is given by
ρ(RA, RB) = Cov(RA, RB) / (σA x σB) = -10.30 / (3.38 x 4.57) = -0.667
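The quantities in Example 5 can be recomputed directly from the table of returns and probabilities. The sketch below is a minimal illustration, not the authors' own procedure; the helper names `mean` and `cov` are hypothetical, and the variances are obtained through the identity Cov(X, X) = Var(X) from the previous section.

```python
from math import sqrt

probs = [0.10, 0.25, 0.35, 0.20, 0.10]
r_a = [10, 12, 8, 14, 19]    # returns on security A, in percent
r_b = [20, 25, 33, 27, 22]   # returns on security B, in percent
w_a, w_b = 0.25, 0.75        # portfolio weights


def mean(values):
    """Probability-weighted mean of the returns."""
    return sum(p * v for p, v in zip(probs, values))


def cov(u, v):
    """Probability-weighted covariance; cov(u, u) is the variance of u."""
    mu, mv = mean(u), mean(v)
    return sum(p * (x - mu) * (y - mv) for p, x, y in zip(probs, u, v))


var_a, var_b, cov_ab = cov(r_a, r_a), cov(r_b, r_b), cov(r_a, r_b)
var_p = w_a**2 * var_a + w_b**2 * var_b + 2 * w_a * w_b * cov_ab
rho_ab = cov_ab / sqrt(var_a * var_b)

print(mean(r_a), mean(r_b))        # expected returns
print(var_a, var_b, cov_ab)        # variances and covariance
print(var_p, sqrt(var_p), rho_ab)  # portfolio variance, risk, correlation
```

Changing `w_a` and `w_b` (keeping their sum equal to 1) shows how the portfolio risk responds to the weights when the covariance is negative.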
Exercises:
Q.1 Suppose that two dice are thrown. Let X be the number showing on the first
die and let Y be the larger of the two numbers showing. Find Cov(X, Y).
Q.2 Show that Cov(aX + b, cY + d) = ac Cov(X, Y) for any constants a, b, c and d.
Q.3 Let X and Y be random variables with
fXY(x, y) = 1 for -y < x < y, 0 < y < 1, and 0 elsewhere.
Show that Cov(X, Y) = 0 but that X and Y are dependent.
Q.4 Suppose that fXY(x,y)= a2 ( )

, 0 x, 0 y. Find Var(X+Y).

References:
1. Jay L. Devore, Probability and Statistics for Engineers, Cengage Learning, 2010.
2. William G. Cochran, Sampling Techniques, John Wiley, 2007.
3. Richard J. Larsen and Morris. L. Marx, An Introduction to Mathematical
Statistics and its Applications, Prentice Hall, 2011.

DC-1
Sem-II
Chapter: Joint Probability Distribution

Content Developers: Vaishali Kapoor & Rakhi Arora

Table of Contents
2. Introduction
3. Joint Probability distribution
4. Joint Cumulative Distribution Function
5. Independence of Random Variables
6. Conditional and Marginal Probability Distribution
7. Appendix
8. Summary
9. Exercises
10. Glossary
11. References

Learning outcomes
1. Define Joint Probability Distribution.
2. Explain Joint Cumulative Distribution Function.
3. State the relationship between Joint Probability Distribution and Joint Cumulative
Distribution Function.
4. Differentiate between Probability Distribution Function and Probability Density Function.
5. Define independence of random variables.
6. Calculate the marginal and conditional Probability distribution function given Joint
Cumulative Distribution Function.

Introduction
In the previous chapters, a random variable was defined over a sample space S
with a probability measure, and it is reasonable that many different random variables
can be defined over the same sample space S. To illustrate, let X be the total
observed when a pair of dice is rolled. This is not the only random
variable that can be studied when a pair of dice is rolled. For example, one may be interested in
the product of, or the difference between, the two numbers observed on the two dice. In this
chapter, we study pairs of random variables defined over the same sample space at the
same time. For example, we may wish to study simultaneously the probabilities of
X: the score on the first die, and Y: the greater of the two scores.
The first section of this chapter deals with joint PDFs for discrete and continuous random
variables. The second section focuses on the joint cumulative distribution function. The third
section covers the independence of the random variables X and Y. Finally, the conditional
probability distribution is discussed.
1. Joint Probability Distribution Function
As we already know, a probability distribution function is defined differently depending on
whether the random variable is discrete or continuous. Similarly, the random variables X and Y
could be discrete or continuous. In this section, we consider discrete joint PDFs and
continuous joint PDFs.
(a) Discrete Joint PDFs
Suppose S is a discrete sample space on which two random variables, X and Y, are defined.
The probability that X takes the value x and Y takes the value y is denoted by
\[
P_{X,Y}(x, y) = P(X = x \text{ and } Y = y)
\]
which is the probability of the intersection of the events X = x and Y = y.

Example 1: Consider the experiment of tossing two tetrahedral dice, each marked 1, 2, 3 and 4. Let
X : Score on the 1st tetrahedron
and Y : The greater of the two scores
Table 1
Sample space X Y
(1, 1) 1 1
(1, 2) 1 2
(1, 3) 1 3
(1, 4) 1 4
(2, 1) 2 2
(2, 2) 2 2
(2, 3) 2 3
(2, 4) 2 4
(3, 1) 3 3
(3, 2) 3 3
(3, 3) 3 3
(3, 4) 3 4
(4, 1) 4 4
(4, 2) 4 4
(4, 3) 4 4
(4, 4) 4 4
As can be seen, X takes the values 1, 2, 3 or 4 and Y takes the values 1, 2, 3 or 4.
P(X = 1) = 4/16, i.e. out of the 16 possible outcomes, four give a 1 on the
1st tetrahedron: (1, 1), (1, 2), (1, 3) and (1, 4).
Likewise, P(Y = 1) = 1/16, i.e. when (1, 1) is the outcome. But one could be
interested in knowing the probability that both of these occur together, i.e. P(X = 1
and Y = 1). This is illustrated in the following table:
Table 2
X \ Y                 1       2       3       4       PX(x) (Row Total)
1                     1/16    1/16    1/16    1/16    4/16
2                     0       2/16    1/16    1/16    4/16
3                     0       0       3/16    1/16    4/16
4                     0       0       0       4/16    4/16
PY(y) (Col. Total)    1/16    3/16    5/16    7/16    1
From the above table, $P_{X,Y}(2, 2) = 2/16$: the score on the 1st tetrahedron is 2 and the
higher of the two scores is 2 when (2, 1) or (2, 2) is the outcome. $P_{X,Y}(4, 3)$ is zero, since if
4 is the score on the 1st tetrahedron, the higher of the two scores cannot be 3 (it has to be 4, the score observed
on the 1st tetrahedron).
Now let us study the last row and last column of the above table. Suppose that, given the joint PDF,
we wish to recover the probability distribution of either X or Y. X can take the value 1 when
Y takes the value 1, 2, 3 or 4, so P(X = 1) is obtained by summing horizontally, giving 4/16. Likewise, for P(Y = 1)
the probabilities are added vertically. The last column, labelled PX(x), is the marginal probability
distribution function of X, and the last row, labelled PY(y), is the marginal probability
distribution function of Y.
For example:
\[
P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = 2) + P(X = 1, Y = 3) + P(X = 1, Y = 4)
\]
Suppose that $P_{XY}(x, y)$ is the joint PDF of the discrete random variables X and Y. Then the
marginal PDF of X is obtained by summing the joint PDF over all values of Y, i.e.
\[
P_X(x) = \sum_{\text{all } y} P_{XY}(x, y)
\]
and similarly,
\[
P_Y(y) = \sum_{\text{all } x} P_{XY}(x, y)
\]
If X and Y are discrete random variables, the function given by $f(x, y) = P(X = x, Y = y)$
for each pair of values (x, y) within the range of X and Y is called the joint probability
distribution of X and Y.
Then $f(x, y) \ge 0$ for each pair of values (x, y). Also
\[
\sum_{\text{all } x} \sum_{\text{all } y} f(x, y) = 1
\]
i.e. summing over all possible values of X and Y, the probability that one of the possible events occurs is
1. This can be verified from Table 2.
The minimum value of the joint probability there is zero and the maximum value is 4/16. Summing
the joint probabilities over all values of Y gives $P_X(x)$, and summing $P_X(x)$ over all x gives 1; the same
holds with the roles of X and Y exchanged.
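Table 2 and its marginals can be rebuilt by enumerating the 16 equally likely outcomes of the two tetrahedra. The following sketch is one way to do this in Python with exact fractions; the dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]
joint = {}  # joint[(x, y)] = P(X = x, Y = y)
for first, second in product(faces, repeat=2):
    x, y = first, max(first, second)  # X: first score, Y: greater score
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 16)

# Marginals: sum the joint pmf over the other variable
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in faces}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in faces}

print(joint[(2, 2)])                 # 1/8, i.e. 2/16
print([str(p_y[y]) for y in faces])  # ['1/16', '3/16', '5/16', '7/16']
print(sum(joint.values()))           # 1
```

Pairs that cannot occur, such as (4, 3), never enter the dictionary, which corresponds to the zero entries of the table.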
Example 2: Determine the value of k for which the function
\[
f(x, y) = kxy \quad \text{for } x = 1, 2, 3;\ y = 1, 2, 3
\]
serves as a joint probability distribution.
Substituting the values of x and y, we get
f(1,1) = k;  f(2,1) = 2k;  f(3,1) = 3k
f(1,2) = 2k; f(2,2) = 4k;  f(3,2) = 6k
f(1,3) = 3k; f(2,3) = 6k;  f(3,3) = 9k
Since $\sum_{\text{all } x} \sum_{\text{all } y} f(x, y) = 1$,
\[
k + 2k + 3k + 2k + 4k + 6k + 3k + 6k + 9k = 1
\ \Rightarrow\ 36k = 1
\ \Rightarrow\ k = \frac{1}{36}
\]
For $k = \frac{1}{36}$, $f(x, y) = kxy$ serves as a joint probability distribution function.
b) Continuous Probability Density Function
If X and Y are continuous random variables, the joint PDF is defined as a function which, when
integrated over a range of values of X and Y, gives the probability that X and Y take
values within that range. Suppose there exists a function $f_{XY}(x, y)$ such that, for any region R in the
xy-plane,
\[
P[(X, Y) \in R] = \iint_R f_{XY}(x, y)\, dx\, dy
\]
Then the function $f_{XY}(x, y)$ is the joint PDF of X and Y.
Note: In the continuous case P(X = x, Y = y) = 0, i.e. the probability at any single point is zero; X and Y
must take values over a region R.
Example 3. A study showed that the daily number of hours, X, a teenager watches T.V. and the
daily number of hours, Y, a teenager studies are approximated by the joint PDF
\[
f_{XY}(x, y) = xy\, e^{-(x + y)}, \quad x > 0,\ y > 0
\]
Suppose a teenager is chosen at random. Find the probability that he spends at least twice as
much time watching TV as he does working on his studies.
The region R (in the definition of continuous PDFs) corresponds to the part of the xy-plane where $X \ge 2Y$,
so the task is to find $P(X \ge 2Y)$, which is given by
\[
\begin{aligned}
P(X \ge 2Y) &= \int_0^{\infty} \int_0^{x/2} xy\, e^{-(x + y)}\, dy\, dx \\
&= \int_0^{\infty} x e^{-x} \left[ \int_0^{x/2} y e^{-y}\, dy \right] dx \\
&= \int_0^{\infty} x e^{-x} \left[ 1 - \left( \frac{x}{2} + 1 \right) e^{-x/2} \right] dx \\
&= \int_0^{\infty} x e^{-x}\, dx - \int_0^{\infty} \frac{x^2}{2}\, e^{-3x/2}\, dx - \int_0^{\infty} x e^{-3x/2}\, dx \\
&= 1 - \frac{16}{54} - \frac{4}{9} = \frac{7}{27}
\end{aligned}
\]
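The value 7/27 ≈ 0.259 can be cross-checked by integrating the joint PDF numerically over the region 0 ≤ y ≤ x/2. The sketch below uses a plain midpoint rule; the truncation point `x_max` and the grid sizes are assumptions chosen for illustration (the integrand is negligible beyond x = 30), not values from the text.

```python
from math import exp


def joint_pdf(x: float, y: float) -> float:
    """Joint PDF of Example 3: x*y*exp(-(x+y)) for x, y > 0."""
    return x * y * exp(-(x + y))


def prob_x_at_least_2y(x_max: float = 30.0, nx: int = 600, ny: int = 200) -> float:
    """Midpoint-rule estimate of P(X >= 2Y) over 0 <= y <= x/2, 0 <= x <= x_max."""
    hx = x_max / nx
    total = 0.0
    for i in range(nx):
        x = (i + 0.5) * hx
        hy = (x / 2) / ny  # inner step for y in [0, x/2]
        inner = sum(joint_pdf(x, (j + 0.5) * hy) for j in range(ny)) * hy
        total += inner * hx
    return total


estimate = prob_x_at_least_2y()
print(estimate, 7 / 27)
```

The two printed numbers should agree to several decimal places; a finer grid tightens the agreement at the cost of more function evaluations.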
$f_{XY}(x, y)$ serves as a joint probability density function if and only if
(i) $f_{XY}(x, y) \ge 0$ for $-\infty < x < \infty$, $-\infty < y < \infty$
(ii) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy\, dx = 1$
Example 4: Suppose that the joint probability density function for two continuous random
variables X and Y is given by $f_{XY}(x, y) = cxy$ for $0 < y < x < 1$ (and 0 elsewhere).
$f_{XY}(x, y)$ qualifies as a joint probability density function if $c \ge 0$, i.e. $f_{XY}$ is
non-negative, and $\iint_R cxy\, dy\, dx = 1$.
\[
\begin{aligned}
1 &= c \int_0^1 \left[ \int_0^x xy\, dy \right] dx \\
&= c \int_0^1 x \left[ \frac{y^2}{2} \right]_0^x dx \\
&= c \int_0^1 \frac{x^3}{2}\, dx \\
&= c \left[ \frac{x^4}{8} \right]_0^1 \\
&= \frac{c}{8}
\end{aligned}
\]
so $c = 8$.
For $c = 8$, $f_{XY}(x, y) = cxy$ serves as a joint PDF for $0 < y < x < 1$.

As in the case of discrete random variables, marginal PDFs can be constructed from the joint PDF,
here by integrating (rather than summing) over the other variable:
\[
f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy
\qquad \text{and} \qquad
f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx
\]
where $f_X(x)$ and $f_Y(y)$ are the marginal PDFs of X and Y and $f_{XY}(x, y)$ is the joint PDF of the
continuous variables X and Y.
$f_X(x)$ is obtained by integrating the joint PDF over all values of Y and, similarly, $f_Y(y)$ is
obtained by integrating the joint PDF over all values of X.
Example 5: Suppose that the joint PDF for two continuous variables is given by
\[
f_{XY}(x, y) = \frac{1}{6}, \quad 0 \le x \le 3,\ 0 \le y \le 2
\]
Then the marginal PDF of X is given by
\[
f_X(x) = \int_0^2 \frac{1}{6}\, dy = \left[ \frac{y}{6} \right]_0^2 = \frac{1}{3} \quad \text{for } 0 \le x \le 3
\]
So X is a uniform random variable defined over the interval [0, 3]. Similarly,
\[
f_Y(y) = \int_0^3 \frac{1}{6}\, dx = \left[ \frac{x}{6} \right]_0^3 = \frac{1}{2} \quad \text{for } 0 \le y \le 2
\]
Y is also a uniform random variable, defined over [0, 2].
2. Joint Cumulative Distribution Function

For a single random variable, cumulative probability means the probability of "less than or equal to x".
So for two random variables it is $P(X \le x \text{ and } Y \le y)$.
a) Discrete Random Variables
If X and Y are discrete random variables, the function
\[
F(x, y) = P(X \le x, Y \le y) = \sum_{s \le x} \sum_{t \le y} f(s, t)
\quad \text{for } -\infty < x < \infty,\ -\infty < y < \infty
\]
where f(s, t) is the joint PDF of X and Y, is called the joint cumulative distribution function of X and Y.
Note: F(x, y) denotes the joint cumulative distribution function, in contrast to
f(x, y), which denotes the joint PDF.
(b) Continuous Random Variables
If X and Y are continuous random variables, then the joint cumulative distribution function is
\[
F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\, ds\, dt
\quad \text{for } -\infty < x < \infty,\ -\infty < y < \infty
\]
It follows that
\[
f_{XY}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F(x, y)
\]
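Example 6: Suppose the joint cumulative distribution function of random variables X and Y is given by
\[
F(x, y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}) & \text{for } x > 0 \text{ and } y > 0 \\ 0 & \text{elsewhere} \end{cases}
\]
Then, for x > 0 and y > 0,
\[
\begin{aligned}
\frac{\partial^2 F(x, y)}{\partial x\, \partial y} &= \frac{\partial}{\partial x} \left[ \frac{\partial}{\partial y} (1 - e^{-x})(1 - e^{-y}) \right] \\
&= \frac{\partial}{\partial x} \left[ \frac{\partial}{\partial y} \left( 1 - e^{-x} - e^{-y} + e^{-(x + y)} \right) \right] \\
&= \frac{\partial}{\partial x} \left[ e^{-y} - e^{-(x + y)} \right] \\
&= e^{-(x + y)}
\end{aligned}
\]
So the joint PDF of X and Y is given by
\[
f_{XY}(x, y) = \begin{cases} e^{-(x + y)} & \text{for } x > 0 \text{ and } y > 0 \\ 0 & \text{elsewhere} \end{cases}
\]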
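The relationship between the joint CDF and the joint PDF in Example 6 can also be checked numerically, by approximating the mixed partial derivative with a central difference. This is only a sketch; the evaluation point and the step size `h` are arbitrary illustrative choices.

```python
from math import exp


def joint_cdf(x: float, y: float) -> float:
    """Joint CDF of Example 6."""
    if x <= 0 or y <= 0:
        return 0.0
    return (1 - exp(-x)) * (1 - exp(-y))


def mixed_partial(F, x: float, y: float, h: float = 1e-3) -> float:
    """Central-difference estimate of d^2 F / (dx dy) at (x, y)."""
    return (F(x + h, y + h) - F(x + h, y - h)
            - F(x - h, y + h) + F(x - h, y - h)) / (4 * h * h)


x0, y0 = 0.7, 1.3  # an arbitrary interior point, chosen for illustration
print(mixed_partial(joint_cdf, x0, y0), exp(-(x0 + y0)))
```

The finite-difference estimate and the closed-form density agree closely at the chosen point.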
Generalization (to the multivariate case)
Suppose there are n discrete random variables $X_1, X_2, \ldots, X_n$ defined over the same sample space S.
Then
\[
f(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)
\]
and
\[
F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)
\]
for $-\infty < x_1 < \infty, \ldots, -\infty < x_n < \infty$. The marginal PDF of $X_1$ is given by
\[
g(x_1) = \sum_{\text{all } x_2} \cdots \sum_{\text{all } x_n} f(x_1, x_2, \ldots, x_n)
\]
If the X's are instead continuous random variables, then
\[
F(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_2} \int_{-\infty}^{x_1} f(t_1, \ldots, t_n)\, dt_1\, dt_2 \cdots dt_n
\]
for $-\infty < x_1 < \infty$, $-\infty < x_2 < \infty$, ..., $-\infty < x_n < \infty$.
Also,
\[
f(x_1, x_2, \ldots, x_n) = \frac{\partial^n}{\partial x_1\, \partial x_2 \cdots \partial x_n} F(x_1, x_2, \ldots, x_n)
\]
and the marginal PDF of $X_1$ is
\[
g(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\, dx_2 \cdots dx_n
\]

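Example 7. Suppose the joint PDF for discrete random variables X, Y and Z is given by
\[
f(x, y, z) = \frac{(x + y)z}{63} \quad \text{for } x = 1, 2;\ y = 1, 2, 3;\ z = 1, 2
\]
Then, summing over the pairs (y, z) with $y + z \le 3$,
\[
\begin{aligned}
P(X = 2, Y + Z \le 3) &= f(2, 1, 1) + f(2, 2, 1) + f(2, 1, 2) \\
&= \frac{3}{63} + \frac{4}{63} + \frac{6}{63} \\
&= \frac{13}{63}
\end{aligned}
\]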
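Probabilities for a discrete trivariate pmf such as the one in Example 7 can be obtained by enumerating the support and adding up the qualifying terms. A minimal sketch, with the function name `f` mirroring the notation of the text:

```python
from fractions import Fraction
from itertools import product


def f(x: int, y: int, z: int) -> Fraction:
    """Joint pmf of Example 7 on x in {1,2}, y in {1,2,3}, z in {1,2}."""
    return Fraction((x + y) * z, 63)


support = list(product([1, 2], [1, 2, 3], [1, 2]))

total = sum(f(x, y, z) for x, y, z in support)
prob = sum(f(x, y, z) for x, y, z in support if x == 2 and y + z <= 3)
print(total, prob)  # 1 13/63
```

The first printed value confirms that the pmf sums to one over its support, and the second reproduces the probability computed above.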
3. Independence of Random Variables
From the lessons on probability we know that X and Y are independent if
$P(X \in A \text{ and } Y \in B) = P(X \in A) \cdot P(Y \in B)$ for all sets A and B. The continuous random variables X and Y
are independent if and only if there are functions g(x) and h(y) such that
\[
f_{XY}(x, y) = g(x)\, h(y)
\]
If the above equation holds, there is a constant k such that $f_X(x) = k\,g(x)$ and $f_Y(y) = \frac{1}{k}\, h(y)$,
where k is set to be $\int_{-\infty}^{\infty} h(y)\, dy$.
Example 8: Suppose the joint PDF of X and Y is given by
\[
f_{XY}(x, y) = 12xy(1 - y), \quad 0 \le x \le 1,\ 0 \le y \le 1
\]
Then X and Y are independent, since
\[
f_{XY}(x, y) = 12x\,[y(1 - y)] = g(x)\, h(y)
\]
where $g(x) = 12x$ and $h(y) = y(1 - y)$. The marginals are $f_X(x) = k\,g(x)$ and $f_Y(y) = \frac{1}{k}\, h(y)$, where
\[
k = \int_{-\infty}^{\infty} h(y)\, dy = \int_0^1 y(1 - y)\, dy = \left[ \frac{y^2}{2} - \frac{y^3}{3} \right]_0^1 = \frac{1}{6}
\]
so $f_X(x) = \frac{1}{6} \times 12x = 2x$ for $0 \le x \le 1$
and $f_Y(y) = 6y(1 - y)$ for $0 \le y \le 1$.
4. Conditional Probability Distribution
From the chapters on probability we know that the conditional probability of event A, given
event B, is
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
\]
Suppose these events are X = x and Y = y; then
\[
P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)}
\]
\[
f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{h_Y(y)}, \quad h_Y(y) \ne 0
\]
where $f_{XY}(x, y)$ is the joint PDF of X and Y and $h_Y(y)$ is the value of the marginal distribution of Y at y.
$f_{X|Y}(x \mid y)$ is called the conditional distribution of X given Y = y.

Note that f(x, y) could be a joint probability distribution or a joint density function,
depending on whether X and Y are discrete or continuous random variables respectively.
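Example 9: Suppose the joint PDF is
\[
f(x, y) = \begin{cases} 4xy & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}
\]
Marginal distribution of X:
\[
g(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_0^1 4xy\, dy = \left[ 2xy^2 \right]_0^1 = 2x
\]
Marginal distribution of Y:
\[
h(y) = \int_{-\infty}^{\infty} f(x, y)\, dx = \int_0^1 4xy\, dx = \left[ 2x^2 y \right]_0^1 = 2y
\]
Conditional distribution of X given Y = y:
\[
f(x \mid y) = \frac{f(x, y)}{h(y)} = \frac{4xy}{2y} = 2x \quad \text{for } 0 < x < 1
\]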
Generalisation II
If there are n independent random variables $X_1, X_2, \ldots, X_n$ and $f_i(x_i)$ is the value of the
marginal distribution of $X_i$ at $x_i$, then the joint PDF is given by
\[
f(x_1, x_2, \ldots, x_n) = f_1(x_1)\, f_2(x_2) \cdots f_n(x_n)
\]
The joint conditional distribution of $X_2, \ldots, X_n$ at $(x_2, \ldots, x_n)$ given $X_1 = x_1$ is given by
\[
z(x_2, x_3, \ldots, x_n \mid x_1) = \frac{f(x_1, \ldots, x_n)}{g(x_1)}
\]
where $g(x_1)$ is the marginal PDF of $X_1$ and $f(x_1, \ldots, x_n)$ is the joint PDF of the n random variables.
Appendix
I. Proof of the theorem that $P_X(x) = \sum_{\text{all } y} P_{XY}(x, y)$ if $P_{XY}(x, y)$ is the joint PDF of
discrete random variables X and Y.
The collection of sets (Y = y) for all y forms a partition of S; that is, they are disjoint and
$\bigcup_{\text{all } y} (Y = y) = S$. Therefore
\[
(X = x) = (X = x) \cap S = (X = x) \cap \Big[ \bigcup_{\text{all } y} (Y = y) \Big] = \bigcup_{\text{all } y} [(X = x) \cap (Y = y)]
\]
so
\[
\begin{aligned}
P_X(x) = P(X = x) &= P\Big( \bigcup_{\text{all } y} [(X = x) \cap (Y = y)] \Big) \\
&= \sum_{\text{all } y} P(X = x, Y = y) \\
&= \sum_{\text{all } y} P_{XY}(x, y)
\end{aligned}
\]

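II. Proof of the theorem that $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy$, where $f_{XY}(x, y)$ is the joint
PDF of continuous random variables X and Y.
\[
\begin{aligned}
F_X(x) = P(X \le x) &= \int_{-\infty}^{\infty} \int_{-\infty}^{x} f_{XY}(t, y)\, dt\, dy \\
&= \int_{-\infty}^{x} \left[ \int_{-\infty}^{\infty} f_{XY}(t, y)\, dy \right] dt
\end{aligned}
\]
Differentiating the above equation with respect to x gives the marginal PDF of X:
\[
f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy
\]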

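III. Proof of the theorem that the continuous random variables X and Y are independent
if and only if there are functions g(x) and h(y) such that
\[
f_{XY}(x, y) = g(x)\, h(y)
\]
Since it is an "if and only if" statement, first suppose X and Y are independent.
Then
\[
F_{XY}(x, y) = P(X \le x \text{ and } Y \le y) = P(X \le x)\, P(Y \le y) = F_X(x)\, F_Y(y)
\]
\[
\begin{aligned}
f_{XY}(x, y) &= \frac{\partial^2}{\partial x\, \partial y} F_{XY}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_X(x)\, F_Y(y) \\
&= \frac{\partial}{\partial x} F_X(x) \cdot \frac{\partial}{\partial y} F_Y(y) \\
&= f_X(x)\, f_Y(y)
\end{aligned}
\]
The second part of the proof assumes $f_{XY}(x, y) = g(x)\, h(y)$ and needs to prove that X and Y
are independent.
\[
f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy = \int_{-\infty}^{\infty} g(x)\, h(y)\, dy = g(x) \int_{-\infty}^{\infty} h(y)\, dy
\]
Let $k = \int_{-\infty}^{\infty} h(y)\, dy$, so $f_X(x) = k\, g(x)$. Similarly,
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx = \int_{-\infty}^{\infty} g(x)\, h(y)\, dx = h(y) \int_{-\infty}^{\infty} g(x)\, dx \\
&= h(y)\, \frac{\int_{-\infty}^{\infty} g(x)\, dx \int_{-\infty}^{\infty} h(y)\, dy}{\int_{-\infty}^{\infty} h(y)\, dy} \\
&= \frac{h(y)}{k} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x)\, h(y)\, dx\, dy \\
&= \frac{1}{k}\, h(y) \cdot 1 = \frac{1}{k}\, h(y)
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
P(X \in A \text{ and } Y \in B) &= \int_A \int_B f_{XY}(x, y)\, dy\, dx \\
&= \int_A \int_B k\, g(x) \cdot \frac{1}{k}\, h(y)\, dy\, dx \\
&= \int_A f_X(x)\, dx \int_B f_Y(y)\, dy \\
&= P(X \in A)\, P(Y \in B)
\end{aligned}
\]
So X and Y are independent.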
IV. Transformations and Combinations of Random Variables.
(a) Suppose X is a discrete random variable.
Let $Y = aX + b$ with $a \ne 0$. Then
\[
P_Y(y) = P_X\left( \frac{y - b}{a} \right)
\]
Proof:
\[
P_Y(y) = P(Y = y) = P(aX + b = y) = P\left( X = \frac{y - b}{a} \right) = P_X\left( \frac{y - b}{a} \right)
\]
(b) Suppose X and Y are independent random variables.
Let $W = X + Y$. Then,
(i) If X and Y are discrete random variables,
\[
P_W(w) = \sum_{\text{all } x} P_X(x)\, P_Y(w - x)
\]
Proof:
\[
\begin{aligned}
P_W(w) = P(W = w) &= P(X + Y = w) \\
&= P\Big( \bigcup_{\text{all } x} (X = x, Y = w - x) \Big) \\
&= \sum_{\text{all } x} P(X = x, Y = w - x) \\
&= \sum_{\text{all } x} P(X = x)\, P(Y = w - x) \\
&= \sum_{\text{all } x} P_X(x)\, P_Y(w - x)
\end{aligned}
\]
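The discrete formula for the pmf of W = X + Y translates directly into a double loop over the two supports. In the sketch below the two fair six-sided dice are an illustrative choice that does not come from the text, and the function name `convolve` is hypothetical.

```python
from fractions import Fraction


def convolve(p_x: dict, p_y: dict) -> dict:
    """pmf of W = X + Y for independent X and Y: P_W(w) = sum_x P_X(x) P_Y(w - x)."""
    p_w = {}
    for x, px in p_x.items():
        for y, py in p_y.items():
            p_w[x + y] = p_w.get(x + y, 0) + px * py
    return p_w


# Illustrative choice (not from the text): two independent fair six-sided dice
die = {face: Fraction(1, 6) for face in range(1, 7)}
p_sum = convolve(die, die)
print(p_sum[7], p_sum[2])  # 1/6 1/36
```

Each pair (x, y) contributes P_X(x) P_Y(y) to the total w = x + y, which is the same sum as in the formula with y written as w − x.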
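(ii) If X and Y are continuous random variables,
\[
f_W(w) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(w - x)\, dx
\]
Proof:
\[
\begin{aligned}
F_W(w) = P(X + Y \le w) &= \int_{-\infty}^{\infty} \int_{-\infty}^{w - x} f_X(x)\, f_Y(y)\, dy\, dx \\
&= \int_{-\infty}^{\infty} f_X(x) \left[ \int_{-\infty}^{w - x} f_Y(y)\, dy \right] dx \\
&= \int_{-\infty}^{\infty} f_X(x)\, F_Y(w - x)\, dx
\end{aligned}
\]
\[
f_W(w) = \frac{d}{dw} F_W(w) = \frac{d}{dw} \int_{-\infty}^{\infty} f_X(x)\, F_Y(w - x)\, dx = \int_{-\infty}^{\infty} f_X(x)\, f_Y(w - x)\, dx
\]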


Exercises:
Q.1 If pxy (x,y) = cxy at the points (1,1), (2,1),(2,2) and (3,1) , and equals 0, elsewhere.
Find c.
Q.2 Suppose that random variables x and y vary in accordance with the joint pdf,
fxy(x,y)=c(x+y), 0<x<1, 0<y<1. Find c.
Q.3 An advisor looks over the schedules of his fifty students to see how many math and
science courses each has registered for in the coming semester. He summarizes his results
in a table. What is the probability that a student selected at random will have signed up for
more math courses than science courses?
                                 Number of math courses, X
Number of science courses, Y     0      1      2
0                                11     6      4
1                                9      10     3
2                                5      0      2
Q.4 Suppose that X and Y have a bivariate uniform density over the unit square:
fXY(x, y) = c for 0 < x < 1, 0 < y < 1, and 0 elsewhere.
i) Find c.
ii) Find P(0 < X < 1/2, 0 < Y < 1/4).
Q.5 Let X & Y have joint pdf
FXY (x,y) = 2 , 0<x<y,0<y
Find p (Y<3X)

Q.6 For each of the following joint pdfs, find fX(x) and fY(y).
a) fXY(x, y) = 1/2, 0 ≤ x ≤ 2, 0 ≤ y ≤ 1
b) fXY(x, y) = (3/2)y^2, 0 ≤ x ≤ 2, 0 ≤ y ≤ 1
c) fXY(x, y) = 4xy, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
Q.7 For each of the following joint pdfs, find the joint cdf FXY(u, v).
a) fXY(x, y) = (3/2)y^2, 0 ≤ x ≤ 2, 0 ≤ y ≤ 1
b) fXY(x, y) = (2/3)(x + 2y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
Q8. Find the joint pdf associated with two random variables X & Y whose joint cdf is
Fxy(x,y) = (1- ) (1- ), x>0, y>0
Q.9 The four random variables W, X, Y and Z have the multivariate pdf
fWXYZ(w, x, y, z) = 16wxyz
for 0 < w < 1, 0 < x < 1, 0 < y < 1 and 0 < z < 1. Find the marginal pdf fWX(w, x) and use it to
compute P(0 < W < 1/2, 1/2 < X < 1).
Q10. Suppose fX (x)= x , x 0 and fY(y)= , y0 where X and Y are independent. Find
the pdf of X and Y.
References:

DC-1
Sem-II
Chapter: Mathematical Expectation for Joint

Probability Distribution
Content Developers: Vaishali Kapoor & Rakhi

Arora
Institute of Lifelong Learning, University of Delhi

Table of Contents
2. Introduction
3. Conditional Expectation
4. Unconditional Expectation
5. Some laws to Expected values
6. Appendix
7. Summary
8. Exercises
9. Glossary
10. References

Learning outcomes:
1. Define Conditional Expectation
2. Differentiate between Conditional Expectation and Unconditional

Expectation
3. Generalize/ Extend the concept of expectations to the n- variable case.
4. Verify the independence of two random variables with the help of

expectations.
5. Calculate the expectation of linear combination of random variables.

Introduction
As we saw in the last few chapters, the expected value in the one-variable case is given by
\[
E(X) = \sum_{\text{all } x} x\, P_X(x) \quad \text{if X is discrete}
\qquad \text{and} \qquad
E(X) = \int_{-\infty}^{\infty} x f_X(x)\, dx \quad \text{if X is continuous.}
\]
E(X) is the probability-weighted average of the values X can take, i.e. the long-run
average value of X over repeated observations. If the variable X is transformed
into, say, g(X), then along similar lines
\[
E(g(X)) = \sum_{\text{all } x} g(x)\, P_X(x) \quad \text{for a discrete random variable X}
\]
and
\[
E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \quad \text{for a continuous random variable X.}
\]
Similarly, for the bivariate case we consider E[g(X, Y)], where g(X, Y) is a function of two
jointly distributed random variables.
This chapter is divided into three sections. The first section covers the conditional
expectation of one variable given a value of the other, for discrete and
continuous random variables. The second section covers the expected value
of a function of two or more random variables. The last section focuses on some
laws of expectation, followed by their proofs.
1. Conditional Expectation
The conditional expectation, denoted by $E(X \mid Y = y)$, represents the expected value of
X for a given value Y = y, where X and Y are two random variables. The conditional
expectation of X given Y is equal to the mean of the conditional distribution of X
given Y.
a) Discrete random variables
For discrete random variables X and Y,
\[
E(X \mid Y = y_1) = \sum_{\text{all } x} x\, f_{X|Y}(x \mid y_1) = \frac{\sum_{\text{all } x} x\, f_{XY}(x, y_1)}{f_Y(y_1)}
\]
where $f_{X|Y}(x \mid y_1)$ is the conditional probability of X when Y
assumes the specific value $y_1$, $f_{XY}(x, y_1)$ is the joint probability and $f_Y(y_1)$ is the
marginal probability.
Example 1: Consider the following joint PDF for random variables X and Y, where
X stands for the number of printers sold and Y represents the number of computers sold.
Y \ X      1       2       3       fY(y)
1          0.03    0.06    0.06    0.15
2          0.02    0.04    0.04    0.10
3          0.09    0.18    0.18    0.45
4          0.06    0.12    0.12    0.30
fX(x)      0.20    0.40    0.40    1
$E(X \mid Y = 2)$ denotes the expected number of printers sold if the number of computers
sold is known to be 2.
\[
E(X \mid Y = 2) = \frac{\sum_{\text{all } x} x\, f_{XY}(x, 2)}{f_Y(2)}
= \frac{1 \times 0.02 + 2 \times 0.04 + 3 \times 0.04}{0.10}
= \frac{0.22}{0.10} = 2.2
\]
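The conditional mean can be computed for any row of the printers and computers table in the same way. A small sketch, in which the nested dictionary layout and the function name `conditional_mean_x` are illustrative choices:

```python
# Joint pmf from the table: joint[y][x] = P(X = x, Y = y)
joint = {
    1: {1: 0.03, 2: 0.06, 3: 0.06},
    2: {1: 0.02, 2: 0.04, 3: 0.04},
    3: {1: 0.09, 2: 0.18, 3: 0.18},
    4: {1: 0.06, 2: 0.12, 3: 0.12},
}


def conditional_mean_x(y: int) -> float:
    """E(X | Y = y) = sum_x x * f(x, y) / f_Y(y)."""
    row = joint[y]
    f_y = sum(row.values())  # marginal probability of Y = y
    return sum(x * p for x, p in row.items()) / f_y


print(round(conditional_mean_x(2), 4))  # 2.2
```

In this particular table every joint entry equals the product of its row and column totals, so X and Y are independent and `conditional_mean_x` returns the same value, E(X) = 2.2, for every y.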
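b) Continuous Random Variables
If X and Y are continuous random variables, the conditional expectation of X given that Y takes a value in a region R is
\[
E(X \mid Y \in R) = \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid Y \in R)\, dx
= \frac{\int_{-\infty}^{\infty} \int_R x\, f_{XY}(x, y)\, dy\, dx}{\int_R f_Y(y)\, dy}
\]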
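Example 2
If the joint probability density of X and Y is given by
\[
f(x, y) = \begin{cases} \frac{2}{3}(x + 2y) & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}
\]
find the conditional mean of X given Y = 1/2.
The marginal density of Y is $h(y) = \int_0^1 \frac{2}{3}(x + 2y)\, dx = \frac{1}{3}(1 + 4y)$, so $h(\frac{1}{2}) = 1$ and
\[
f(x \mid \tfrac{1}{2}) = \begin{cases} \frac{2}{3}(x + 1) & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}
\]
\[
E(X \mid \tfrac{1}{2}) = \int_0^1 \tfrac{2}{3}\, x(x + 1)\, dx = \tfrac{2}{3} \int_0^1 (x^2 + x)\, dx = \tfrac{2}{3} \left( \tfrac{1}{3} + \tfrac{1}{2} \right) = \tfrac{5}{9}
\]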
2. Unconditional Expectation
If we wish to find the expected value of some single-valued function of X and Y, say
g(X, Y), this is equivalent to finding the unconditional mean
of g(X, Y).
a) Discrete Random Variables
If X and Y are discrete random variables, then
\[
E[g(X, Y)] = \sum_{\text{all } x} \sum_{\text{all } y} g(x, y)\, f_{XY}(x, y)
\]
where g(X, Y) is a function of X and Y and $f_{XY}(x, y)$ is the joint PDF of the
variables X and Y.

Example 3: Consider two discrete random variables X and Y with g(X, Y) = XY, whose joint PDF is given by the following table:

          Y = -1   Y = 1   fX(x)
X = 1     0        0.2     0.2
X = 2     0.2      0.3     0.5
X = 3     0.1      0.2     0.3
fY(y)     0.3      0.7     1

Then $E(XY) = \sum_{x} \sum_{y} xy\, f_{XY}(x, y)$
= (1 × (-1) × 0) + (2 × (-1) × 0.2) + (3 × (-1) × 0.1) + (1 × 1 × 0.2) + (2 × 1 × 0.3) + (3 × 1 × 0.2)
= 0 - 0.4 - 0.3 + 0.2 + 0.6 + 0.6
= 0.7
XY can attain values from the set S = {-1, -2, -3, 1, 2, 3}, and E(XY) gives the expected value of XY if X and Y are randomly chosen.
Similarly, $E(X^2 Y) = \sum_{x} \sum_{y} x^2 y\, f_{XY}(x, y)$
= (1 × (-1) × 0) + (4 × (-1) × 0.2) + (9 × (-1) × 0.1) + (1 × 1 × 0.2) + (4 × 1 × 0.3) + (9 × 1 × 0.2)
= 0 - 0.8 - 0.9 + 0.2 + 1.2 + 1.8
= 1.5
$X^2 Y$ can take values from the set A = {-1, -4, -9, 1, 4, 9}, and $E(X^2 Y) = 1.5$ gives the expected value of $X^2 Y$ if X and Y are randomly chosen.
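As a rough check, the double sum for E[g(X, Y)] can be evaluated directly from the joint table of Example 3, taking the two Y columns as -1 and 1. The helper name `expectation` and the dictionary layout are assumptions made for this sketch. With the table probabilities as printed, the sums come to 0.7 for E(XY) and 1.5 for E(X²Y).

```python
# Joint probabilities f_XY(x, y) from the table of Example 3, keyed by (x, y).
joint = {
    (1, -1): 0.0, (1, 1): 0.2,
    (2, -1): 0.2, (2, 1): 0.3,
    (3, -1): 0.1, (3, 1): 0.2,
}

def expectation(g, joint):
    """E[g(X, Y)] = sum over all (x, y) of g(x, y) * f_XY(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_xy = expectation(lambda x, y: x * y, joint)        # E(XY)
e_x2y = expectation(lambda x, y: x**2 * y, joint)    # E(X^2 Y)
print(round(e_xy, 4), round(e_x2y, 4))
```

Passing a different function g gives the expectation of any other function of X and Y under the same joint distribution.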
b) Continuous Random Variables

If X and Y are continuous random variables, then the expected value of g(X, Y) is given by
$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{XY}(x, y)\, dx\, dy$
where g(X, Y) is a continuous function and $f_{XY}(x, y)$ is the joint PDF of X and Y.
Example 4
A nut company sells cans of mixed nuts containing almonds, cashews and peanuts. Suppose the net weight of each can is exactly 1 lb, but the weight of each type of nut in the mix is random. Let X = the weight of almonds in a selected can and Y = the weight of cashews. Consider the following joint PDF of X and Y:
$f(x, y) = \begin{cases} 24xy & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0 & \text{otherwise} \end{cases}$
If 1 lb of almonds costs $1.00, 1 lb of cashews costs $1.50 and 1 lb of peanuts costs $0.50, then the total cost of the contents of a can is
h(X, Y) = 1(X) + 1.5(Y) + 0.5(1 - X - Y)
= 0.5 + 0.5X + Y
where 1 - X - Y is the weight of peanuts.
E(h(X, Y)) is the expected cost of a randomly selected can:
$E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y)\, f(x, y)\, dx\, dy$
$= \int_0^1 \int_0^{1-x} (0.5 + 0.5x + y)\, 24xy\, dy\, dx$
$= \int_0^1 \int_0^{1-x} \left( 12xy + 12x^2 y + 24xy^2 \right) dy\, dx$
$= \int_0^1 \left[ 12x \frac{y^2}{2} + 12x^2 \frac{y^2}{2} + 24x \frac{y^3}{3} \right]_0^{1-x} dx$
$= \int_0^1 \left( 6x(1-x)^2 + 6x^2(1-x)^2 + 8x(1-x)^3 \right) dx$
$= \int_0^1 \left[ 6x + 6x^3 - 12x^2 + 6x^2 + 6x^4 - 12x^3 + 8x - 8x^4 - 24x^2 + 24x^3 \right] dx$
$= \int_0^1 \left( -2x^4 + 18x^3 - 30x^2 + 14x \right) dx$
$= \left[ -\frac{2x^5}{5} + \frac{18x^4}{4} - \frac{30x^3}{3} + \frac{14x^2}{2} \right]_0^1$
$= -\frac{2}{5} + \frac{18}{4} - 10 + 7 = \frac{-4 + 45 - 30}{10} = \frac{11}{10} = 1.1$
So the expected cost of a randomly chosen can of nuts is $1.10.
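The double integral can also be approximated numerically as a sanity check. The sketch below uses a simple midpoint rule over the triangular region 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 - x; the function names and the grid size `n` are assumptions chosen for illustration, not part of the lesson.

```python
def h(x, y):
    """Cost of a can: 0.5 + 0.5x + y (almonds x, cashews y, peanuts 1 - x - y)."""
    return 0.5 + 0.5 * x + y

def f(x, y):
    """Joint density 24xy on the triangle x, y >= 0, x + y <= 1; zero elsewhere."""
    if 0 <= x <= 1 and 0 <= y <= 1 and x + y <= 1:
        return 24 * x * y
    return 0.0

def expected_cost(n=200):
    """Midpoint-rule approximation of the integral of h * f over the triangle."""
    total = 0.0
    dx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * dx
        dy = (1.0 - x) / n          # y runs from 0 to 1 - x for this x
        for j in range(n):
            y = (j + 0.5) * dy
            total += h(x, y) * f(x, y) * dy * dx
    return total

approx = expected_cost()
print(round(approx, 3))
```

A finer grid brings the approximation closer to the exact value 11/10 obtained analytically.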
A special case: independent variables. If X and Y are independent random variables, then the expected value of their product is given by
E(XY) = E(X) E(Y)
Suppose X and Y are discrete random variables. Then
$E(XY) = \sum_{x} \sum_{y} xy\, P_{XY}(x, y)$
$= \sum_{x} \sum_{y} xy\, P_X(x) P_Y(y)$
$= \left( \sum_{x} x P_X(x) \right) \left( \sum_{y} y P_Y(y) \right)$
$= E(X)\, E(Y)$
Generalising the expected value concept to the multivariate case, where there are n random variables $X_1, \ldots, X_n$ and some function of these, $g(X_1, X_2, \ldots, X_n)$, we have
$E[g(X_1, \ldots, X_n)] = \sum_{x_1} \cdots \sum_{x_n} g(x_1, \ldots, x_n)\, P_{X_1 \ldots X_n}(x_1, \ldots, x_n)$
for discrete X's, and
$E[g(X_1, \ldots, X_n)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\, f_{X_1 \ldots X_n}(x_1, \ldots, x_n)\, dx_1 \ldots dx_n$
for continuous X's.
3. Some Laws of Expected Values
a) If X and Y are two random variables, then
E(aX + bY) = aE(X) + bE(Y)
where a and b are constants.
Proof: If X and Y are discrete random variables, then
$E(aX + bY) = \sum_{x} \sum_{y} (ax + by)\, f_{XY}(x, y)$
$= \sum_{x} \sum_{y} ax\, f_{XY}(x, y) + \sum_{x} \sum_{y} by\, f_{XY}(x, y)$
$= a \sum_{x} x \left[ \sum_{y} f_{XY}(x, y) \right] + b \sum_{y} y \left[ \sum_{x} f_{XY}(x, y) \right]$
$= a \sum_{x} x f_X(x) + b \sum_{y} y f_Y(y)$
$= aE(X) + bE(Y)$
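b) If X and Y are continuous random variables, the same result holds for constants a and b:
E(aX + bY) = aE(X) + bE(Y)
Proof:
$E(aX + bY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (ax + by)\, f_{XY}(x, y)\, dx\, dy$
$= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} ax\, f_{XY}(x, y)\, dx\, dy + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} by\, f_{XY}(x, y)\, dx\, dy$
$= a \int_{-\infty}^{\infty} x \left[ \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy \right] dx + b \int_{-\infty}^{\infty} y \left[ \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx \right] dy$
$= a \int_{-\infty}^{\infty} x f_X(x)\, dx + b \int_{-\infty}^{\infty} y f_Y(y)\, dy$
$= aE(X) + bE(Y)$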
(c) Generalisation to the multivariate case: for n random variables $X_1, X_2, \ldots, X_n$ and a set of constants $a_1, a_2, \ldots, a_n$,
$E(a_1 X_1 + a_2 X_2 + \ldots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \ldots + a_n E(X_n)$
d) If a = 1 and b = 1, then E(X + Y) = E(X) + E(Y), and if a = 1 and b = -1, then E(X - Y) = E(X) - E(Y). The proof is simple and the same as in part (a).
(e) If X and Y are independent random variables, then
E(XY) = E(X) E(Y)
The proof was given in the second section. Also, if X and Y are independent random variables, then
E[g(X) h(Y)] = E(g(X)) E(h(Y))
since, in the discrete case,
$E[g(X) h(Y)] = \sum_{y} \sum_{x} g(x) h(y)\, f_{XY}(x, y)$
$= \sum_{y} \sum_{x} g(x) h(y)\, f_X(x) f_Y(y)$
$= \left( \sum_{x} g(x) f_X(x) \right) \left( \sum_{y} h(y) f_Y(y) \right)$
$= E(g(X))\, E(h(Y))$

Example 5: If ten fair dice are rolled, calculate the expected value of the sum of the faces showing.
Let $X_i$ be the number showing on the ith die for i = 1, 2, ..., 10, so that $P_{X_i}(k) = \frac{1}{6}$ for k = 1, 2, 3, 4, 5, 6. The expected value of the number showing on the ith die is
$E(X_i) = \sum_{k=1}^{6} k \cdot \frac{1}{6} = \frac{1}{6} \cdot 21 = 3.5$
Let X be the sum of the faces showing on the ten dice. Then
$X = X_1 + X_2 + \ldots + X_{10}$
and the expected value of the sum is
$E(X) = E(X_1) + E(X_2) + \ldots + E(X_{10})$
= 10 × 3.5
= 35.
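The same result follows from a short exact computation. This is a minimal sketch; the variable names are illustrative assumptions, and `Fraction` is used only to avoid rounding.

```python
from fractions import Fraction

faces = range(1, 7)

# E(X_i) = sum over k of k * P(X_i = k), with P(X_i = k) = 1/6 for a fair die.
e_one_die = sum(k * Fraction(1, 6) for k in faces)

# Linearity of expectation: E(X_1 + ... + X_10) = 10 * E(X_i).
e_sum_ten = 10 * e_one_die

print(e_one_die, e_sum_ten)
```

Note that linearity does not require the dice to be independent; it only requires each E(X_i) to exist.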
Example 6: Five friends have purchased tickets to a certain concert. If the tickets are for seats 1-5 in a particular row and the tickets are randomly distributed among the five, what is the expected number of seats separating any two of the five?
Let X and Y denote the seat numbers of the first and second individuals, respectively. The possible (X, Y) pairs are {(1,2), (1,3), ..., (5,4)}, and the joint pmf of (X, Y) is
$p(x, y) = \begin{cases} \frac{1}{20} & x = 1, \ldots, 5;\ y = 1, \ldots, 5;\ x \ne y \\ 0 & \text{otherwise} \end{cases}$
The number of seats separating the two individuals is h(X, Y) = |X - Y| - 1. The following table gives h(x, y) for each possible (x, y) pair.

h(x, y)   x = 1   x = 2   x = 3   x = 4   x = 5
y = 1     -       0       1       2       3
y = 2     0       -       0       1       2
y = 3     1       0       -       0       1
y = 4     2       1       0       -       0
y = 5     3       2       1       0       -

Thus,
$E[h(X, Y)] = \sum_{x} \sum_{y} h(x, y)\, p(x, y) = \sum_{x} \sum_{y \ne x} (|x - y| - 1) \cdot \frac{1}{20} = 1$
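The sum in Example 6 can be checked by enumerating all ordered seat pairs. The sketch below assumes, as in the example, that each of the 20 ordered pairs (x, y) with x ≠ y is equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# All ordered pairs (x, y) of distinct seats from 1..5: 5 * 4 = 20 pairs.
pairs = list(permutations(range(1, 6), 2))
p = Fraction(1, len(pairs))   # each pair is equally likely

# h(x, y) = |x - y| - 1 is the number of seats between the two friends.
expected_gap = sum((abs(x - y) - 1) * p for x, y in pairs)

print(len(pairs), expected_gap)
```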
Example 7: Securities and Expected Returns
Most securities available for investment have uncertain outcomes and are thus risky. Each investor therefore has to decide in which asset to invest. While selecting securities, an investor could look at the expected returns on various securities and choose those with higher expected returns.
Let us take the following example to illustrate the point.
Suppose there are two securities, A and B. The returns on these securities vary across different states. The likelihood that each state prevails (i.e. the probability of each state) is also given in the following table.
State                          1      2      3      4      5      Total
Return on Security A, RA (%)   10     12     8      14     19
Return on Security B, RB (%)   20     25     33     27     22
Probability, P                 0.10   0.25   0.35   0.20   0.10   1
P × RA                         1      3      2.8    2.8    1.9    11.5
P × RB                         2      6.25   11.55  5.4    2.2    27.4

From the above table, we can read that in state 3, which occurs with a 35% chance, the returns on Securities A and B would be 8% and 33% respectively.
E(RA), the expected return on Security A
= (0.10 × 10) + (0.25 × 12) + (0.35 × 8) + (0.20 × 14) + (0.10 × 19)
= 11.5%
E(RB), the expected return on Security B
= (0.10 × 20) + (0.25 × 25) + (0.35 × 33) + (0.20 × 27) + (0.10 × 22)
= 27.4%

Since the expected return on Security B is higher, the investor would choose B over A.
Creating a Portfolio
If the investor decides to invest the money not in just one security but in some mix of the two, then he is creating a portfolio.
Suppose the investor chooses to invest 75% of his money in Security B and 25% in Security A. Then the expected return on this portfolio is
E(RP) = 0.25 E(RA) + 0.75 E(RB)
= (0.25 × 11.5) + (0.75 × 27.4)
= 2.875 + 20.55
= 23.425%
Though the expected return on the portfolio is less than the expected return on Security B, the risk is reduced in the portfolio because the investor is no longer putting all his eggs in one basket. (This is dealt with in the next chapter.)
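The expected returns and the portfolio return can be reproduced with a short sketch. The function name `expected_return` and the list layout are assumptions made for illustration; the probabilities and returns are those in the table of Example 7.

```python
probs = [0.10, 0.25, 0.35, 0.20, 0.10]      # probability of each state
returns_a = [10, 12, 8, 14, 19]             # % return on Security A by state
returns_b = [20, 25, 33, 27, 22]            # % return on Security B by state

def expected_return(probs, returns):
    """E(R) = sum over states of P(state) * R(state)."""
    return sum(p * r for p, r in zip(probs, returns))

e_a = expected_return(probs, returns_a)
e_b = expected_return(probs, returns_b)

# Portfolio with 25% in A and 75% in B, using E(aX + bY) = aE(X) + bE(Y).
e_portfolio = 0.25 * e_a + 0.75 * e_b

print(round(e_a, 3), round(e_b, 3), round(e_portfolio, 3))
```

Changing the two weights (which should sum to 1) gives the expected return for any other mix of the two securities.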
Exercises:
Q.1 Suppose that the daily closing price of a stock goes up an eighth of a point with probability p and down an eighth of a point with probability q, where p > q. After n days, how much gain can we expect the stock to have achieved? Assume that the daily price fluctuations are independent events.
Q.2 A disgruntled secretary is upset about having to stuff envelopes. Handed a box of n letters and n envelopes, she vents her frustration by putting the letters into the envelopes at random. How many people, on average, will receive their correct mail?
Q.3 Suppose that $f_{XY}(x, y) = \frac{2}{3}(x + 2y)$, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Find E(X + Y).
Q4. Suppose that fXY(x,y)= a2 ( )

, 0 x, 0 y. find E(X+Y).
References:
1. Jay L. Devore, Probability and Statistics for Engineers, Cengage Learning,

2010.

Mathematical Methods for Economics: Vectors and Vector Operations
DC-1
Semester-II
Paper-IV: Mathematical methods for Economics-II
Lesson: Vectors And Vector Operations
Contents of the Present Chapter
1.0 Learning Outcomes

1.1 Introduction
1.2 Linear Equation Systems
1.3 Leontief Model
1.4 Vectors
1.4.1 Scalars
1.4.2 Types of Vectors
1.5 Operations of Vector
1.6 Geometric Interpretations of Vectors
1.7 The Scalar Product
1.8 Lengths and Distance of Vectors
1.9 Cauchy-Schwarz Inequality
2.0 Lines and Plane
2.1 Problem set and Answer
2.2 References
In the present chapter you will learn about the following aspects:
• The concept of a linear equation system and the Leontief model.
• The concept of vectors and vector operations.
• Geometric interpretations of vectors.
• Rules of the scalar product.
• The concept of lengths and distance of vectors.
• The Cauchy-Schwarz inequality and its theorems.
• The concept of lines and planes.
1.1 Introduction
There are many systems in mathematics which are employed to handle problems in geometry, mechanics and other branches of applied mathematics. Vectors, matrices and determinants are important parts of these systems and belong to linear algebra. Linear algebra is the branch of mathematics concerned with the study of vectors; vector spaces and the linear maps between them are its main structures. Most economic problems are multidimensional, and economists use mathematical models to express and solve them in terms of systems of equations. If the system of equations is linear, the relevant area of mathematics is linear algebra.
1.2 Linear Equation Systems
In general, an equation system is linear if it has the form
a11x1 + a12x2 + a13x3 + … + a1nxn = b1
a21x1 + a22x2 + a23x3 + … + a2nxn = b2
a31x1 + a32x2 + a33x3 + … + a3nxn = b3
…
am1x1 + am2x2 + am3x3 + … + amnxn = bm

The above linear system consists of m linear equations in n unknowns x1, x2, …, xn, where a11, a12, a13, …, amn are the coefficients of the system and b1, b2, b3, …, bm are called the right-hand sides. The system is said to be consistent if it has at least one solution; otherwise it is said to be inconsistent.
1.3 Leontief Model
It is also known as the input-output model and was given by W.W. Leontief. In order to illustrate why linear equation systems are important in economics, we briefly discuss this model.
Suppose an economy has three sectors, i.e. agriculture, industry and service. The total output of a particular sector is consumed by all these sectors as input and by the final demand for that sector.
The input-output process can be explained by the following table:
Output Agriculture Industry Service Final Demand

Input X1 X2 X3 F
Agriculture X1 X11 X12 X13 F1
Industry X2 X21 X22 X23 F2
Service X3 X31 X32 X33 F3
Primary Input (Labour) L L1 L2 L3 -
From the above table, the total output of the agriculture, industry and service sectors can be written as
X1 = X11 + X12 + X13 + F1
X2 = X21 + X22 + X23 + F2
X3 = X31 + X32 + X33 + F3
and L = L1 + L2 + L3
In general, we can write
$X_i = \sum_{j=1}^{n} X_{ij} + F_i$ and $L = \sum_{i=1}^{n} L_i$
Here $X_i$ = total output of the ith sector,
$X_{ij}$ = output of the ith sector used as input in the jth sector,
$F_i$ = final demand for the ith sector.
The above identity states that all the output of a particular sector is utilized either as an input in one of the producing sectors of the economy or as final demand.
Now, if the technological coefficients between the sectors are defined as
$a_{ij} = \dfrac{X_{ij}}{X_j}$, or $X_{ij} = a_{ij} X_j$,
then the above equation system can be written as

X1 = a11X1 + a12X2 + a13X3 + F1
X2 = a21X1 + a22X2 + a23X3 + F2
X3 = a31X1 + a32X2 + a33X3 + F3
In matrix form,
$\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} + \begin{pmatrix} F_1 \\ F_2 \\ F_3 \end{pmatrix}$
or X = AX + F, so that
(I - A)X = F, or $X = (I - A)^{-1} F$
In general, X1 = a11X1 + a12X2 + a13X3 + … + a1nXn + b1
or (1 - a11)X1 - a12X2 - a13X3 - … - a1nXn = b1
Similarly, -a21X1 + (1 - a22)X2 - a23X3 - … - a2nXn = b2
…
-an1X1 - an2X2 - an3X3 - … + (1 - ann)Xn = bn
This is called the Leontief input-output system. The numbers a11, a12, a13, …, ann are called technical (input) coefficients and b1, b2, b3, …, bn are the final demands.
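For a two-sector economy the system (I - A)X = F can be solved by hand with Cramer's rule. The sketch below does this in Python; the function name `solve_leontief_2x2` is a hypothetical helper, and the coefficients and final demands are illustrative numbers for a two-good economy.

```python
def solve_leontief_2x2(a, f):
    """Solve (I - A) X = F for a 2 x 2 technical-coefficient matrix A."""
    # Entries of the Leontief matrix M = I - A.
    m11, m12 = 1 - a[0][0], -a[0][1]
    m21, m22 = -a[1][0], 1 - a[1][1]
    det = m11 * m22 - m12 * m21
    if det == 0:
        raise ValueError("I - A is singular; no unique solution")
    # Cramer's rule for the two unknown gross outputs.
    x1 = (m22 * f[0] - m12 * f[1]) / det
    x2 = (m11 * f[1] - m21 * f[0]) / det
    return x1, x2

A = [[0.5, 0.2],
     [0.1, 0.4]]        # illustrative technical coefficients
F = [50, 100]           # illustrative final demands

x1, x2 = solve_leontief_2x2(A, F)
print(round(x1, 2), round(x2, 2))
```

The solution can be verified by substituting back into X1 = a11X1 + a12X2 + F1 and X2 = a21X1 + a22X2 + F2.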
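Example: If the technical coefficient matrix is given by $A = \begin{pmatrix} 0.5 & 0.2 \\ 0.1 & 0.4 \end{pmatrix}$ and the final demands for the goods are 50 and 100, write down the Leontief model.
Solution: Let X1 and X2 be the outputs of the two goods.
Then X1 = 0.5X1 + 0.2X2 + 50
X2 = 0.1X1 + 0.4X2 + 100
Or, $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 0.5 & 0.2 \\ 0.1 & 0.4 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} + \begin{pmatrix} 50 \\ 100 \end{pmatrix}$
Or, $\begin{pmatrix} 0.5 & -0.2 \\ -0.1 & 0.6 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 50 \\ 100 \end{pmatrix}$, or $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 0.5 & -0.2 \\ -0.1 & 0.6 \end{pmatrix}^{-1} \begin{pmatrix} 50 \\ 100 \end{pmatrix}$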
1.4 Vectors
‘Vectors facilitate analytic study of such physical objects as have direction in addition to magnitude’. A vector space is a set whose elements can be added together and multiplied by scalars (numbers).
Let F be a field and let a1, a2, …, an be elements of F. Then the ordered set of numbers
V = (a1, a2, a3, …, an)
is called a vector of order n. Here a1, a2, …, an are called the components of the vector V, and these numbers are also called scalars. The vector is also denoted by $\vec{V}$.
1.4.1 Scalars
Quantities that have only magnitude and no direction are called scalars, for example time, population, temperature and power.
1.4.2 Types of Vectors
Zero Vector: A vector whose initial and terminal points coincide is called a zero vector. The length of a zero vector is zero.
Equal Vectors: Two vectors are said to be equal if they have the same length and direction.
Unit Vector: A vector a is called a unit vector if its magnitude is one. It is denoted by $\hat{a}$.
Row Vector: It is represented by a row, i.e. a = (a1, a2, a3, …, an).
Column Vector: It is represented by a column, e.g. $a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$
Free Vector: A vector whose direction is known but whose initial point and line of application are not known is called a free vector.
1.5 Operations on Vectors
The rules for vector addition and multiplication by scalars are given below.
If U, V and W are arbitrary n-vectors and α, β are arbitrary numbers, then
(U + V) + W = U + (V + W)   (associative law) …(i)
U + V = V + U   (commutative law) …(ii)
U + 0 = U …(iii)
U + (-U) = 0 …(iv)
(α + β)U = αU + βU …(v)
α(U + V) = αU + αV …(vi)
α(βU) = (αβ)U …(vii)
1.U = U …(viii)
Example: Given u = (2, -3, 5) and v = (-1, 9, -3), compute u + v, u - v, 3u - 2v and √2 v.
Solution:
u + v = (2, -3, 5) + (-1, 9, -3)
= (2 + (-1), (-3) + 9, 5 + (-3)) = (1, 6, 2)
u - v = (2 - (-1), -3 - 9, 5 - (-3)) = (3, -12, 8)
3u - 2v = 3(2, -3, 5) - 2(-1, 9, -3)
= (6, -9, 15) - (-2, 18, -6)
= (8, -27, 21)
√2 v = √2 (-1, 9, -3) = (-√2, 9√2, -3√2)

Example: If 2(x, y, z) + 5 (-1, 2, 3) = (3, 1, 3) then find x, y and z.
Solution: Given;
(2x, 2y, 2z) + (-5, 10, 15) = (3, 1, 3)
Then, 2x – 5 = 3 or x = 4
2y + 10 = 1 or y = -9/2
2z + 15 = 3 or z = -6
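The componentwise rules used in the two examples above can be sketched in Python. The helper names `add`, `scale` and `sub` are assumptions made for illustration; vectors are represented as plain tuples.

```python
from fractions import Fraction

def add(u, v):
    """Componentwise sum of two vectors of the same order."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    """Multiply every component of u by the scalar c."""
    return tuple(c * a for a in u)

def sub(u, v):
    """u - v, defined as u + (-1)v."""
    return add(u, scale(-1, v))

u = (2, -3, 5)
v = (-1, 9, -3)
print(add(u, v), sub(u, v), sub(scale(3, u), scale(2, v)))

# Solve 2(x, y, z) + 5(-1, 2, 3) = (3, 1, 3) componentwise:
# (x, y, z) = 1/2 * [(3, 1, 3) - 5(-1, 2, 3)].
rhs = sub((3, 1, 3), scale(5, (-1, 2, 3)))
xyz = scale(Fraction(1, 2), rhs)
print(xyz)
```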
Example: Prove that the vector equation $u \begin{pmatrix} 2 \\ -3 \end{pmatrix} + v \begin{pmatrix} 4 \\ 6 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ represents two equations in the two unknowns u and v, and find the solution.
Solution: Given $u \begin{pmatrix} 2 \\ -3 \end{pmatrix} + v \begin{pmatrix} 4 \\ 6 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$,
i.e. 2u + 4v = 1 …(1)
-3u + 6v = 0 …(2)
Solving (1) and (2): v = 1/8, u = 1/4
Example: (i) When is the vector b said to be a linear combination of the vectors x, y and z?
(ii) Consider the vectors a = (1, 2, 3) and b = (2, 3, 1). Find k such that w = (1, k, 4) is a linear combination of a and b.
Solution: (i) b is a linear combination of x, y and z if there exist real numbers c1, c2 and c3 such that
b = c1x + c2y + c3z
(ii) Let $c_1 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + c_2 \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ k \\ 4 \end{pmatrix}$
Then c1 + 2c2 = 1, 2c1 + 3c2 = k and 3c1 + c2 = 4.
Solving the first and third equations gives c1 = 7/5 and c2 = -1/5, and hence k = 11/5.
1.6 Geometric Interpretations of Vectors
The triangle law of vectors: If the vectors a and b are represented by the sides AB and BC of a triangle, then their sum a + b is represented by the third side AC, i.e. AB + BC = AC.
Parallelogram law of vectors: If two vectors are represented by the adjacent sides OA and OC of a parallelogram OABC, then their sum is represented by the diagonal OB, i.e. OA + OC = OB.
The same geometric interpretation of vector operations applies in 2-space and 3-space.
Example: Given u = (5, -1) and v = (-2, 4), compute u + v with the help of geometric vectors starting at the origin.
Solution: u + v = (5 + (-2), -1 + 4) = (3, 3), which is the diagonal from the origin of the parallelogram with sides u and v.
1.7 The scalar product
The scalar product of any two n-vectors u = (u1, u2, …, un) and v = (v1, v2, …, vn) is defined as
$u \cdot v = u_1 v_1 + u_2 v_2 + \ldots + u_n v_n = \sum_{i=1}^{n} u_i v_i$
If a = (a1, a2, …, an) is a commodity vector and p = (p1, p2, …, pn) is the corresponding price vector, then the scalar product of p and a is the total value of the entire commodity vector:
p1a1 + p2a2 + … + pnan = p.a
Example: If u = (1, 2, 3) and v = (-2, 3, 5) then compute u.v
Solution: u.v = 1.(-2) + 2.3 + 3.5
= -2 + 6 + 15 = 19
Rules for the scalar product
Let u, v and w be n-vectors and α a scalar. Then
u.v = v.u …(i)
u.(v + w) = u.v + u.w …(ii)
α(u.v) = (αu).v = u.(αv) …(iii)
u.u > 0 ⇔ u ≠ 0 …(iv)
1.8 Lengths and distance of vectors
If a = (a1, a2, a3, …, an) is an n-vector, then the length (norm) of a is given by
$\|a\| = \sqrt{a \cdot a} = \sqrt{a_1^2 + a_2^2 + \ldots + a_n^2}$
If u = (u1, u2, …, un) and v = (v1, v2, …, vn) are two n-vectors, then the (Euclidean) distance between them is given by
$d(u, v) = \|u - v\| = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \ldots + (u_n - v_n)^2}$
In the plane this can be seen as follows. For points P = (a1, b1) and Q = (a2, b2), the horizontal and vertical legs of the right triangle with hypotenuse PQ have lengths m = |a2 - a1| and n = |b2 - b1|. By the Pythagorean theorem,
$l^2 = m^2 + n^2 = |a_1 - a_2|^2 + |b_1 - b_2|^2$
or $|PQ| = l = \sqrt{(a_1 - a_2)^2 + (b_1 - b_2)^2}$
In general, $|PQ| = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \ldots + (x_n - y_n)^2} = \|x - y\|$.
In particular, if we take y to be zero, then the distance from the point x = (x1, x2, …, xn) to the origin, or the length of the vector x, is given by $\|x\| = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}$
1.9 Cauchy-Schwarz Inequality
If u and v are two vectors, then the Cauchy-Schwarz inequality is given by
$|u \cdot v| \le \|u\| \, \|v\|$
Example: If u = (1, 2, -3) and v = (-3, 2, 5) are two vectors, find the lengths of the vectors and the distance between them, and check the Cauchy-Schwarz inequality (CSI).
Solution:
Lengths: $\|u\| = \sqrt{1 + 4 + 9} = \sqrt{14}$, $\|v\| = \sqrt{9 + 4 + 25} = \sqrt{38}$
Distance: $d = \|u - v\| = \sqrt{16 + 0 + 64} = \sqrt{80}$
CSI: u.v = (1)(-3) + (2)(2) + (-3)(5) = -3 + 4 - 15 = -14, so |u.v| = 14
Hence $14 \le \sqrt{14} \cdot \sqrt{38} \approx 23.07$, which is certainly true.
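The quantities in this example can be computed with a few helper functions. The names `dot`, `norm` and `distance` are assumptions made for this sketch; the vectors are those of the example.

```python
import math

def dot(u, v):
    """Scalar product u.v = sum of u_i * v_i."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Length of u: the square root of u.u."""
    return math.sqrt(dot(u, u))

def distance(u, v):
    """Euclidean distance between u and v: the length of u - v."""
    return norm(tuple(a - b for a, b in zip(u, v)))

u = (1, 2, -3)
v = (-3, 2, 5)

print(dot(u, v), norm(u) ** 2, norm(v) ** 2, distance(u, v) ** 2)

# Cauchy-Schwarz: |u.v| <= ||u|| * ||v||
holds = abs(dot(u, v)) <= norm(u) * norm(v)
print(holds)
```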
Orthogonality: If the angle between two vectors is 90°, the vectors are said to be orthogonal. This is denoted by a ⊥ b. Two vectors in R2 or R3 are orthogonal if and only if their scalar product is zero:
a ⊥ b ⇔ a.b = 0
In the case of orthonormal vectors, the dot product is zero, a.b = 0, and both vectors are unit vectors, ||a|| = ||b|| = 1, i.e. a.a = 1 and b.b = 1.
Theorem I: Prove that $\|rV\| = |r| \, \|V\|$ for all r in R1 and V in Rn.
Proof: Let $rV = r(v_1, v_2, \ldots, v_n) = (rv_1, rv_2, \ldots, rv_n)$. Then
$\|rV\| = \sqrt{(rv_1)^2 + (rv_2)^2 + \ldots + (rv_n)^2}$
$= \sqrt{r^2 (v_1^2 + v_2^2 + \ldots + v_n^2)}$
$= |r| \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}$ (since $\sqrt{r^2} = |r|$)
$= |r| \, \|V\|$. Proved.
Theorem II: Suppose u and v are two vectors in Rn and θ is the angle between them. Prove that $u \cdot v = \|u\| \, \|v\| \cos\theta$.
Proof: Let u = OP and v = OQ be the two vectors and let tv = OR, where t is a scalar and R is the foot of the perpendicular from P to the line through O and Q, so that the triangle OPR is right-angled at R.
From the triangle OPR,
$\cos\theta = \dfrac{\|tv\|}{\|u\|} = \dfrac{t \|v\|}{\|u\|}$ …(1)
Now, applying the Pythagorean theorem to the triangle OPR,
$\|u\|^2 = \|tv\|^2 + \|u - tv\|^2$
$= t^2 \|v\|^2 + (u - tv) \cdot (u - tv)$
$= t^2 \|v\|^2 + u \cdot u - 2t\,(u \cdot v) + t^2\,(v \cdot v)$
$= t^2 \|v\|^2 + \|u\|^2 + t^2 \|v\|^2 - 2t\,(u \cdot v)$
so that
$t = \dfrac{u \cdot v}{\|v\|^2}$ …(2)
From equations (1) and (2),
$\cos\theta = \dfrac{u \cdot v}{\|u\| \, \|v\|}$. Proved.
Theorem III: If u and v are two vectors in Rn, prove that $\|u + v\| \le \|u\| + \|v\|$.
Proof: We know that $\dfrac{u \cdot v}{\|u\| \, \|v\|} = \cos\theta \le 1$,
or $u \cdot v \le \|u\| \, \|v\|$ …(1)
Now $\|u + v\|^2 = (u + v) \cdot (u + v) = u \cdot u + u \cdot v + v \cdot u + v \cdot v = \|u\|^2 + 2\,(u \cdot v) + \|v\|^2$
and, by (1),
$\|u\|^2 + 2\,(u \cdot v) + \|v\|^2 \le \|u\|^2 + 2 \|u\| \, \|v\| + \|v\|^2 = (\|u\| + \|v\|)^2$
so that $\|u + v\|^2 \le (\|u\| + \|v\|)^2$,
or $\|u + v\| \le \|u\| + \|v\|$. Proved.
Example: Given that $a = \begin{pmatrix} 1 \\ -15 \\ 2 \end{pmatrix}$ and $b = \begin{pmatrix} 3 \\ 5 \\ 1 \end{pmatrix}$ are two column vectors,
(i) calculate the lengths of the vectors a and b;
(ii) find k such that the vector c = a + kb is orthogonal to the vector b.
Solution: (i)
$\|a\| = \sqrt{1 + 225 + 4} = \sqrt{230}$
$\|b\| = \sqrt{9 + 25 + 1} = \sqrt{35}$
(ii)
$c = \begin{pmatrix} 1 \\ -15 \\ 2 \end{pmatrix} + k \begin{pmatrix} 3 \\ 5 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 + 3k \\ -15 + 5k \\ 2 + k \end{pmatrix}$
Now, c is orthogonal to b, so c.b = 0:
3(1 + 3k) + 5(-15 + 5k) + (2 + k) = 0
3 + 9k - 75 + 25k + 2 + k = 0
35k = 70, so k = 2
Linearly Dependent Vectors
The vectors a, b and c are called linearly dependent if there exist scalars x, y and z, not all zero, such that
x.a + y.b + z.c = 0
Two linearly dependent vectors in a plane are precisely collinear (parallel) vectors.
Linearly Independent Vectors
The vectors a, b and c are called linearly independent if
xa + yb + zc = 0
implies x = y = z = 0. Two linearly independent vectors are precisely non-collinear vectors.
Note:
• Two collinear vectors are linearly dependent.
• Two non-collinear vectors are linearly independent.
Example: If 5a + 3b = 2c and a - b = c, show that a and c have the same direction and that a and b have opposite directions. Are the vectors a, b and c linearly independent?
Solution: Given 5a + 3b = 2c and a - b = c.
Substituting b = a - c in the first equation gives 8a = 5c, i.e. a = (5/8)c, so a and c have the same direction.
Substituting c = a - b in the first equation gives 3a = -5b, i.e. a = -(5/3)b, so a and b have opposite directions.
The vectors a, b and c are not linearly independent, since a - b - c = 0 is a linear combination with non-zero coefficients that equals the zero vector.
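Example: Let $a = -\hat{i} + 3\hat{j}$ and $b = 2\hat{i} + 5\hat{j}$. Find a unit vector parallel to the vector a + b.
Solution: $a + b = \hat{i} + 8\hat{j}$
Now $\|a + b\| = \sqrt{1 + 64} = \sqrt{65}$
∴ The unit vector along a + b is
$\dfrac{1}{\sqrt{65}}\,\hat{i} + \dfrac{8}{\sqrt{65}}\,\hat{j}$ (using $\hat{a} = \dfrac{a}{\|a\|}$)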
2.0 Lines and Plane
A line in Rn
The line L through the points a = (a1, a2, …, an) and b = (b1, b2, …, bn) is the set of all x = (x1, x2, …, xn) satisfying
x = (1 - t)a + t.b, for some real number t.
In terms of the coordinates of the vectors, this is equivalent to
x1 = (1 - t)a1 + tb1
x2 = (1 - t)a2 + tb2
…
xn = (1 - t)an + tbn
Now, let p = (p1, p2, …, pn) be a point in Rn. Then the straight line L passing through p in the direction of the vector a = (a1, a2, …, an) is given by
x = p + t.a
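The two-point form x = (1 - t)a + tb is easy to evaluate numerically. In the sketch below the function name `point_on_line` and the two points in R3 are assumptions chosen for illustration; exact fractions are used so that the results are not rounded.

```python
from fractions import Fraction

def point_on_line(a, b, t):
    """Point x = (1 - t) a + t b on the line through a and b."""
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))

a = (2, 4, -1)      # illustrative point (t = 0)
b = (5, 0, 7)       # illustrative point (t = 1)

# Where does the line meet the plane x3 = 0?
# (1 - t) a3 + t b3 = 0  gives  t = -a3 / (b3 - a3).
t_hit = Fraction(-a[2], b[2] - a[2])
hit = point_on_line(a, b, t_hit)
print(t_hit, hit)

# Values of t between 0 and 1 give points on the segment joining a and b.
midpoint = point_on_line(a, b, Fraction(1, 2))
print(midpoint)
```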
A Hyperplane in Rn:
The hyperplane through the point a = (a1, a2, …, an) that is orthogonal to the vector p = (p1, p2, …, pn) ≠ 0 is the set of all points x = (x1, x2, …, xn) satisfying
p.(x - a) = 0.
In coordinate form, the equation is
p1(x1 - a1) + p2(x2 - a2) + … + pn(xn - an) = 0
or p1x1 + p2x2 + … + pnxn = A, where A = p1a1 + p2a2 + … + pnan.
Example: Find the equation of the plane in R3 through the point v = (2, 1, -1) with p = (-1, 1, 3) as a normal.
Solution: By the definition, -1(x1 - 2) + 1(x2 - 1) + 3(x3 - (-1)) = 0
or -x1 + x2 + 3x3 = -4.
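The constant on the right-hand side is simply the scalar product A = p.a, which gives a quick way to check the answer. The helper names below are assumptions made for this sketch.

```python
def dot(u, v):
    """Scalar product of two vectors of the same order."""
    return sum(a * b for a, b in zip(u, v))

def hyperplane_constant(p, a):
    """Right-hand side A in p.x = A for the hyperplane through a with normal p."""
    return dot(p, a)

p = (-1, 1, 3)      # normal vector from the example
a = (2, 1, -1)      # point on the plane from the example

A = hyperplane_constant(p, a)
print(A)

def on_plane(x):
    """True if the point x satisfies p.(x - a) = 0."""
    return dot(p, tuple(xi - ai for xi, ai in zip(x, a))) == 0

print(on_plane((2, 1, -1)), on_plane((0, 0, 0)))
```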
Example: Given α = (1, 2, 1) and β = (-3, 0, -2), find real numbers x1 and x2 such that x1α + x2β = (5, 4, 4).
Solution: x1α = (x1, 2x1, x1) and x2β = (-3x2, 0, -2x2)
Then x1α + x2β = (x1 - 3x2, 2x1, x1 - 2x2) = (5, 4, 4)
Now, x1 - 3x2 = 5
2x1 = 4
and x1 - 2x2 = 4
Solving, x1 = 2 and x2 = -1
Example: Find the equation of the line in R3 passing through the points (2, 4, -1) and (5, 0, 7). Where does the line intersect the x1x2-plane? Use this equation to describe exactly the line segment joining the two given points.
Solution: The equation of the line is x = (1 - t).a + t.b, for some real number t.
Then x1 = (1 - t).2 + t.5 = 2 + 3t
x2 = (1 - t).4 + t.0 = 4 - 4t
x3 = (1 - t).(-1) + t.7 = -1 + 8t
The line intersects the x1x2-plane when x3 = 0.
So -1 + 8t = 0 or t = 1/8, and then
x1 = 19/8, x2 = 7/2, x3 = 0
∴ The line intersects the x1x2-plane at the point (19/8, 7/2, 0).
The line segment joining the two given points consists of the points of the line with 0 ≤ t ≤ 1.
Example: Show that the vectors 2a - b + c, a - 3b - 5c and 3a - 4b - 4c are coplanar.
Solution: Three vectors u, v and w are coplanar if there exist scalars x, y and z, not all zero, such that xu + yv + zw = 0, i.e. if one of them can be written as a linear combination of the other two.
Let 3a - 4b - 4c = x(2a - b + c) + y(a - 3b - 5c)
= (2x + y)a + (-x - 3y)b + (x - 5y)c
So, 2x + y = 3 …(i)
x + 3y = 4 …(ii)
x - 5y = -4 …(iii)
Solving equations (i) and (ii), we get x = 1 and y = 1.
These values of x and y satisfy the third equation, i.e. x - 5y = -4.
Hence the given three vectors are coplanar.
Problem: Prove that two vectors a and b are equal if and only if their components along the x and y-axes are equal.
Solution: Let $a = a_1\hat{i} + a_2\hat{j}$ and $b = b_1\hat{i} + b_2\hat{j}$ be two vectors, where a1, a2 and b1, b2 are the components of a and b along the x and y-axes respectively.
Necessary Condition:
$a = b \Rightarrow a_1\hat{i} + a_2\hat{j} = b_1\hat{i} + b_2\hat{j}$
$\Rightarrow (a_1 - b_1)\hat{i} = (b_2 - a_2)\hat{j}$
This shows that either $(a_1 - b_1)\hat{i}$ and $(b_2 - a_2)\hat{j}$ are parallel or each is a zero vector. But $\hat{i}$ and $\hat{j}$ are not parallel.
∴ $(a_1 - b_1)\hat{i} = (b_2 - a_2)\hat{j} = 0$
⇒ a1 - b1 = 0 and b2 - a2 = 0
or a1 = b1 and b2 = a2
Sufficient Condition:
In this case a1 = b1 and a2 = b2, and we have to show that a = b.
a1 = b1 and a2 = b2
⇒ a1 - b1 = 0 and b2 - a2 = 0
⇒ $(a_1 - b_1)\hat{i} = 0 = (b_2 - a_2)\hat{j}$
or $a_1\hat{i} + a_2\hat{j} = b_1\hat{i} + b_2\hat{j}$
∴ a = b
Problem Set
1. Having bought n commodities, the prices being P1, P2, …, Pn and the quantities being Q1, Q2, …, Qn, express the total cost of purchase in vector notation.
2. The input coefficient matrix and final demand of a three-sector economy (agriculture, industry, service) are given below:
$A = \begin{pmatrix} 0.3 & 0.5 & 0.2 \\ 0.2 & 0 & 0.5 \\ 0.1 & 0.3 & 0.1 \end{pmatrix}$ and $F = \begin{pmatrix} 100 \\ 40 \\ 50 \end{pmatrix}$ million rupees.
Write down the Leontief model.
3. If 3(u, v, w) + 5(-1, 2, 3) = (4, 1, 3), find u, v and w.
4. Solve the vector equation 4x - 7u = 2x + 8v - u for x in terms of u and v.
5. Are the following vectors independent?
$u = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},\ v = \begin{pmatrix} 0 \\ 1 \\ 4 \end{pmatrix},\ w = \begin{pmatrix} 4 \\ 5 \\ 0 \end{pmatrix}$
If not, find the pattern of dependence between them.
6. Show that the vectors $x = \begin{pmatrix} 5 \\ -4 \end{pmatrix}$ and $y = \begin{pmatrix} 12 \\ 15 \end{pmatrix}$ are orthogonal and find the length of each vector.
7. Find the vector of unit length that is normal to the plane 3x + y - z = 10.
8. Prove that $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$.
9. Can the vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) span the space R3?
1   0 1 
10. Find the pattern of dependence of 1  1   0 can they span R3?
   
 0 1  1 
11. Find the point-normal equation of the plane P which contains the points p = (2, 1, 1), q = (1, 0, -3) and r = (0, 1, 7).
12. Given $u = \begin{pmatrix} 5 \\ 1 \end{pmatrix}$ and $v = \begin{pmatrix} 0 \\ 3 \end{pmatrix}$, find 2u + 3v graphically.
13. Prove that u = (-2, 1, -1) is a point in the plane -x + 2y + 3z = 1.
14. If the sum of two unit vectors is a unit vector, show that the magnitude of their difference is √3.
Answers of Problem Set:
1. P.Q = P1Q1 + P2Q2 + … + PnQn
2. $X = (I - A)^{-1} F$, i.e. $\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0.7 & -0.5 & -0.2 \\ -0.2 & 1 & -0.5 \\ -0.1 & -0.3 & 0.9 \end{pmatrix}^{-1} \begin{pmatrix} 100 \\ 40 \\ 50 \end{pmatrix}$
3. u = 3, v = -3, w = -4.
4. x = 3u + 4v
5. No; w = 4u - 3v.
6. $\|x\| = \sqrt{41}$, $\|y\| = \sqrt{369}$.
9. Yes
10. Linearly dependent
11. 3x - 7y + z = 0.
REFERENCES
• Allen, R.G.D., Mathematical Analysis for Economists, London: Macmillan and Co. Ltd.
Mathematical Methods for Economics: Matrices and Matrix Operations
DC-1
Semester-II
Lesson: Matrices And Matrix Operations

1.1 Introduction
1.1.1 Matrices
1.1.2 Types of Matrix
1.2 Operations of Matrix
1.2.1 Addition and Multiplication by a Scalar
1.2.2 Rules of Matrix Addition and Multiplication by Scalars
1.3 Matrix Multiplication
1.3.1 Properties or Rules of Matrix Multiplication
1.4 Power of Matrices, Idempotent Matrix, Orthogonal Matrix
1.5 System of Equation in Matrix Form
1.6 Invertible Matrix
1.7 Problem Set and Answer
1.8 References
 Understand the concept of matrices
 You will be able to apply matrix operations.

 Understand the properties of matrix.
 Rules of matrix multiplication
 Understand the concept of power of matrix
 Understand the concept of invertible matrix
1.1 Introduction
The subject of matrices had its origin in various types of problems. Of these, the solution of a given system of equations and linear transformations in geometry are extremely interesting. In 1857, the British mathematician Arthur Cayley formulated the general theory of matrices. He developed the properties of matrices as a pure algebraic structure, though matrices as arrays of coefficients in homogeneous linear equations were recognized long before. A matrix is a very useful tool for analysing various problems in different subjects.
1.1.1 Matrices
A matrix is an ordered set of numbers listed in rectangular form, e.g.
$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$
In other words, a matrix is simply a rectangular table of numbers written in either ( ) or [ ] brackets. These symbols are called the notation of a matrix. Matrices have many applications in science, engineering, computing and economics. In economics, matrices are useful for the study of stock market trends, optimization of profit, minimization of loss, input-output analysis, etc. Do not confuse a matrix with a determinant, which uses vertical bars, i.e. | |. A matrix is an array of numbers, whereas a determinant gives a single number. The size (order) of a matrix with m rows and n columns is written m × n, and $a_{ij}$ denotes the element of the matrix in row i and column j.
For example:
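$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ is a 2 × 2 matrix, $\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{41} \end{pmatrix}$ is a 4 × 1 matrix, and $\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$ is a 3 × 3 matrix.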
1.1.2 Types of matrix
• Square Matrix: If a matrix has n rows and n columns, it is called a square matrix. For example,
$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}_{2 \times 2}$ or $\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}_{3 \times 3}$
• Diagonal Matrix: It is a square matrix in which all non-diagonal elements are zero, such as
$A = \begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix}_{3 \times 3}$
• Row Matrix: A matrix with one row is called a row matrix.
$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \end{pmatrix}_{1 \times 3}$
• Column Matrix: A matrix with one column is called a column matrix.
$A = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix}_{3 \times 1}$
• Zero Matrix: If all the elements of a matrix are zero, it is called a zero matrix.
$A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
• Opposite Matrix: If all the elements of a matrix are multiplied by -1, we get the opposite matrix:
$-A = (-a_{ij})$
• Transpose Matrix: If we convert the rows of a matrix into columns and the columns into rows, we get the transpose of the matrix. For example, if
$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}_{3 \times 3}$
then
$A^T = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}_{3 \times 3}$
This is also true for an m × n matrix, whose transpose is an n × m matrix.
We can also write
$A^T = A' = (a'_{ij})$, where $a'_{ij} = a_{ji}$
Example: Given
$A = \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \end{pmatrix}$, find $A^T$.
Solution:
$A^T = A' = \begin{pmatrix} 1 & 2 \\ 0 & 3 \\ 2 & 1 \end{pmatrix}_{3 \times 2}$
Rules of Transposition
(A')' = A …(i)
(A + B)' = A' + B' …(ii)
(αA)' = αA' and (βB)' = βB' …(iii) (α and β are constants)
(AB)' = B'A' …(iv)
• Symmetric Matrix: A square matrix is said to be symmetric if it is equal to its transpose,
i.e. $a_{ij} = a_{ji}$ for all i and j,
or $A = A' = A^T$, where $A = (a_{ij})_{n \times n}$.
• Skew-symmetric Matrix: It is defined by
$a_{ij} = -a_{ji}$
or $A = -A' = -A^T$, where $A = (a_{ij})_{n \times n}$ and $A' = (a_{ji})_{n \times n}$.
• Identity Matrix: An identity matrix (I) is a diagonal matrix in which all diagonal elements are equal to one. It is also known as the unit matrix.
$I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}_{3 \times 3}$, $I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}_{n \times n}$
For any n × n matrix A, A In = In A = A
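The transposition rules and the identity property can be checked numerically on small matrices. In the sketch below, matrices are plain lists of rows; the helper names and the matrices A, B and S are illustrative assumptions, and `matmul` uses the usual row-by-column product.

```python
def transpose(m):
    """Rows become columns and columns become rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Row-by-column product of conformable matrices a and b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def identity(n):
    """n x n identity matrix: ones on the diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 0, 2], [2, 3, 1]]        # 2 x 3, illustrative
B = [[1, 4], [0, -1], [2, 5]]     # 3 x 2, illustrative
S = [[2, 1], [1, 3]]              # symmetric 2 x 2, illustrative

rule_i = transpose(transpose(A)) == A                                  # (A')' = A
rule_iv = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))  # (AB)' = B'A'
is_symmetric = transpose(S) == S                                       # S = S'
identity_ok = matmul(S, identity(2)) == S == matmul(identity(2), S)    # S I = I S = S

print(rule_i, rule_iv, is_symmetric, identity_ok)
```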
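Example: Construct the 3 × 3 matrix $A = (a_{ij})_{3 \times 3}$ with $a_{ij} = 2i - j$.
Solution: The matrix has 3 × 3 = 9 elements. Using $a_{ij} = 2i - j$:
\[ a_{11} = 2 - 1 = 1, \quad a_{12} = 2 - 2 = 0, \quad a_{13} = 2 - 3 = -1 \]
\[ a_{21} = 4 - 1 = 3, \quad a_{22} = 4 - 2 = 2, \quad a_{23} = 4 - 3 = 1 \]
\[ a_{31} = 6 - 1 = 5, \quad a_{32} = 6 - 2 = 4, \quad a_{33} = 6 - 3 = 3 \]
Putting these values into
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}_{3 \times 3} \]
we get
\[ A = \begin{bmatrix} 1 & 0 & -1 \\ 3 & 2 & 1 \\ 5 & 4 & 3 \end{bmatrix}_{3 \times 3} \]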
1.2 Operations on Matrices
Equality of Matrices: Let $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{m \times n}$ be two m × n matrices. Then A and B are said to be equal, written A = B, if
\[ a_{ij} = b_{ij} \quad \text{for all } i = 1, 2, \dots, m \text{ and } j = 1, 2, \dots, n \]
Thus two matrices are equal only if they have the same dimension and all their corresponding elements are equal. Otherwise they are called unequal matrices, written $A \neq B$.
Example: Given;
 3 y  1  z  2 2 
  
2 x   4 3 
Solve for x, y, and z.
Solution: Equating the corresponding elements of the two matrices,
\[ z - 2 = 3 \text{ or } z = 5, \quad y - 1 = 2 \text{ or } y = 3, \quad \text{and } x = 3 \]
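1.2.1 Addition and Multiplication by a Scalar
Let $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{m \times n}$ be two matrices of the same dimension. Then the sum of A and B is defined as
\[ A + B = (a_{ij})_{m \times n} + (b_{ij})_{m \times n} = (a_{ij} + b_{ij})_{m \times n} \]
If $\alpha$ is a real number, then
\[ \alpha A = \alpha (a_{ij})_{m \times n} = (\alpha a_{ij})_{m \times n} \]
and likewise, for a real number $\beta$, $\beta B = \beta (b_{ij})_{m \times n} = (\beta b_{ij})_{m \times n}$, so that
\[ \alpha A + \beta B = (\alpha a_{ij} + \beta b_{ij})_{m \times n} \]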
Example: Given
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 2 & -3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix} \]
compute $A + B$ and $2A + \tfrac{1}{2}B$.
Solution:
\[ A + B = \begin{bmatrix} 1 + 1 & 2 + 0 & 3 + 2 \\ 4 + 0 & 2 + 2 & -3 + 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 5 \\ 4 & 4 & -2 \end{bmatrix} \]
and
\[ 2A + \tfrac{1}{2}B = \begin{bmatrix} 2 & 4 & 6 \\ 8 & 4 & -6 \end{bmatrix} + \begin{bmatrix} \tfrac{1}{2} & 0 & 1 \\ 0 & 1 & \tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} 2\tfrac{1}{2} & 4 & 7 \\ 8 & 5 & -5\tfrac{1}{2} \end{bmatrix} \]
1.2.2 Rules of Matrix Addition and Multiplication by Scalars
If A, B and C are m × n matrices and $\alpha$ and $\beta$ are scalars, then:
o (A + B) + C = A + (B + C)
o A + B = B + A
o A + 0 = A
o A + (−A) = 0
o $(\alpha + \beta) A = \alpha A + \beta A$
o $\alpha (A + B) = \alpha A + \alpha B$
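1.3 Matrix Multiplication:
Let us consider two matrices A and B, such that
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}_{3 \times 3} \quad \text{and} \quad B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}_{3 \times 3} \]
Then the product of A and B is denoted by AB and is given by
\[ AB = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} & a_{11}b_{13} + a_{12}b_{23} + a_{13}b_{33} \\ a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} & a_{21}b_{13} + a_{22}b_{23} + a_{23}b_{33} \\ a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} & a_{31}b_{12} + a_{32}b_{22} + a_{33}b_{32} & a_{31}b_{13} + a_{32}b_{23} + a_{33}b_{33} \end{bmatrix}_{3 \times 3} \]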
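Example: Given
\[ A = \begin{bmatrix} 2 & 3 \\ 2 & 0 \\ 1 & 2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 4 \\ 2 \end{bmatrix} \]
compute AB and BA.
Solution:
\[ AB = \begin{bmatrix} 2 \times 4 + 3 \times 2 \\ 2 \times 4 + 0 \times 2 \\ 1 \times 4 + 2 \times 2 \end{bmatrix} = \begin{bmatrix} 14 \\ 8 \\ 8 \end{bmatrix} \]
BA is not defined, because B is a 2 × 1 matrix and A is a 3 × 2 matrix, so the number of columns of B (one) differs from the number of rows of A (three). The row vector obtained from the transposes is
\[ B'A' = \begin{bmatrix} 4 \times 2 + 2 \times 3 & 4 \times 2 + 2 \times 0 & 4 \times 1 + 2 \times 2 \end{bmatrix} = \begin{bmatrix} 14 & 8 & 8 \end{bmatrix} = (AB)' \]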
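A minimal sketch of the row-by-column rule, assuming matrices are stored as lists of rows; `matmul` is an illustrative helper name, not something defined in the lesson. The dimension check makes explicit when a product is defined.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    # Entry (i, j) is the product of row i of A and column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2, 3], [2, 0], [1, 2]]   # 3 x 2
B = [[4], [2]]                 # 2 x 1
print(matmul(A, B))            # [[14], [8], [8]]
```

Calling `matmul(B, A)` raises `ValueError`, since a 2 × 1 matrix cannot be multiplied by a 3 × 2 matrix in that order.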
Problem: Given that $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{n \times p}$, compute the product of the two matrices, i.e. C = AB.
Solution: The product is defined because the number of columns of A equals the number of rows of B. It is given by
\[ C = AB \]
\[ \text{or } (c_{ij})_{m \times p} = (a_{ij})_{m \times n} (b_{ij})_{n \times p} \]
\[ \text{or } c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \dots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj} \]
That is, $c_{ij}$ is the product of the ith row of A and the jth column of B.
In general,
\[ \begin{bmatrix} c_{11} & \cdots & c_{1j} & \cdots & c_{1p} \\ \vdots & & \vdots & & \vdots \\ c_{i1} & \cdots & c_{ij} & \cdots & c_{ip} \\ \vdots & & \vdots & & \vdots \\ c_{m1} & \cdots & c_{mj} & \cdots & c_{mp} \end{bmatrix}_{m \times p} = \begin{bmatrix} a_{11} & \cdots & a_{1k} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i1} & \cdots & a_{ik} & \cdots & a_{in} \\ \vdots & & \vdots & & \vdots \\ a_{m1} & \cdots & a_{mk} & \cdots & a_{mn} \end{bmatrix}_{m \times n} \begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1p} \\ \vdots & & \vdots & & \vdots \\ b_{k1} & \cdots & b_{kj} & \cdots & b_{kp} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nj} & \cdots & b_{np} \end{bmatrix}_{n \times p} \]
1.3.1 Properties or Rules of Matrix Multiplication
- Matrix multiplication is not commutative:
If A and B are two matrices for which AB and BA are both defined, then in general
\[ AB \neq BA \]
For example, let
\[ A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]
Then
\[ AB = \begin{bmatrix} 1 \times 0 + 0 \times 1 & 1 \times 1 + 0 \times 0 \\ 0 \times 0 + (-1) \times 1 & 0 \times 1 + (-1) \times 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \]
and
\[ BA = \begin{bmatrix} 0 \times 1 + 1 \times 0 & 0 \times 0 + 1 \times (-1) \\ 1 \times 1 + 0 \times 0 & 1 \times 0 + 0 \times (-1) \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]
So $AB \neq BA$. Hence the commutative law does not hold.
Note: AB = BA may hold for particular matrices, but it does not hold in general.
- Matrix multiplication is associative:
If A, B and C are m × n, n × p and p × q matrices respectively, then (AB)C = A(BC).
- Matrix multiplication is distributive with respect to addition of matrices:
If A is an m × n matrix and B and C are both n × p matrices, then A(B + C) = AB + AC.
- Multiplication by a unit matrix:
If A is a square matrix of order n × n and I is the unit matrix of the same order, then AI = A = IA (also, II = I).
- The product of two matrices can be a zero matrix even when neither of them is a zero matrix, i.e. AB = 0 does not imply A = 0 or B = 0, where the '0' on the right-hand side is the zero matrix.
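Let
\[ A = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \]
Then
\[ AB = \begin{bmatrix} 1 \times 2 + (-1) \times 2 & 1 \times 2 + (-1) \times 2 \\ -1 \times 2 + 1 \times 2 & -1 \times 2 + 1 \times 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]
Hence the product is the zero matrix, although neither A nor B is a zero matrix.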
- Cancellation law does not hold in matrix multiplication:
Suppose A, B and C are three matrices such that AB and AC are defined; then AB = AC does not imply B = C.
Example: Let
\[ A = \begin{bmatrix} 3 & 4 \\ 2 & 7 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 6 \\ 1 & 5 \end{bmatrix} \]
Find AB, $A^2$, $B^2$ and $(A + B)^2$. Is $(A + B)^2 = A^2 + B^2 + 2AB$?
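Solution: We are given
\[ A = \begin{bmatrix} 3 & 4 \\ 2 & 7 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 6 \\ 1 & 5 \end{bmatrix} \]
\[ \therefore A + B = \begin{bmatrix} 3 + 2 & 4 + 6 \\ 2 + 1 & 7 + 5 \end{bmatrix} = \begin{bmatrix} 5 & 10 \\ 3 & 12 \end{bmatrix} \]
and
\[ AB = \begin{bmatrix} 3 \times 2 + 4 \times 1 & 3 \times 6 + 4 \times 5 \\ 2 \times 2 + 7 \times 1 & 2 \times 6 + 7 \times 5 \end{bmatrix} = \begin{bmatrix} 10 & 38 \\ 11 & 47 \end{bmatrix} \]
Now
\[ A^2 = \begin{bmatrix} 3 & 4 \\ 2 & 7 \end{bmatrix} \begin{bmatrix} 3 & 4 \\ 2 & 7 \end{bmatrix} = \begin{bmatrix} 9 + 8 & 12 + 28 \\ 6 + 14 & 8 + 49 \end{bmatrix} = \begin{bmatrix} 17 & 40 \\ 20 & 57 \end{bmatrix} \]
\[ B^2 = \begin{bmatrix} 2 & 6 \\ 1 & 5 \end{bmatrix} \begin{bmatrix} 2 & 6 \\ 1 & 5 \end{bmatrix} = \begin{bmatrix} 4 + 6 & 12 + 30 \\ 2 + 5 & 6 + 25 \end{bmatrix} = \begin{bmatrix} 10 & 42 \\ 7 & 31 \end{bmatrix} \]
\[ \therefore A^2 + B^2 + 2AB = \begin{bmatrix} 17 & 40 \\ 20 & 57 \end{bmatrix} + \begin{bmatrix} 10 & 42 \\ 7 & 31 \end{bmatrix} + \begin{bmatrix} 20 & 76 \\ 22 & 94 \end{bmatrix} = \begin{bmatrix} 47 & 158 \\ 49 & 182 \end{bmatrix} \quad \text{(i)} \]
Also,
\[ (A + B)^2 = \begin{bmatrix} 5 & 10 \\ 3 & 12 \end{bmatrix} \begin{bmatrix} 5 & 10 \\ 3 & 12 \end{bmatrix} = \begin{bmatrix} 5 \times 5 + 10 \times 3 & 5 \times 10 + 10 \times 12 \\ 3 \times 5 + 12 \times 3 & 3 \times 10 + 12 \times 12 \end{bmatrix} = \begin{bmatrix} 55 & 170 \\ 51 & 174 \end{bmatrix} \quad \text{(ii)} \]
From (i) and (ii),
\[ (A + B)^2 \neq A^2 + B^2 + 2AB \]
The two sides differ because $(A + B)^2 = A^2 + AB + BA + B^2$ and here $AB \neq BA$.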
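The comparison can be reproduced numerically. This is a small sketch with illustrative helpers (`matmul`, `add`) on lists of rows, using the same A and B as the example.

```python
def matmul(A, B):
    """Row-by-column product of two conformable matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(*matrices):
    """Element-wise sum of any number of equally sized matrices."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*matrices)]

A = [[3, 4], [2, 7]]
B = [[2, 6], [1, 5]]
S = add(A, B)
AB, BA = matmul(A, B), matmul(B, A)

lhs = matmul(S, S)                                   # (A + B)^2
naive = add(matmul(A, A), matmul(B, B), AB, AB)      # A^2 + B^2 + 2AB
correct = add(matmul(A, A), AB, BA, matmul(B, B))    # A^2 + AB + BA + B^2
print(lhs)      # [[55, 170], [51, 174]]
print(naive)    # [[47, 158], [49, 182]]
print(correct)  # [[55, 170], [51, 174]]
```

The expansion that keeps AB and BA separate matches $(A + B)^2$, while the scalar-style formula does not.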
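1.4 Power of Matrices
Suppose A is a square matrix; then the powers of A are defined as $A^2 = AA$, $A^3 = AAA$, and so on.
In general, $A^n = AA \cdots A$, where A is repeated n times.
Example: Let
\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]
Prove that
\[ A^k = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \]
Solution: The result holds for k = 1. For k = 2,
\[ A \cdot A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \]
Now suppose the result holds for k − 1. Then
\[ A^k = A^{k-1} \cdot A = \begin{bmatrix} 1 & k - 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \]
Hence proved.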
Idempotent Matrix: A square matrix A is called idempotent if its product with itself is again A. It is defined by
\[ AA = A, \quad \text{so that } AAA = A^3 = A \]
In general, $A^n = A$.
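Orthogonal Matrix: Let A be an n × n square matrix. Then A is said to be an orthogonal matrix if
\[ A_{n \times n} A'_{n \times n} = I_{n \times n} \]
Example: Given
\[ A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}_{2 \times 2} \]
prove that A is orthogonal iff $a^2 + b^2 = 1$.
Solution: Given
\[ A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}_{2 \times 2}, \quad \text{then } A' = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}_{2 \times 2} \]
Now, by the defining property of an orthogonal matrix,
\[ AA' = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \begin{bmatrix} a & b \\ -b & a \end{bmatrix} = \begin{bmatrix} a^2 + b^2 & ab - ba \\ ba - ab & b^2 + a^2 \end{bmatrix} = \begin{bmatrix} a^2 + b^2 & 0 \\ 0 & a^2 + b^2 \end{bmatrix} \]
Therefore $AA' = I_2$ if and only if $a^2 + b^2 = 1$, in which case
\[ AA' = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2 \]
Hence proved.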
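1.5 Systems of Equations in Matrix Form
Consider the system
2x + 3y = 4 --------- (1)
6x − y = 2 --------- (2)
These equations can be written in matrix form. Let
\[ A = \begin{bmatrix} 2 & 3 \\ 6 & -1 \end{bmatrix}, \quad X = \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 4 \\ 2 \end{bmatrix} \]
then
\[ AX = \begin{bmatrix} 2 & 3 \\ 6 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x + 3y \\ 6x - y \end{bmatrix} \]
so the system is equivalent to the matrix equation
AX = b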
3 4 
Example: If A    then prove by mathematical induction
1 1
1  2n 4n 
An  
 n 1  2n 
3 4 
Solution: Let A   
1 1
3 4 3 4 5 8

A2  A. A       1 3
1 1 1 1  
1  2  2 4  2 

 1 2 1  2  2
17
1  2n 4n 
A3  A. A  
1  2n 
proved
 n
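The induction result can be spot-checked for small n. The sketch below assumes the matrix is A = [[3, −4], [1, −1]] (the minus signs are restored from the claimed formula, which gives A itself at n = 1); `matmul` and `power` are illustrative helper names.

```python
def matmul(A, B):
    """Row-by-column product of two conformable matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def power(A, n):
    """A multiplied by itself so that A appears n times (n >= 1)."""
    result = A
    for _ in range(n - 1):
        result = matmul(result, A)
    return result

A = [[3, -4], [1, -1]]   # assumed signs, see the note above
for n in range(1, 8):
    # Compare the repeated product with the closed form from the example.
    assert power(A, n) == [[1 + 2 * n, -4 * n], [n, 1 - 2 * n]]
print(power(A, 2))  # [[5, -8], [2, -3]]
```

A numerical check like this does not replace the induction proof; it only confirms the formula for the values of n that were tried.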
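Example: (i) A matrix P is orthogonal if P'P = I. Prove that if P is an n × n matrix whose columns are all of length 1 and mutually orthogonal, then P is orthogonal.
(ii) Find out if A is an orthogonal matrix.
\[ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 3 & 4 \\ 7 & 5 & 2 \end{bmatrix} \]
Solution: (i) Let $P = \begin{bmatrix} P_1 & P_2 & \cdots & P_n \end{bmatrix}$, where $P_i$ is the ith column, with $P_i'P_i = 1$ (length 1) and $P_i'P_j = 0$ for $i \neq j$ (mutually orthogonal). Then
\[ P'P = \begin{bmatrix} P_1'P_1 & P_1'P_2 & \cdots & P_1'P_n \\ P_2'P_1 & P_2'P_2 & \cdots & P_2'P_n \\ \vdots & \vdots & \ddots & \vdots \\ P_n'P_1 & P_n'P_2 & \cdots & P_n'P_n \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I \]
∴ P is orthogonal.
(ii) No, A is not an orthogonal matrix, because the columns of A are not of length 1 (for example, the first column has squared length $1^2 + 1^2 + 7^2 = 51$). Then
\[ A'A \neq I_3 \]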
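Example: (i) Let D be the 3 × 3 diagonal matrix with entries d1, d2 and d3 along the diagonal and zeros elsewhere. Let A = (aij) be an arbitrary 3 × 3 matrix. Compute AD and DA. Show that AD multiplies the ith column of A by the entry di, while DA multiplies the ith row of A by the entry di.
(ii) If D is the 3 × 3 diagonal matrix with entries d1 = 2, d2 = 3 and d3 = 4, find the A such that AD = DA.
Solution: (i) Given
\[ D = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}, \quad A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \]
\[ AD = \begin{bmatrix} d_1 a_{11} & d_2 a_{12} & d_3 a_{13} \\ d_1 a_{21} & d_2 a_{22} & d_3 a_{23} \\ d_1 a_{31} & d_2 a_{32} & d_3 a_{33} \end{bmatrix}_{3 \times 3} \quad \text{(ith column multiplied by } d_i \text{)} \quad \text{(1)} \]
\[ DA = \begin{bmatrix} d_1 a_{11} & d_1 a_{12} & d_1 a_{13} \\ d_2 a_{21} & d_2 a_{22} & d_2 a_{23} \\ d_3 a_{31} & d_3 a_{32} & d_3 a_{33} \end{bmatrix}_{3 \times 3} \quad \text{(ith row multiplied by } d_i \text{)} \quad \text{(2)} \]
By (1) and (2), AD and DA agree on the diagonal, but in general AD ≠ DA: the (i, j) entries are $d_j a_{ij}$ and $d_i a_{ij}$ respectively.
(ii) AD = DA requires $(d_j - d_i) a_{ij} = 0$ for all i and j. Since d1 = 2, d2 = 3 and d3 = 4 are all different, $a_{ij} = 0$ whenever $i \neq j$, so A must be diagonal:
\[ A = \begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}_{3 \times 3}, \quad D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix} \]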
  2  1
Example: For what value of β, D    is symmetric?
2   1
Solution: Given,
  2  1
A  
2   1
A is a symmetric matrix when
\[ A = A^T = A' \]
that is, when its two off-diagonal entries are equal. Equating them,
\[ \beta^2 - 1 = 2 \]
\[ \text{or } \beta^2 = 3 \]
\[ \text{or } \beta = \pm\sqrt{3} \]
For these values of β the matrix is symmetric.
1.6 Invertible Matrix
A square matrix A of order n is called an invertible matrix if there exists a square matrix B of the same order such that
\[ AB = BA = I_n \]
i.e. AB = I and BA = I. B is then the inverse of the matrix A.
Problem Set
(1) Given,
1  2  x 1
A   , B 
 1
, find x and y
 2 1 y
1 q  1 nq 
(2) Let B    , the prove B n   
0 1 0 1 
3 2
If A   , find A2  5 A  7 I
1
(3)
5
1 3 0 0 1 0 
(4) If A  1 1 0 , B  1 0 0 then prove that AB  BA
 
 4 1 0 0 5 1 
2 3  3 1 
(5) Let B    , B   , prove that  AB   BA
 4 5  2 5
(6) Given,
 2 2 4 
A   1 3 4  , then prove A is idempatent matrix
 1 2 3
1
(7) For   the following given matrix is orthogonal
2
 0  
A    0   
 0 1 0 
3 1 2
(8) Show that A  1 2 0  , is symmetic matrix
 2 0 1 
21
(9) Give an example of three matrices X, Y and Z to show that XY = ZX does not necessarily imply Y = Z.
(10) If $\alpha$ is a scalar and A and B are matrices of order 3 × 4, then show that $\alpha(A + B) = \alpha A + \alpha B$.
Answers of the Problem Set
(1) x=1&y=2
 11 6 
(3)  15 23
 
 2 6  4 2
1 1 1 
(9) X   , y   1 10  , z   4 6 
 
 2 2 2  4 2   3 2 
Determinants and Matrix Inversion
DC-1
Semester-II
Lesson: Determinants And Matrix Inversion

 Learning Outcomes of the Present Chapter

 Introduction
 Determinant: Definition and Its Order
 Sarrus’ Rule
 Basic Rules for Determinant
 Multiplication of Determinants
 Adjoint or Adjugate and Cofactors of a Determinant
 Inverse Or Reciprocal Determinant
 Symmetric Determinants
 Skew and Skew-Symmetric Determinants
 The Inverse of a Matrix
 Finding Inverse by Elementary Row Operations
 Cramer’s Rule and Simultaneous Equations
 Homogeneous System of Equations
 Solved Examples
 Problem Set
 References

1.0 Learning Outcomes of the Present Chapter
- Understand the concept of a determinant and its order.
- Apply determinants to solve mathematical equations.
- Get to know the various properties and types of determinants.
- Explain Sarrus’ rule for evaluating a determinant of order 3.
- Understand the concept of the inverse and its applications.
- Know the steps and tools used to calculate and solve determinants.
- Understand the different types of determinants and the concepts related to them.
- Solve simultaneous equations with determinants by applying Cramer’s Rule.
- Describe the trivial and nontrivial solutions of a homogeneous system of equations.
1.1 Introduction
The present chapter is developed to understand the concept of determinants and its
application to find out matrix inversion. Basically, it is a part of linear algebra which eases
the difficulty level of the simultaneous equations in algebra by providing means for their
presentation and solution.
1.2 Determinant: Definition and Its Order
Every square matrix $A_{n \times n}$ is associated with a unique number called the determinant of the matrix. If A = (aij) is an n×n matrix, then the determinant of A is denoted by |A| or det(A).
If A = (a11) is a 1×1 matrix, then |A| = a11, i.e., the determinant is equal to the element itself.
If
\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]
is a 2×2 matrix, then
\[ |A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \]
This is called a determinant of order two, or a second order determinant. Its value is $a_{11}a_{22} - a_{12}a_{21}$, and its elements are a11, a12, a21 and a22.
Thus we may represent the determinant in terms of rows and columns as:
LEADING TERM: The diagonal elements in the determinant i.e. b11 and c22 are the leading
term and it always has a positive sign.
Note: A determinant of the second order has two diagonal elements having positive signs
and 2! = 2 terms in its expansion out of which one is positive and other is negative.
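The IIIrd order determinant, or determinant of order 3, can be defined as
\[ |A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \]
\[ = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \]
So we have expressed the determinant of order 3 in terms of determinants of order 2. We can similarly express determinants of higher orders.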
Note: We can expand a determinant by any row or column and it will generate the same
value of the determinant every time.
Consider the following system of ‘k’ equations in ‘m’ unknowns:
b11x1 + b12x2 + … + b1mxm = c1
b21x1 + b22x2 + … + b2mxm = c2
. . . . .
. . . . .
. . . . .
bk1x1 + bk2x2 + … + bkmxm = ck
We can express this system in a compact way and solve it by using matrices. Let
\[ B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1m} \\ b_{21} & b_{22} & \cdots & b_{2m} \\ \vdots & \vdots & & \vdots \\ b_{k1} & b_{k2} & \cdots & b_{km} \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}, \quad C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{bmatrix} \]
In simple terms, the system can be written as BX = C. When B is a square matrix (k = m) and non-singular, the system has a unique solution. In this chapter, we use determinants to determine whether a given square matrix is non-singular (invertible) or not. A matrix is non-singular if and only if its determinant is not equal to zero.
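Example: Compute the value and the cofactors of the determinant given below.
\[ |A| = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 2 & 3 \\ 7 & 1 & 3 \end{vmatrix} \]
Solution: Expanding the determinant by the 1st row we get
\[ |A| = 1 \begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} - 2 \begin{vmatrix} 4 & 3 \\ 7 & 3 \end{vmatrix} + 3 \begin{vmatrix} 4 & 2 \\ 7 & 1 \end{vmatrix} \]
\[ = 1(6 - 3) - 2(12 - 21) + 3(4 - 14) = 3 - (-18) + (-30) = -9 \]
We can expand the determinant by any row or column. If we expand it by the 1st column,
\[ |A| = 1 \begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} - 4 \begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} + 7 \begin{vmatrix} 2 & 3 \\ 2 & 3 \end{vmatrix} \]
\[ = 1(6 - 3) - 4(6 - 3) + 7(6 - 6) = 3 - 12 + 0 = -9 \]
Cofactors of the determinant:
Cofactor of the element in the 1st row and 1st column: $(-1)^{1+1} \begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} = 3$
Cofactor of the element in the 1st row and 2nd column: $(-1)^{1+2} \begin{vmatrix} 4 & 3 \\ 7 & 3 \end{vmatrix} = -(-9) = 9$
Cofactor of the element in the 1st row and 3rd column: $(-1)^{1+3} \begin{vmatrix} 4 & 2 \\ 7 & 1 \end{vmatrix} = -10$
And so on for the other elements.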
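The expansion by a row can be written as a short recursive sketch. It assumes the determinant in the example is the one with rows (1, 2, 3), (4, 2, 3), (7, 1, 3), which is what the printed expansions imply; `minor`, `det` and `cofactor` are illustrative helper names using 0-based indices.

```python
def minor(matrix, r, c):
    """Delete row r and column c (0-based) from a square matrix."""
    return [row[:c] + row[c + 1:] for i, row in enumerate(matrix) if i != r]

def det(matrix):
    """Determinant by cofactor expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum((-1) ** c * matrix[0][c] * det(minor(matrix, 0, c))
               for c in range(len(matrix)))

def cofactor(matrix, r, c):
    """Signed minor of the element in row r, column c (0-based)."""
    return (-1) ** (r + c) * det(minor(matrix, r, c))

A = [[1, 2, 3], [4, 2, 3], [7, 1, 3]]   # entries inferred from the expansions above
print(det(A))                                  # -9
print([cofactor(A, 0, c) for c in range(3)])   # [3, 9, -10]
```

Recursive expansion is fine for small orders like these; its cost grows very quickly with the order, so it is only meant to mirror the hand calculation.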
1.3 Sarrus’ Rule
Sarrus’ rule is an alternative way to compute determinants of order 3, which many people find convenient. In this method, we write down the determinant twice side by side, except that the last column of the second copy is omitted, so that the first two columns are repeated on the right:
\[ \begin{matrix} a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\ a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\ a_{31} & a_{32} & a_{33} & a_{31} & a_{32} \end{matrix} \]
Let
\[ |A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \]
Firstly, multiply along the three lines falling to the right, giving all these products a plus sign:
\[ a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} \quad \text{(A)} \]
Secondly, multiply along the three lines rising to the right, giving all these products a minus sign:
\[ -a_{31}a_{22}a_{13} - a_{32}a_{23}a_{11} - a_{33}a_{21}a_{12} \quad \text{(B)} \]
The sum of (A) and (B) is exactly equal to the determinant, i.e. |A|.
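Sarrus’ rule translates directly into code. In this sketch `sarrus` is an illustrative name, and the test matrix is assumed to be the one with rows (1, 2, 3), (4, 2, 3), (7, 1, 3), whose determinant by row expansion is −9.

```python
def sarrus(m):
    """Determinant of a 3 x 3 matrix (list of three rows) by Sarrus' rule."""
    (a, b, c), (d, e, f), (g, h, i) = m
    falling = a * e * i + b * f * g + c * d * h   # products with a plus sign, (A)
    rising = g * e * c + h * f * a + i * d * b    # products with a minus sign, (B)
    return falling - rising

print(sarrus([[1, 2, 3], [4, 2, 3], [7, 1, 3]]))  # -9
```

The rule applies only to order 3; it does not extend to determinants of order 4 or higher.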
1.4 Basic Rules for Determinant
- If all the elements of any row or column are zero, the value of the determinant is also zero, i.e. |A| = 0.
- If we interchange all the rows of a determinant with its columns, the determinant remains unchanged in value and sign, i.e. a determinant and its transpose have the same value: |A| = |A'|.
- If we interchange any two rows or two columns of a determinant, its value remains unchanged numerically but changes in sign.
- If all the elements of any one row (or column) of a determinant are multiplied (or divided) by a constant ‘c’, then the value of the determinant is also multiplied (or divided) by ‘c’.
- If the elements of one row (or column) are identical or proportional to the elements of a second row (or column), the determinant takes the value zero.
- The determinant of the product of two n×n matrices A and B is the product of the determinants of the factors:
\[ |AB| = |A| \cdot |B| \]
- If A is an n×n matrix and α is a real number, then
\[ |\alpha A| = \alpha^n |A| \]
- An orthogonal matrix must have determinant 1 or -1.
- If $A^2 = I$, then the square matrix A of order n is called involutory (involutive), and its determinant is always 1 or -1.
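1.5 Multiplication of Determinants
When we multiply two determinants of the same order, the result is a determinant of the same order, formed by multiplying the rows of the first by the columns of the second. For order two it is given by
\[ \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \times \begin{vmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{vmatrix} = \begin{vmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{vmatrix} \]
Example:
\[ \begin{vmatrix} 4 & 5 \\ 1 & 2 \end{vmatrix} \times \begin{vmatrix} 2 & 1 \\ 3 & 2 \end{vmatrix} = \begin{vmatrix} 4 \times 2 + 5 \times 3 & 4 \times 1 + 5 \times 2 \\ 1 \times 2 + 2 \times 3 & 1 \times 1 + 2 \times 2 \end{vmatrix} = \begin{vmatrix} 23 & 14 \\ 8 & 5 \end{vmatrix} = 115 - 112 = 3 \]
which agrees with the product of the two values, (8 − 5)(4 − 3) = 3.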
1.6 Adjoint or Adjugate and Cofactors of a Determinant
Suppose we have a determinant |A|; its adjoint is represented as |A|’ or Adj A. It is also known as the adjugate. The elements in |A|’ are the cofactors of the corresponding elements of |A|, i.e.,
Let |A| = , then |A’| =
Where B1, C1, D1, …. are the respective cofactors of b1, c1, d1, … of determinant |A|.
For Example; let |A| = =
Now, B1 = (-1)1+1 = -2
B2 = (-1)2+1 = +1
B3 = (-1)3+1 = +4
Similarly, C1=4, C2= -2, C3=1 and D1=1, D2=4, D3= -2
Thus, |A’| =

Note: For a determinant of order 3 with |A| ≠ 0, we have $|A'| = |A|^2 = |A^2|$. In general, for order n, $|A'| = |A|^{n-1}$.
1.7 Inverse Or Reciprocal Determinant
Suppose we have a determinant A, its inverse is represented as A-1. Provided the |A|
≠ 0, the inverse of A is formed by dividing every element of the adjoint of determinant A by
|A|.
So, if |A| = , then |A-1| =
\[ = \frac{1}{|A|^3} \cdot |A'| = \frac{1}{|A|^3} \cdot |A^2| = |A|^{-1} \quad \text{(for order 3)} \]
1.8 Symmetric Determinants
Suppose we consider the determinant |A| = , where the suffix values
indicate the position of its respective element (i.e. b 11 lies in the 1st row and 1st column and
c12 lies in 1st row and 2nd column and so on). The general formula for a determinant is |A|
= where arc are the elements of the determinant. A determinant is said to be
symmetric if arc = acr for all r,c = 1, 2, 3, …., n.
Thus, is a symmetric determinant.
Properties of Symmetric Determinant:
1. If we find the adjoint of a symmetric determinant, we see that its adjoint is also
symmetric.
2. If we square a symmetric determinant, the resultant determinant is also a symmetric
determinant.
1.9 Skew and Skew-Symmetric Determinants
Suppose we consider the determinant |A| = , where the suffix values
indicate the position of its respective element (i.e. b 11 lies in the 1st row and 1st column
and c12 lies in 1st row and 2nd column and so on). The general formula for a determinant
is |A| = where arc are the elements of the determinant. A determinant is
said to be ‘skew’ if arc = -acr for all r,c = 1, 2, 3, …., n and r≠c.
Thus, is a skew determinant.
And if arc = -acr for all r,c = 1, 2, 3, …., n and r ≠ c and arc =0 for all r = c, then the
determinant is known as ‘skew-symmetric’.
Thus, is a ‘skew-symmetric’ determinant.
Properties of Skew-Symmetric Determinant:
- Every skew-symmetric determinant of even order (for example, 2nd order) is a perfect square.
For example, let a skew-symmetric determinant be
\[ |A| = \begin{vmatrix} 0 & 3 \\ -3 & 0 \end{vmatrix} \]
Thus $|A| = 0 - (3)(-3) = 9 = 3^2$, which is a perfect square.
- Every skew-symmetric determinant of odd order (for example, 3rd order) is zero.
For example, let a skew-symmetric determinant be
\[ |A| = \begin{vmatrix} 0 & -1 & -2 \\ 1 & 0 & -3 \\ 2 & 3 & 0 \end{vmatrix} \]
Thus, |A| = 0 – (-1) (6) + (-2) (3) = 0.
2.0 The Inverse of a Matrix
Suppose A is a non-singular matrix. Then the inverse matrix $B = A^{-1}$ exists such that $AB = BA = I$, where I is the identity matrix of the same order as A and B.
A matrix A is non-singular iff $\det(A) = |A| \neq 0$. In the case of a singular matrix, i.e. |A| = 0, the inverse does not exist.
For example, if
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{with } ad - bc \neq 0 \]
then
\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\ \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{bmatrix} \]
The general formula for the inverse is as follows. Let A = (aij) be an n×n matrix with determinant $\det(A) = |A| \neq 0$. Then it has a unique inverse $A^{-1}$ such that $A A^{-1} = A^{-1} A = I$, given by
\[ A^{-1} = \frac{1}{|A|} \cdot Adj(A) \]
Properties of the Inverse: Let A and B be invertible n×n matrices; then
- $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$
- AB is invertible and $(AB)^{-1} = B^{-1}A^{-1}$
- The transpose A' is invertible and $(A')^{-1} = (A^{-1})'$
- $(cA)^{-1} = c^{-1}A^{-1}$ whenever c is a number ≠ 0
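2.1 Finding Inverse by Elementary Row Operations
In this method, we find the inverse of a matrix by row operations. It is known as the elementary matrix method. The matrix [A | I] is reduced by elementary row operations to [I | A^{-1}]:
\[ [A \mid I] \rightarrow [I \mid A^{-1}] \]
It can be explained with the help of an example. Let
\[ A = \begin{bmatrix} 1 & 4 \\ 2 & 7 \end{bmatrix} \]
then
\[ [A \mid I] = \left[\begin{array}{cc|cc} 1 & 4 & 1 & 0 \\ 2 & 7 & 0 & 1 \end{array}\right] \]
\[ \rightarrow \left[\begin{array}{cc|cc} 1 & 4 & 1 & 0 \\ 0 & -1 & -2 & 1 \end{array}\right] \quad (-2R_1 + R_2) \]
\[ \rightarrow \left[\begin{array}{cc|cc} 1 & 4 & 1 & 0 \\ 0 & 1 & 2 & -1 \end{array}\right] \quad (-R_2) \]
\[ \rightarrow \left[\begin{array}{cc|cc} 1 & 0 & -7 & 4 \\ 0 & 1 & 2 & -1 \end{array}\right] \quad (-4R_2 + R_1) \]
\[ = [I \mid A^{-1}] \]
\[ A^{-1} = \begin{bmatrix} -7 & 4 \\ 2 & -1 \end{bmatrix} \]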
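The reduction of [A | I] to [I | A^{-1}] can be sketched as a small Gauss-Jordan routine. The function name `inverse` and the use of exact fractions are choices made for this illustration, not part of the lesson.

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix by reducing [A | I] to [I | A^-1] with row operations."""
    n = len(matrix)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]          # make the pivot equal to 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

print(inverse([[1, 4], [2, 7]]) == [[-7, 4], [2, -1]])  # True
```

For a singular matrix such as [[1, 2], [2, 4]] the routine raises `ValueError`, matching the statement that the inverse does not exist when |A| = 0.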
 
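2.2 Cramer’s Rule and Simultaneous Equations
Using the properties of determinants, a simple method of solving linear simultaneous equations was proposed by the mathematician Gabriel Cramer.
Suppose we have the following linear simultaneous equations:
b1x + c1y + d1z = m1
b2x + c2y + d2z = m2
b3x + c3y + d3z = m3
We can rewrite these equations in matrix form as follows:
\[ \begin{bmatrix} b_1 & c_1 & d_1 \\ b_2 & c_2 & d_2 \\ b_3 & c_3 & d_3 \end{bmatrix} \times \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix} \]
Now let
\[ |A| = \begin{vmatrix} b_1 & c_1 & d_1 \\ b_2 & c_2 & d_2 \\ b_3 & c_3 & d_3 \end{vmatrix} \neq 0 \quad \text{and} \quad |B| = \begin{vmatrix} m_1 & c_1 & d_1 \\ m_2 & c_2 & d_2 \\ m_3 & c_3 & d_3 \end{vmatrix} \]
Now,
\[ x \cdot |A| = \begin{vmatrix} b_1 x & c_1 & d_1 \\ b_2 x & c_2 & d_2 \\ b_3 x & c_3 & d_3 \end{vmatrix} = \begin{vmatrix} b_1 x + c_1 y + d_1 z & c_1 & d_1 \\ b_2 x + c_2 y + d_2 z & c_2 & d_2 \\ b_3 x + c_3 y + d_3 z & c_3 & d_3 \end{vmatrix} = \begin{vmatrix} m_1 & c_1 & d_1 \\ m_2 & c_2 & d_2 \\ m_3 & c_3 & d_3 \end{vmatrix} = |B| \]
(adding y times the second column and z times the third column to the first column does not change the value of the determinant).
Therefore, $x \cdot |A| = |B|$. Thus,
\[ x = \frac{|B|}{|A|} \]
Similarly we can solve for the values of y and z. Let
\[ |C| = \begin{vmatrix} b_1 & m_1 & d_1 \\ b_2 & m_2 & d_2 \\ b_3 & m_3 & d_3 \end{vmatrix} \quad \text{and} \quad |D| = \begin{vmatrix} b_1 & c_1 & m_1 \\ b_2 & c_2 & m_2 \\ b_3 & c_3 & m_3 \end{vmatrix} \]
Thus, from the above we can say:
\[ y = \frac{|C|}{|A|} \quad \text{and} \quad z = \frac{|D|}{|A|} \]
This is the process of solving simultaneous equations using Cramer’s Rule.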
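For example, let the simultaneous equations be
\[ 3x - 2y = -1 \]
\[ 5x + 3y = 11 \]
then
\[ D = \begin{vmatrix} 3 & -2 \\ 5 & 3 \end{vmatrix} = (9 + 10) = 19, \quad D_x = \begin{vmatrix} -1 & -2 \\ 11 & 3 \end{vmatrix} = (-3 + 22) = 19, \quad D_y = \begin{vmatrix} 3 & -1 \\ 5 & 11 \end{vmatrix} = (33 + 5) = 38 \]
Now,
\[ x = \frac{D_x}{D} = \frac{19}{19} = 1 \quad \text{and} \quad y = \frac{D_y}{D} = \frac{38}{19} = 2 \]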

Example: Suppose there are two simultaneous equations:

a – 2b = 3 and 3a + 5b = 20
Find the values of a and b using Cramer’s rule.
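Solution: We can write the above equations in matrix form as follows:
\[ AX = B, \quad \text{where } A = \begin{bmatrix} 1 & -2 \\ 3 & 5 \end{bmatrix}, \quad X = \begin{bmatrix} a \\ b \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 \\ 20 \end{bmatrix} \]
Let
\[ A_1 = \begin{bmatrix} 3 & -2 \\ 20 & 5 \end{bmatrix} \quad \text{and} \quad A_2 = \begin{bmatrix} 1 & 3 \\ 3 & 20 \end{bmatrix} \]
According to Cramer’s Rule,
\[ a = \frac{|A_1|}{|A|} = \frac{15 + 40}{5 + 6} = 55/11 = 5 \]
\[ b = \frac{|A_2|}{|A|} = \frac{20 - 9}{5 + 6} = 11/11 = 1 \]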
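For two equations in two unknowns, Cramer's rule fits in a few lines. This is a sketch with an illustrative function name (`cramer_2x2`) and integer coefficients assumed; it is applied here to a − 2b = 3 and 3a + 5b = 20.

```python
from fractions import Fraction

def cramer_2x2(a11, a12, a21, a22, m1, m2):
    """Solve a11*x + a12*y = m1 and a21*x + a22*y = m2 by Cramer's rule."""
    d = a11 * a22 - a12 * a21            # determinant of the coefficient matrix
    if d == 0:
        raise ValueError("determinant is zero; Cramer's rule does not apply")
    dx = m1 * a22 - a12 * m2             # first column replaced by the constants
    dy = a11 * m2 - m1 * a21             # second column replaced by the constants
    return Fraction(dx, d), Fraction(dy, d)

a, b = cramer_2x2(1, -2, 3, 5, 3, 20)
print(a, b)  # 5 1
```

The zero-determinant check reflects the condition |A| ≠ 0 under which the rule is stated.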
2.3 Homogeneous System of Equations

If the right-hand side of a system of equations consists only of zeros, the system is called homogeneous. Such a system always has the so-called trivial solution, i.e. X1 = X2 = X3 = … = Xn = 0. In addition, some homogeneous systems have nontrivial solutions. A homogeneous linear system of n equations in n unknowns has nontrivial solutions if and only if the coefficient matrix $A = (a_{ij})_{n \times n}$ is singular, i.e. $|A| = 0$.

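2.4 Solved Examples
Example: Show that $(AB)^{-1} = B^{-1}A^{-1}$
Solution: By the definition of the inverse,
\[ AB(AB)^{-1} = I \]
\[ A^{-1}AB(AB)^{-1} = A^{-1}I = A^{-1}, \quad \text{multiplying on the left by } A^{-1} \]
\[ B(AB)^{-1} = A^{-1}, \quad \text{using } A^{-1}A = I \]
\[ B^{-1}B(AB)^{-1} = B^{-1}A^{-1}, \quad \text{multiplying on the left by } B^{-1} \]
\[ (AB)^{-1} = B^{-1}A^{-1}, \quad \text{using } B^{-1}B = I \]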
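Example: Consider the following macro-economic model:
\[ Y = C + I_0 + G_0 \quad \text{and} \quad C = a + bY \]
Find Y (National Income) and C (Consumption) by using Cramer’s rule.
Solution: Rearranging the given equations,
\[ Y - C = I_0 + G_0 \]
\[ -bY + C = a \]
Writing the above equations in matrix form, the determinants are
\[ D = \begin{vmatrix} 1 & -1 \\ -b & 1 \end{vmatrix} = 1 - b \]
\[ D_Y = \begin{vmatrix} I_0 + G_0 & -1 \\ a & 1 \end{vmatrix} = (I_0 + G_0) - (-1 \times a) = (I_0 + G_0) + a \]
\[ D_C = \begin{vmatrix} 1 & I_0 + G_0 \\ -b & a \end{vmatrix} = a + b(I_0 + G_0) \]
then
\[ Y = \frac{(I_0 + G_0) + a}{1 - b} \]
\[ C = \frac{a + b(I_0 + G_0)}{1 - b} \]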
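The closed-form solution can be evaluated for sample numbers. The parameter values below (a = 50, b = 0.75, I0 = 30, G0 = 20) are hypothetical, chosen only to illustrate the formulas; `national_income` is an illustrative function name.

```python
def national_income(a, b, I0, G0):
    """Solve Y - C = I0 + G0 and -b*Y + C = a by Cramer's rule."""
    D = 1 - b                      # |1 -1; -b 1|
    if D == 0:
        raise ValueError("b = 1 makes the coefficient matrix singular")
    D_Y = (I0 + G0) + a            # |I0+G0 -1; a 1|
    D_C = a + b * (I0 + G0)        # |1 I0+G0; -b a|
    return D_Y / D, D_C / D

# Hypothetical values for illustration only.
Y, C = national_income(a=50, b=0.75, I0=30, G0=20)
print(Y, C)  # 400.0 350.0
```

With these numbers Y = C + I0 + G0 gives 350 + 50 = 400, and C = a + bY gives 50 + 0.75 × 400 = 350, so both original equations are satisfied.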
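Example: Prove that
\[ |A| = \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (a - b)(b - c)(c - a) \]
Solution: Given
\[ |A| = \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} \]
Applying C1 = C1 – C2 and C2 = C2 – C3, we get
\[ |A| = \begin{vmatrix} 0 & 0 & 1 \\ a - b & b - c & c \\ a^2 - b^2 & b^2 - c^2 & c^2 \end{vmatrix} = (a - b)(b - c) \begin{vmatrix} 0 & 0 & 1 \\ 1 & 1 & c \\ a + b & b + c & c^2 \end{vmatrix} \]
Now, expanding the determinant by the first row, we get
\[ |A| = (a - b)(b - c) \begin{vmatrix} 1 & 1 \\ a + b & b + c \end{vmatrix} \]
\[ |A| = (a - b)(b - c)(b + c - a - b) = (a - b)(b - c)(c - a) \]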
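Example: Show that $A = [I - X(X'X)^{-1}X']$ is idempotent.
Solution: Idempotent means $A^2 = A$. Then
\[ A^2 = [I - X(X'X)^{-1}X']^2 \]
\[ = I^2 - I \cdot X(X'X)^{-1}X' - X(X'X)^{-1}X' \cdot I + X(X'X)^{-1}X' \cdot X(X'X)^{-1}X' \]
\[ = I - 2X(X'X)^{-1}X' + X(X'X)^{-1}(X'X)(X'X)^{-1}X' \]
\[ = I - 2X(X'X)^{-1}X' + X(X'X)^{-1}X' \]
\[ = [I - X(X'X)^{-1}X'] = A \]
Hence proved.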
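Example: A monomial square matrix M is one in which there is exactly one non-zero entry in each row and in each column. Show that any 2×2 monomial matrix is invertible and describe its inverse.
Solution: A 2×2 monomial matrix M must be one of two types, i.e.
\[ \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 0 & a \\ b & 0 \end{bmatrix} \quad \text{with } a \neq 0 \text{ and } b \neq 0 \]
In both cases $|M| \neq 0$ (it equals ab or −ab), so M is invertible.
Its inverse is given by
\[ \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}^{-1} = \begin{bmatrix} 1/a & 0 \\ 0 & 1/b \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & a \\ b & 0 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & 1/b \\ 1/a & 0 \end{bmatrix} \]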
Example: For what values of µ does the following system of equations have non-trivial solutions?
\[ 5x + 2y + z = \mu x \]
\[ 2x + y = \mu y \]
\[ x + z = \mu z \]
Solution: Rewrite the given equations in standard form:
\[ (5 - \mu)x + 2y + z = 0 \]
\[ 2x + (1 - \mu)y = 0 \]
\[ x + (1 - \mu)z = 0 \]
The above system of equations has a nontrivial solution iff the coefficient matrix is singular, i.e. the determinant of the coefficient matrix must be zero:
\[ \begin{vmatrix} 5 - \mu & 2 & 1 \\ 2 & 1 - \mu & 0 \\ 1 & 0 & 1 - \mu \end{vmatrix} = 0 \]
Expanding the determinant gives $(5 - \mu)(1 - \mu)^2 - 4(1 - \mu) - (1 - \mu) = (1 - \mu)(\mu^2 - 6\mu)$, so that
µ(1-µ)(µ-6) = 0
Hence the system of equations has non-trivial solutions iff µ = 0, 1 or 6.
Problem Set
  
1. Prove that 2  2
 2  (     )(   )(   )(   )
      
2. Solve the following system using both Cramer’s rule and matrix inverse:

5a  6b  4c  15
2 x  3 y  3 
a)  b) 7 a  4b  3c  19
4 x  y  11 2a  b  6c  46

2 x  3 y  z  12  0
 2 x  y  5
d) 3 x  4 y  11z  46 e) 
5 y  4 z  5 3x  2 y  3

x + 2y + 3z = 6
f) 2x + 4y + z = 7
3x + 2y + 9z = 14
3. Calculate the determinant of ,
4. Find the cofactors of the following determinant and prove that |A’| = |A 2|
|A| = .
5 3 5 2 
5. For a given matrix A  , the transpose is A    . A matrix A is called
2 4 3 4 
1 2 2
1
orthogonal if AA  AA  I . Show that the matrix A   2 2 2  is
3
 2 2 1
orthogonal.
 240 750 
1200 1500 
6. Given an input coefficient matrix A    and the demand matrix
 720 450 
1200 1500 
 210
D  . Find the output matrix X, such that ( I  A) X  D
330 
2 x  3 y  3
7. Find k so the system has no solution: 
kx  y  11
8. Show that the following system of equations has no solution;
x  2y  z  5
3x  y  z  2
x  5y  z  4
1
9. If A and B are the invertible then prove that BAB  A

3 5 
10. Find the inverse of the matrix A    and verify that A. A1  A1 . A  I
 7  11
 3 5  1 3
11. If A    and B    , then prove that ( AB ) 1  B 1 A1
2 7  2 4
12. Prove that the homogeneous system of equations
\[ ax + by + cz = 0 \]
\[ bx + cy + az = 0 \]
\[ cx + ay + bz = 0 \]
has a nontrivial solution if and only if $a^3 + b^3 + c^3 - 3abc = 0$.

Linear Dependency And Rank Of Matrix
DC-1
Semester-II
Lesson: Linear Dependency And Rank Of Matrix
College/Department: Shyamlal College, University of Delhi
The essence of Mathematics is

not to make simple things
complicated, but to make
complicated things simple…
- S. Gudder

 2: Important terminology
 2.1: Minors
 2.2: Co-factors
 3: Linear Dependency
 4: Rank of A Matrix using Determinants
 5: Summary
 6: Exercise
 7: References
 8: MCQs
- Understand the concept of determinants and will be able to apply it to solve the
mathematical equations.
- Get to know the various properties of determinants and evaluate them.
- Know the steps and tools to calculate and solve determinants.
- Understand the different types of determinants and its various concepts.
- Solve the simultaneous equations using determinants applying Cramer’s Rule.
- Understand the concept of linear dependency and how to calculate rank of a matrix.
2. Important Terminology
2.1Minors:
In the above example of 3X3 determinant, we found that:
b1 - c1 + d1
This is determinant expanded in terms of “Minors” where,
Minor of b1 = which is a sub-matrix obtained by deleting the row and column
containing b1in the given 3X3 matrix i.e. .
Similarly, minor of b2 =

Minor of b3 =
Similarly, minor of c1 =
And minor of d1 = and so on.
Thus, we may define a (nxn) determinant in terms of “minors” as follows:
So we can say that the minor of any element in an nth order determinant is an (n – 1)th order determinant (where n = 2, 3, …).
2.2Cofactors:
Suppose we have the following determinant:
|A| =
In general, we can represent a determinant A as:
\[ |A| = \sum_{c=1}^{n} a_{rc} f_{rc} \]
(where r = row, c = column, expanding by any fixed row r) and $f_{rc}$ = cofactor of $a_{rc}$, defined as:
\[ f_{rc} = (-1)^{r+c} M_{rc} \]
(where $M_{rc}$ = minor of determinant A formed by eliminating row ‘r’ and column ‘c’ from A).
Thus if we consider the above 3rd order determinant,
So, according to the formula, co-factor of b1 = (-1)1+1 = , as it lies in 1st

row and 1st column.

Co-factor of c12 = (-1)1+2 = (-) , as it lies in 1st row and 2nd column and
so on for other elements.
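For example, compute the value and the cofactors of the following determinant:
\[ |A| = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 2 & 3 \\ 7 & 1 & 3 \end{vmatrix} \]
Expanding the determinant by the 1st row we get
\[ |A| = 1 \begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} - 2 \begin{vmatrix} 4 & 3 \\ 7 & 3 \end{vmatrix} + 3 \begin{vmatrix} 4 & 2 \\ 7 & 1 \end{vmatrix} \]
\[ = 1(6 - 3) - 2(12 - 21) + 3(4 - 14) = 3 - (-18) + (-30) = -9 \]
We can expand the determinant by any row or column. If we expand it by the 1st column,
\[ |A| = 1 \begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} - 4 \begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} + 7 \begin{vmatrix} 2 & 3 \\ 2 & 3 \end{vmatrix} \]
\[ = 1(6 - 3) - 4(6 - 3) + 7(6 - 6) = 3 - 12 + 0 = -9 \]
COFACTORS:
Cofactor of the element in the 1st row and 1st column: $(-1)^{1+1} \begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} = 3$
Cofactor of the element in the 1st row and 2nd column: $(-1)^{1+2} \begin{vmatrix} 4 & 3 \\ 7 & 3 \end{vmatrix} = -(-9) = 9$
Cofactor of the element in the 1st row and 3rd column: $(-1)^{1+3} \begin{vmatrix} 4 & 2 \\ 7 & 1 \end{vmatrix} = -10$
And so on for the other elements.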
3. Linear Dependency:
LINEAR DEPENDENCE/INDEPENDENCE OF VECTORS – When a vector ‘x’ is a scalar multiple of (or, more generally, a linear combination of) the other vector(s) such as ‘y’, it is known to be a dependent vector.
Suppose $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$.
The two vectors x and y are known to be independent if a = 0 and b = 0 are the only numbers which satisfy the equation ax + by = 0, i.e.
\[ a x_1 + b y_1 = 0 \]
\[ a x_2 + b y_2 = 0 \]
a = b = 0 is the only solution to these equations when x and y are two independent vectors.
LINEAR DEPENDENCE/INDEPENDENCE IN TERMS OF MATRICES/DETERMINANTS –
Two vectors x and y are known to be independent if the matrix with these vectors as columns, [x y], has a non-zero determinant. More generally, a set of n vectors, each with n components, is linearly independent if the determinant of the matrix formed from them is non-zero. And of course if the determinant is zero, the set is dependent.
For example, the two vectors <9, 2> and <5, 7> are linearly independent, since the determinant of the matrix containing the two vectors as columns is non-zero.
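We need to find the constants a and b such that ax + by = 0, and to show that a = b = 0 (the trivial solution) is the only solution.
a<9, 2> + b<5, 7> = <0, 0>
\[ a \begin{bmatrix} 9 \\ 2 \end{bmatrix} + b \begin{bmatrix} 5 \\ 7 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
9a + 5b = 0
2a + 7b = 0
Solving these we get a = b = 0.
Thus the vectors are linearly independent.
Alternatively, let matrix $A = \begin{bmatrix} 9 & 5 \\ 2 & 7 \end{bmatrix}$; thus
|A| = (63-10) = 53 ≠ 0
⇒ Linearly independent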
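For example, show that the two vectors <2, -3> and <-10, 15> are linearly dependent.
We need to find constants a and b, not both zero, that satisfy the equation.
a<2, -3> + b<-10, 15> = <0, 0>
\[ a \begin{bmatrix} 2 \\ -3 \end{bmatrix} + b \begin{bmatrix} -10 \\ 15 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
2a - 10b = 0 or 2(a-5b) = 0
-3a + 15b = 0 or -3(a-5b) = 0
Thus, solving, we get a = 5b. So if b = 1, a = 5; if b = 1/10, a = 1/2; etc.
Thus there are solutions with a and b different from zero ⇒ linearly dependent
Alternatively, let matrix $A = \begin{bmatrix} 2 & -10 \\ -3 & 15 \end{bmatrix}$; thus
|A| = (30-30) = 0
⇒ Linearly dependent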
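The determinant test for two vectors can be sketched as follows; `are_independent` is an illustrative name, and each vector is assumed to be a pair of numbers.

```python
def are_independent(u, v):
    """Two 2-component vectors are independent iff the determinant of [u v] is non-zero."""
    return u[0] * v[1] - v[0] * u[1] != 0

print(are_independent((9, 2), (5, 7)))       # True  (determinant 63 - 10 = 53)
print(are_independent((2, -3), (-10, 15)))   # False (determinant 30 - 30 = 0)
```

The second pair is dependent because <-10, 15> is −5 times <2, -3>.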
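For example, show that the following vectors are independent:
x = <1, 1, 0> , y = <1, 0, 1> , z = <0, 1, 1>
We need to show that the only constants a, b, c satisfying the equation are the trivial solution a = b = c = 0.
\[ \text{Let } a \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + b \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + c \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]
a + b = 0
a + c = 0
b + c = 0
Solving these simultaneously we get a = b = c = 0
⇒ Linearly independent
Alternatively, let
\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \]
|A| = 1 (0-1) – 1(1-0) + 0(1-0) = -2 ≠ 0
⇒ Linearly independent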
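For example, show that the following vectors are dependent and also find the relation between them:
x = <2, -1, 1>, y = <3, -4, -2>, z = <5, -10, -8>
Let matrix $A = \begin{pmatrix} 2 & 3 & 5 \\ -1 & -4 & -10 \\ 1 & -2 & -8 \end{pmatrix}$
Thus, |A| = 2(32-20) – 3(8+10) + 5(2+4) = 24 - 54 + 30 = 0
so the vectors are linearly dependent.
Now, to find the relation between the vectors x, y and z, we need constants a, b and c, not all zero, such that:
ax + by + cz = 0
$\begin{pmatrix} 2 & 3 & 5 \\ -1 & -4 & -10 \\ 1 & -2 & -8 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$
Now, the augmented matrix is:
$\left(\begin{array}{ccc|c} 2 & 3 & 5 & 0 \\ -1 & -4 & -10 & 0 \\ 1 & -2 & -8 & 0 \end{array}\right)$
Now, through elementary row operations using GAUSSIAN ELIMINATION we get,
$\left(\begin{array}{ccc|c} 2 & 3 & 5 & 0 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$
Thus we get,
2a + 3b + 5c = 0
b + 3c = 0
Thus, b = -3c and a = 2c

Now we know, ax + by + cz = 0
Thus, 2cx – 3cy + cz = 0
c(2x – 3y + z) = 0
2x – 3y + z = 0
-2x + 3y = z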
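As a quick check of this example, the sketch below recomputes the determinant by first-row expansion and confirms the relation z = -2x + 3y component by component. The function name `det3` is our own choice for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix (list of rows), expanded along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


x, y, z = (2, -1, 1), (3, -4, -2), (5, -10, -8)

# Matrix with x, y and z as its columns
A = [[x[k], y[k], z[k]] for k in range(3)]
print(det3(A))  # 0, so the vectors are dependent

# Check the relation z = -2x + 3y
combo = tuple(-2 * xi + 3 * yi for xi, yi in zip(x, y))
print(combo == z)  # True
```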
Gaussian Elimination Method
Using elementary row operations, once we obtain a triangular matrix we can write down the associated linear equations and solve them. This is called Gaussian Elimination.
The following steps are involved in Gaussian Elimination for a system of linear equations:
1. Write down the given system as an augmented matrix.
2. Make the first non-zero element (the leading element) of each row equal to 1.
3. Through elementary row operations, make all the elements below the leading elements equal to zero (this results in a triangular matrix).
4. Eliminate any row in which all the elements are zero.
5. If in a row all the elements are zero except the last number, then the system has no solution.
6. The triangular matrix is now the associated augmented matrix. Write down the new linear equations it represents.
7. Solve the new equations. If some unknowns remain undetermined, parametric values can be assigned to them and the equations can be solved by back substitution.
8. The total number of leading 1's in the resulting matrix is the rank of the matrix.
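EXAMPLE: Solve the following system of equations:

2x-3y+2z=21
x+4y-z=1
-x+2y+z=17
ANSWER: Write the above system in augmented form:
$\left(\begin{array}{ccc|c} 2 & -3 & 2 & 21 \\ 1 & 4 & -1 & 1 \\ -1 & 2 & 1 & 17 \end{array}\right)$
Using the Gaussian method through elementary row operations:
- Interchanging R1 and R2 we get,
$\left(\begin{array}{ccc|c} 1 & 4 & -1 & 1 \\ 2 & -3 & 2 & 21 \\ -1 & 2 & 1 & 17 \end{array}\right)$
- Adding 2xR3 to R2 we get,
$\left(\begin{array}{ccc|c} 1 & 4 & -1 & 1 \\ 0 & 1 & 4 & 55 \\ -1 & 2 & 1 & 17 \end{array}\right)$
- Adding R1 to R3 we get,
$\left(\begin{array}{ccc|c} 1 & 4 & -1 & 1 \\ 0 & 1 & 4 & 55 \\ 0 & 6 & 0 & 18 \end{array}\right)$
- Dividing R3 by 6 we get,
$\left(\begin{array}{ccc|c} 1 & 4 & -1 & 1 \\ 0 & 1 & 4 & 55 \\ 0 & 1 & 0 & 3 \end{array}\right)$
- Subtracting R2 from R3 we get,
$\left(\begin{array}{ccc|c} 1 & 4 & -1 & 1 \\ 0 & 1 & 4 & 55 \\ 0 & 0 & -4 & -52 \end{array}\right)$
- Multiplying R3 by (-1/4) we get,
$\left(\begin{array}{ccc|c} 1 & 4 & -1 & 1 \\ 0 & 1 & 4 & 55 \\ 0 & 0 & 1 & 13 \end{array}\right)$
We can again write this matrix in the form of linear equations:
x+4y-z=1
y+4z=55
z=13
Solving the above equations by back substitution we get,
z=13; y=3; x=2.
The rank of the matrix is 3 as there are three leading ones, one each corresponding to x, y and z.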
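The steps listed above can be sketched in Python. This is a minimal illustration and not a general-purpose solver: the function names `eliminate` and `back_substitute` are our own, exact fractions are used to avoid rounding, and back substitution assumes the system has one pivot per variable (a unique solution). The row operations chosen by the code differ from the hand calculation, but the solution and rank are the same.

```python
from fractions import Fraction


def eliminate(aug):
    """Forward elimination on an augmented matrix [A|b].

    Each pivot is scaled to 1 and the entries below it are cleared.
    Returns the echelon matrix and the list of pivot columns.
    """
    m = [[Fraction(v) for v in row] for row in aug]
    n_rows, n_vars = len(m), len(m[0]) - 1
    pivots, r = [], 0
    for c in range(n_vars):
        # Find a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        m[r], m[p] = m[p], m[r]           # row interchange
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]  # make the leading entry 1
        for i in range(r + 1, n_rows):    # clear the entries below the pivot
            factor = m[i][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return m, pivots


def back_substitute(m, pivots):
    """Solve an echelon system that has one pivot per variable."""
    n_vars = len(m[0]) - 1
    sol = [Fraction(0)] * n_vars
    for r in reversed(range(len(pivots))):
        c = pivots[r]
        sol[c] = m[r][-1] - sum(m[r][j] * sol[j] for j in range(c + 1, n_vars))
    return sol


# The worked example: 2x-3y+2z=21, x+4y-z=1, -x+2y+z=17
aug = [[2, -3, 2, 21], [1, 4, -1, 1], [-1, 2, 1, 17]]
echelon, pivots = eliminate(aug)
solution = back_substitute(echelon, pivots)
print([str(v) for v in solution], "rank =", len(pivots))
```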
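Example: Solve, if possible, the system of equations
-x+3z=2
2x+y-4z=-1
x+2y+z=4
Answer: We can re-write the system in matrix form as follows:
$\begin{pmatrix} -1 & 0 & 3 \\ 2 & 1 & -4 \\ 1 & 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix}$
Now, using the Gaussian method:
The augmented matrix is:
$\left(\begin{array}{ccc|c} -1 & 0 & 3 & 2 \\ 2 & 1 & -4 & -1 \\ 1 & 2 & 1 & 4 \end{array}\right)$
- Multiplying R1 by (-1) we get,
$\left(\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 2 & 1 & -4 & -1 \\ 1 & 2 & 1 & 4 \end{array}\right)$
- Subtracting 2R1 from R2 we get,
$\left(\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 2 & 3 \\ 1 & 2 & 1 & 4 \end{array}\right)$
- Subtracting R1 from R3 we get,
$\left(\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 2 & 3 \\ 0 & 2 & 4 & 6 \end{array}\right)$
- Subtracting 2R2 from R3 we get,
$\left(\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \end{array}\right)$
Writing the above matrix in equation form again:
x-3z=-2
y+2z=3
Here z is a free variable, so the system has infinitely many solutions: z=t, y=3-2t, x=3t-2. For instance, taking t=0 we get,
Z=0
Y=3
X=-2
Since there are two leading ones in the above matrix, the rank of the matrix is 2.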
4. Rank of A Matrix using Determinants
If we have an nxn matrix (an nth order determinant) and the determinant value is non-zero, the rank of the associated matrix is equal to n; if the determinant is zero, the rank is less than n. In general, the number ‘r’ is defined as the rank of matrix A if A has at least one (r x r) square sub-matrix with a non-zero determinant and every larger square sub-matrix has a zero determinant.
Thus, to calculate the rank of a matrix, we find the maximum order of the minors of the matrix which are non-zero.
Now, consider the following matrix and find its rank:
A=
-NOTE: We can discard the 5th column as all its elements are zero/null.
- NOTE: We can see that column 3 = column 1 + column 2. Thus we can also discard the
3rd column.
Now we have,
A=
, it is a 5X3 matrix
- Now we have to find the maximum order of the non-zero minors of the matrix. If there is at least one non-zero minor of a particular order, the rank is at least that order. We shall start from the lowest order and work upwards.
ORDER 1: any non-zero element of the matrix is a non-zero minor of order 1, so we look at higher order minors.
ORDER 2: =4≠0
Thus, we have a non-zero minor (a square sub-matrix with non-zero determinant) of order 2, so we look at higher order minors.
ORDER 3: = 3(-16) -1(-4) +1(0) = -44 ≠ 0

Thus, we have a non-zero minor of order 3.
Since the reduced matrix has only three columns, there is no minor of higher order, so the rank of matrix A = 3, which is the order of the largest non-zero minor.
NOTE:
- The rank of a null matrix is 0.
- The rank of a matrix cannot exceed the number of its rows or the number of its columns, whichever is less.
- Unless the matrix is a null matrix, the rank of a matrix is at least 1.
- The rank of a matrix is equal to the rank of its transpose.
Example 1: Determine whether the rows of the matrix are independent and whether it has an inverse.
Answer: The rows are not independent if one row is a multiple of another, and in that case the determinant is zero. Here the matrix has two identical rows (and two identical columns), thus the determinant is zero.
= (2-2) = 0
The matrix is thus singular and has no inverse. Its rows are also dependent.
EXAMPLE 2: Find the rank of the matrix
Answer: Now we may consider that 4th row is a multiple of the 1st row. Thus, the rank of
the matrix is 3.
IMPORTANT:
* Rank of a matrix is defined by the number of independent rows/columns. The Gaussian elimination method can be used to calculate rank in this way.
* Rank can also be calculated by finding the largest square sub-matrix with a non-zero determinant. Determinants are used to find rank in this way.
* A line (a row or a column) can be eliminated if:
- It is identical to, or a multiple of (proportional to), another line.
- All the elements of the line are zero.
- It is a linear combination of other lines.
Example 3: Calculate the rank of the following matrix:

Answer: We can see that the 3rd column is a linear combination of the 1st and 2nd column.
Thus we can eliminate the 3rd column. Thus we get
We can use determinants to find out the largest non-zero square sub-matrix. Here the
largest square sub-matrix is (3X3) matrix. Thus, we check for all 3X3 sub matrix whether its
determinant is a non-zero.

Thus, determinant of all possible 3X3 sub matrix is zero, thus rank is less than 3. Now we
look at 2X2 sub-matrix:
=1≠0
Thus, the rank of the matrix is 2.
Example 4: Find the rank of the matrix A:
Answer: We can eliminate the 4th column as it is a multiple of the 1st column. Here the largest square sub-matrix is the (3X3) matrix. Thus, we check whether the determinant of the 3X3 sub-matrix is non-zero.
= 2(-2) -3(-1) +1(1) = 0
Thus, determinant of all possible 3X3 sub matrix is zero, thus rank is less than 3. Now we
look at 2X2 sub-matrix:
= -1 ≠ 0
Thus the rank of the matrix A [r(A)] is 2.
Example 5: Find out the rank of the following 4X4 matrix:
Answer: We can use determinants to find the largest square sub-matrix with a non-zero determinant. Here the largest square sub-matrix is the (4X4) matrix itself. Thus, we check whether its determinant is non-zero.
Finding the determinant of the 4X4 matrix:
|B| = 3 [(-10-12)] -2 [-10 +10] +4 [(-8 -5+2)] -1 [(-6-5)]
= -66 - 0 - 44 + 11
= -99 ≠ 0
Thus, r(B) = 4
Example 6: Find the rank of the following matrix c:
- We can eliminate the 3rd column as all its elements are zero.
- We can also eliminate the 5th column as it is proportional to the 1st one.
- We can see that c2 = -2 c1 + c4, thus c2 being a linear combination of c1 and c4, we can
eliminate c2.
Thus the resulting matrix is:

Here the largest square sub-matrix is a (2X2) matrix. Thus, we check whether any 2X2 sub-matrix has a non-zero determinant.
=1≠0
Thus, r(c)=2
Finding Rank Using Gaussian Elimination:
Example 7: Find the rank of the following matrix A:
Using Gaussian Elimination, we reduce the matrix A to upper triangular form, which turns the linearly dependent row vectors into zero rows.
STEP 1: Add 2(R1) to R2, Subtract 7(R1) from R3 =
STEP 2: Add ½(R2) to R3
The third row is now zero, so no further elimination is needed. Thus, eliminating the 3rd row, we can say that the rank of the matrix is 2. Thus r(A) = 2.
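Counting the non-zero rows left after elimination can be sketched as follows. The function name `rank` is our own, and because the matrix of Example 7 is not reproduced here, the illustration uses the matrix of the dependent vectors x, y, z from the earlier example; exact fractions are assumed to avoid rounding problems.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix, counted as the number of pivots found by row reduction."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        # Look for a usable pivot in column c, at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


M = [[2, 3, 5], [-1, -4, -10], [1, -2, -8]]
M_transposed = [list(col) for col in zip(*M)]
print(rank(M), rank(M_transposed))  # 2 2
```

The second value illustrates the point that a matrix and its transpose have the same rank.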
Points to note:
- The ranks of A and its transpose A^T are the same.
- Row space is the space spanned by the row vectors of matrix A.
- Column space is the space spanned by the column vectors of matrix A.
- The ranks of row equivalent matrices are the same.
- The row space and column space of a matrix have the same dimension, i.e. its rank.
Three Types Of Solution Sets Of A Matrix:
A system of linear equations can have three types of solution set, i.e. a unique solution, no solution or infinitely many solutions.
Unique Solution- When a system is consistent and the number of variables (unknowns) in the system is equal to the number of non-zero rows, the system has a unique solution.
No Solution- A system has no solution if it is inconsistent, i.e. the equations are contradictory. For example, x+y=5 and x+y=7 have no solution.
Infinite Solutions- When a system is consistent and the number of variables (unknowns) in the system is more than the number of non-zero rows, the system has infinitely many solutions.
Example, Consider the system of equations

kx+y+z=1
x+ky+z=1
x+y+kz=1
find all values of k for which the system of equation has;

a] a unique solution
b] more than one solution
c] no solution
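Answer: We can re-write the system in matrix form as follows:
$\begin{pmatrix} k & 1 & 1 \\ 1 & k & 1 \\ 1 & 1 & k \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$
Let $[A] = \begin{pmatrix} k & 1 & 1 \\ 1 & k & 1 \\ 1 & 1 & k \end{pmatrix}$ and $[B] = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$
Writing the above in augmented form we get: $[A|B] = \left(\begin{array}{ccc|c} k & 1 & 1 & 1 \\ 1 & k & 1 & 1 \\ 1 & 1 & k & 1 \end{array}\right)$
Interchanging R1 and R3 we get,
$\left(\begin{array}{ccc|c} 1 & 1 & k & 1 \\ 1 & k & 1 & 1 \\ k & 1 & 1 & 1 \end{array}\right)$
Now, subtracting R1 from R2 we get,
$\left(\begin{array}{ccc|c} 1 & 1 & k & 1 \\ 0 & k-1 & 1-k & 0 \\ k & 1 & 1 & 1 \end{array}\right)$
Now, subtracting k.R1 from R3 we get,
$\left(\begin{array}{ccc|c} 1 & 1 & k & 1 \\ 0 & k-1 & 1-k & 0 \\ 0 & 1-k & 1-k^2 & 1-k \end{array}\right)$
Now adding R2 to R3 we get,
$\left(\begin{array}{ccc|c} 1 & 1 & k & 1 \\ 0 & k-1 & 1-k & 0 \\ 0 & 0 & (1-k)(2+k) & 1-k \end{array}\right)$
The ranks of [A] and [A|B] now depend on the value of k.
Thus, we may conclude,
- If k= -2, the last row becomes (0 0 0 | 3), so rank[A] = 2 as there are two leading ones, while rank[A|B] = 3. Thus rank[A|B] ≠ rank[A], and the system has no solution at k= -2.
- If k= 1, both R2 and R3 become zero rows, so rank[A|B] = rank[A] = 1. Number of variables=3 and number of non-zero rows =1. Thus the system has infinitely many solutions at k= 1 as number of variables > number of non-zero rows.
- If k ≠ 1, -2, then rank[A|B] = 3 = rank [A]. Number of non-zero rows is thus 3. Thus, the system has a unique solution at k ≠ 1, -2 as number of variables = number of non-zero rows.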
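The rank comparison used in this example can be sketched as a small classifier. The names `rank` and `classify` are our own, and the code only tests particular values of k; it does not derive the critical values -2 and 1 symbolically.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix, counted as the number of pivots found by row reduction."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def classify(k):
    """Classify the system kx+y+z=1, x+ky+z=1, x+y+kz=1 by comparing ranks."""
    A = [[k, 1, 1], [1, k, 1], [1, 1, k]]
    augmented = [row + [1] for row in A]
    rank_a, rank_aug = rank(A), rank(augmented)
    if rank_a != rank_aug:
        return "no solution"
    if rank_a == 3:  # as many pivots as variables
        return "unique solution"
    return "infinitely many solutions"


for k in (-2, 1, 0, 3):
    print(k, classify(k))
```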
5. Summary
Through this chapter we have been able to write the simultaneous linear equations in a
compact way and we have also used determinants to find the solution to those equations.
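- The determinant of a 2X2 matrix is defined as the product of the two diagonal elements minus the product of the two off-diagonal elements of the matrix. For a matrix to be invertible/non-singular its determinant should be different from zero.
$\det \begin{pmatrix} b_1 & c_1 \\ b_2 & c_2 \end{pmatrix} = \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix} = b_1 c_2 - b_2 c_1 \neq 0$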
- We can expand a determinant by any row or column and it will generate the same value of
the determinant every time.
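- The determinant of a 3 x 3 matrix may be defined as:
$\begin{vmatrix} b_1 & c_1 & d_1 \\ b_2 & c_2 & d_2 \\ b_3 & c_3 & d_3 \end{vmatrix} = b_1 \begin{vmatrix} c_2 & d_2 \\ c_3 & d_3 \end{vmatrix} - c_1 \begin{vmatrix} b_2 & d_2 \\ b_3 & d_3 \end{vmatrix} + d_1 \begin{vmatrix} b_2 & c_2 \\ b_3 & c_3 \end{vmatrix}$
This is the determinant expanded in terms of “Minors”.
- In a determinant, the co-factor of any element is its coefficient in the expansion of the determinant. The cofactor of the element $a_{rc}$ in |A| is defined as:
$f_{rc} = (-1)^{r+c} M_{rc}$, (where $M_{rc}$ = minor of determinant A formed by eliminating row ‘r’ and column ‘c’ from A).
- Adjoint (A’): the elements in A’ are the cofactors of the corresponding elements of A.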
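- Using the properties of determinants, a simple method of solving linear simultaneous equations was proposed by the mathematician Gabriel Cramer.
Suppose we have the following linear simultaneous equations:
$b_i x + c_i y + d_i z = m_i$ (i = 1, 2, 3)
And $|A| = \begin{vmatrix} b_1 & c_1 & d_1 \\ b_2 & c_2 & d_2 \\ b_3 & c_3 & d_3 \end{vmatrix}$, $|B| = \begin{vmatrix} m_1 & c_1 & d_1 \\ m_2 & c_2 & d_2 \\ m_3 & c_3 & d_3 \end{vmatrix}$,
$|C| = \begin{vmatrix} b_1 & m_1 & d_1 \\ b_2 & m_2 & d_2 \\ b_3 & m_3 & d_3 \end{vmatrix}$ and $|D| = \begin{vmatrix} b_1 & c_1 & m_1 \\ b_2 & c_2 & m_2 \\ b_3 & c_3 & m_3 \end{vmatrix}$, then, provided |A| ≠ 0,
$x = \frac{|B|}{|A|}$, $y = \frac{|C|}{|A|}$ and $z = \frac{|D|}{|A|}$.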
- Two vectors x and y are independent if the matrix with these vectors as columns, (x y), has a non-zero determinant. More generally, a set of n vectors, each with n components, is linearly independent if the determinant of the matrix formed by them is non-zero. If the determinant is zero, the set is dependent.
- The number ‘r’ is defined as the rank of matrix A if A has at least one (r x r) square sub-matrix with a non-zero determinant and no larger one. Thus, to calculate the rank of a matrix, we find the maximum order of the minors of the matrix which are non-zero.
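Cramer's rule from the summary can be sketched in Python for three equations. The names `det3` and `cramer3` are our own, and the data is the system solved earlier by Gaussian elimination (2x-3y+2z=21, x+4y-z=1, -x+2y+z=17), so the answer can be compared with x=2, y=3, z=13.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix (list of rows), expanded along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(coeffs, rhs):
    """Solve a 3x3 system by Cramer's rule; requires a non-zero determinant."""
    d = det3(coeffs)
    if d == 0:
        raise ValueError("|A| = 0, so Cramer's rule does not apply")
    solution = []
    for col in range(3):
        # Replace column `col` of the coefficient matrix by the right-hand side
        replaced = [row[:col] + [rhs[i]] + row[col + 1:]
                    for i, row in enumerate(coeffs)]
        solution.append(Fraction(det3(replaced), d))
    return solution


coeffs = [[2, -3, 2], [1, 4, -1], [-1, 2, 1]]
rhs = [21, 1, 17]
print([str(v) for v in cramer3(coeffs, rhs)])  # ['2', '3', '13']
```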
6. Exercise
1. What do you mean by a determinant and how do you calculate it?
2. What is the rank of a determinant and what are the applications of determinant?
3. What are the properties and characteristics of determinants?
4. What do you mean by minors and cofactors?
5. What are the different types of determinants and explain their properties.
6. Calculate the determinant of .
7. Calculate the determinant of , also explain the properties of a determinant with
the help of it.
8. Suppose you are given the following simultaneous equations:
x + 2y + 3z = 6
2x + 4y + z = 7
3x + 2y + 9z = 14
Find out the values of x, y and z using Cramer’s rule.
9. Calculate the determinant and the rank of the matrix .
10. Find the cofactors of the following determinant and prove that |A’| = |A|^2
|A| = .
11. Define linear dependency/independency.
7. References
1. K. Sydsaeter and P. Hammond, Mathematics for Economic Analysis, Pearson Educational Asia, Delhi, 2002
2. Carl P. Simon and Lawrence Blume, Mathematics for Economists
3. http://www.sheir.org/matrices-determinants-mcqs.html

Geometric Representation of Functions: Graphs and Level Curves
DC-1
Semester-II
Lesson: Geometric Representation of Functions: Graphs and Level
Curves
College/Department: Shyamlal College, University of Delhi
Life is Good for only two things:

discovering mathematics and
teaching mathematics…
- Simeon Denis Poisson

1 Learning Outcomes
2 Introduction
3 Points in Euclidean Spaces
3.1 Number Line
3.2 The Plane
3.3 Three Dimensions
3.4 Surface in A Space
4 Geometric Representation of Functions
4.1 Graphs
4.2 Level Curves
5 Differentiable Functions
6 Exercise
7 References
1. Learning Outcomes
- Understand the concept of two variable functions and three-dimensional graphs.
- Get to know the various properties of differentiable functions.
- Know the steps and method to graphically represent a function of two variables as a surface lying in a three-dimensional space.
- Understand the different types of three-dimensional figures.
2. Introduction
Two-dimensional (2-D) geometric models are an important part of an undergraduate economics course. In economic theory, mathematical analysis is used to construct appropriate geometric and analytic generalizations of these 2-D geometric models. In this chapter, we will discuss the basic geometry of coordinates, points and displacements in n-space. Several economic relationships are represented mathematically by functions such as the demand function, profit function, production function, etc., which are then analyzed using economic reasoning. We shall study non-linear functions of several variables, in particular functions of two variables, and learn to visualize them.
3. Points In Euclidean Spaces
3.1 The Number Line (R^1)

The geometric representation of the set of all real numbers is called the number line. Every real number is represented by exactly one point on the line, and every point on the line represents exactly one number. The following figure represents a part of a number line:
3.2 The Plane (R^2):

Economic objects like consumption bundles are represented by pairs of numbers in some of our economic examples. The Cartesian plane, or Euclidean 2-space, is the geometric representation of such pairs of numbers. It is denoted as R^2. R^2 can be shown by drawing two perpendicular number lines, i.e. the x-axis and the y-axis, and every point is located by its x and y coordinates. For example, suppose we want to find the point ‘A’ with x-coordinate 2 and y-coordinate 3. It can be drawn as follows:
Thus, A is the point in the plane given by the ordered pair (2, 3).
Similarly we can show different points A, B, C and D in R^2 as follows:
3.3 Three Dimensions and its geometric representations

As we visualized the 2-dimensional Euclidean space R^2 by drawing two perpendicular number lines, the x-axis and the y-axis, 3-dimensional space R^3 can similarly be represented by drawing three mutually perpendicular lines: the x-axis, y-axis and z-axis. Generally, the x-axis is the horizontal axis and the y-axis is the vertical axis in the plane of the page. Now, we draw the third axis, i.e. the z-axis, as follows:
We can use these number lines to find a point with a particular triple of numbers (p,q,r), where p is the z-coordinate, q is the x-coordinate and r is the y-coordinate. We plot the point in the same way as in R^2. First, ignoring the coordinate p, we plot the coordinates q and r on the x and y axes as we did earlier. Then, from the point (q,r), we move p units in the direction parallel to the z-axis: in front of the plane of the page if p is positive, behind it if p is negative, and not at all if p is 0. We finish at the point (p,q,r). Alternatively, we can start in the plane of the y-axis and z-axis and move q units to the right if q is positive and to the left if q is negative. Similarly, starting in the plane of the x-axis and z-axis, we can move r units up if r is positive and down if it is negative. Whichever method we use, we end up at the same point (p,q,r).
The diagram below shows the three-dimensional figure with coordinates p, q and r, where the coordinates q and r are found using the 2-space technique. Moving parallel to the axes (forward, backward, up, down, right, left) from the point (p,q,r), we reach other points of the three-dimensional figure with different coordinates, as seen in the figure below.
We have thus seen how to represent R^1, R^2 and R^3 geometrically. R^1 consists of single real numbers, represented by the number line. R^2 consists of ordered pairs, represented by points in a plane, and R^3 consists of ordered triples, represented by points in 3-space. In general, Euclidean n-space, written R^n, consists of n-tuples of numbers, i.e. ordered lists of n numbers. The number n, called the dimension of R^n, tells us how many numbers are required to describe each location; for example, R^3 has three dimensions.
Thus,
Points in three-dimensional space have three coordinates as shown in the figure above.
- The Z-coordinate of a point (p) is its distance in front of the XY-plane.

(If the Z-coordinate is negative, the point is behind the XY-plane.)
- The X-coordinate of a point (q) is its distance to the right of the YZ-plane.
(If the X-coordinate is negative, the point is to the left of the YZ-plane.)
- The Y-coordinate of a point (r) is its height above the XZ-plane.
(If the Y-coordinate is negative, the point is below the XZ-plane.)
3.4 Surfaces in a Space and its geometric representation

A set of ordered triples (p,q,r) that satisfies an equation is represented by a point set in 3-space, which generally forms a surface in space. The following equations define three simple cases:
(i) z=p, (ii) x=q, (iii) y=r; in each case the equation is the only requirement placed on the variables.
For instance, the points in space which satisfy z=p (with no requirements on x and y) lie in the plane indicated in the figure below:
Similarly, the figures below show pieces of the planes defined by the other two equations:
EXAMPLES
1) SPHERE: let us consider the equation a^2 + b^2 + c^2 = 16. Now, a^2 + b^2 + c^2 = (a-0)^2 + (b-0)^2 + (c-0)^2 is the square of the distance from the origin (0,0,0) to the point (a,b,c). Thus, the equation a^2 + b^2 + c^2 = 16 is satisfied by exactly those points (a,b,c) whose distance from the origin is 4. It therefore represents a sphere with radius 4 and centre (0,0,0).
2) BUDGET PLANE: let us consider a budget equation ax+by+cz=m, where m is the income of a consumer which he spends on buying three goods. Let x, y and z be the units he consumes of the three goods respectively and a, b and c be the respective per unit prices. Thus, the total cost of buying the three goods is ax+by+cz. The equation ax+by+cz=m is satisfied only by the bundles (x,y,z) whose total expenditure is exactly m. Since quantities cannot be negative, it represents a triangle with the three vertices A = (m/a,0,0), B = (0,m/b,0) and C = (0,0,m/c).
Functions between Euclidean Spaces

We generally denote a function from set X to set Y as f:X→Y, which is a rule that assigns one and only one object in Y to each object in X.
EXAMPLE: Suppose f(x,y)=x^2 + y^2, which defines f:R^2→R^1. The image of f is the set of all non-negative real numbers. The target space of f is R^1 and the domain of f is all of R^2.
EXAMPLE: Suppose f(x) = 1/x. The domain of f(x) is all real numbers except 0. It has the same image as domain, i.e. R^1 – {0}.
4. Geometric Representation Of Functions
4.1 Graph of A Function of Two Variables
Suppose a function of two variables, Z = f(X,Y), is defined on a set A in the xy-plane. To construct the graph, we evaluate f at each point (X,Y) in the domain and plot the point (X,Y,f(X,Y)) in R^3. Repeating this for all (X,Y) in the domain sweeps out the graph completely. The graph of Z = f(X,Y) is thus a two-dimensional surface lying in a three-dimensional space; for a well-behaved function it turns out to be a smooth surface in space as follows:
However, drawing a two-dimensional surface lying in a three-dimensional space is difficult. Thus, we may adopt another method that makes quantitative measurements easy.
4.2 Level Curves For Arbitrary Functions (Z=f(X,Y))

Map makers draw contours to show altitude variations on the earth’s surface; the closer the contours, the steeper the slope. For example, they draw contours or level curves connecting points on the map that represent places on the earth’s surface at the same height (e.g. 100 meters) above sea level. We can apply the same concept to the geometric representation of arbitrary functions Z=f(X,Y). Imagine the graph of the function in three-dimensional space being cut by horizontal planes parallel to the XY-plane. If the intersecting plane is z=a, the projection of this intersection onto the XY-plane is known as the level curve of ‘f’ at height ‘a’. This level curve consists of the points that satisfy the equation:
f(X,Y) = a
EXAMPLES
1) What are the level curves of the function z = f(x,y) = x^2+y^2? Represent them graphically.
Solution: We know that z is never negative, i.e. z ≥ 0. The equation of the level curve at height a is:
x^2+y^2 = a, with a ≥ 0
From this equation, we may say that the level curves are circles in the XY-plane with radius √a and centre at the origin.
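To make the idea concrete, the sketch below generates points on the circle of radius √a and checks that f takes the value a at each of them. The helper name `level_curve_points` and the choice of eight sample points are our own assumptions for illustration.

```python
import math


def f(x, y):
    return x**2 + y**2


def level_curve_points(a, n=8):
    """n points on the level curve f(x, y) = a, i.e. the circle of radius sqrt(a)."""
    r = math.sqrt(a)
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k in range(n)]


for a in (1, 4, 9):
    points = level_curve_points(a)
    on_curve = all(math.isclose(f(x, y), a) for x, y in points)
    print(a, math.sqrt(a), on_curve)
```

Each printed line shows the height a, the radius of its level curve and whether all sampled points lie on it.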
Graphical representation of Z=f(X,Y) and one of its level curves:

Now, we know that all the level curves are circles. If X=0, Z=Y^2, which is a parabola in the YZ-plane. Similarly, if Y=0, Z=X^2, which is a parabola in the XZ-plane. Thus, the surface is obtained by rotating the parabola Z=X^2 around the Z-axis. The surface is a paraboloid, as shown in the figure below:
2) Give the geometric representation of the two variable function Z=f(X,Y)=Y^2-X^2.

Solution: The equation of the level curve is: y^2-x^2 = a
If X=0, Z=Y^2, i.e. a parabola in the YZ-plane. Similarly, if Y=0, Z= -X^2, which is an inverted parabola in the XZ-plane. Plotting the cross-sections for different fixed values of Y and continuing the process, we get a saddle-shaped graph as follows:
3) Suppose a firm’s production function is given by Q=f(K,L)=C, where C is a constant and K and L are capital and labor inputs respectively. Represent it geometrically.
Solution: The level curve of the above production function in the KL-plane is called an iso-quant. Graphically, we can see the iso-quants of the production function Q=K.L as follows:
A Cobb-Douglas production function f(K,L) = A K^α L^β, where A>0 and α+β<1 (representing diminishing returns to scale), has level curves similar to those in the figure above. Graphically, we can see:
4) For the function f(a,b) = ,
show that all the points (a,b) satisfying ab=3 lie on a level curve.
Solution: Substituting the restriction ab=3 in the function f we get,
Thus, the value of f(a,b) is the constant 3/5 for all points (a,b) where ab=3. Thus, the curve ab=3 lies on a level curve of f, at height 3/5.
5) The graph of the function $Z = \sqrt{1 - X^2 - Y^2}$ is a hemi-sphere above the XY-plane, centered at the origin and with unit radius.
6) The functions Z=XY and Z=X^2-Y^2 are so-called hyperbolic paraboloids.
7) The linear function Z=aX+bY+c has a plane in space for its graph.
5 Differentiability of Functions Of Several Variables

In a function of several variables, if we assign fixed numerical values to all the variables except one, the function becomes a function of one variable. For example, in a two variable function Z=f(X,Y), if we allow only X to vary (keeping the value of Y constant), the function becomes a one variable function. Suppose we fix the value Y=y0 and differentiate the function Z at X=x0; we obtain the partial derivative of Z w.r.t. X at the point (x0,y0). It is represented as ∂z/∂x. Similarly, we can find the partial derivative of Z w.r.t. Y, keeping X constant.
EXAMPLES: Find the partial derivatives of the following functions;

1) Suppose Z=f(X,Y)=Y^2-X^2
∂Z/∂X = -2X (keeping Y constant)
∂Z/∂Y = 2Y (keeping X constant)
2) Suppose Z=Y^2+XY
∂Z/∂X = Y (keeping Y constant)
∂Z/∂Y = 2Y+X (keeping X constant)
3) Z=X^2 Y
∂Z/∂X = 2XY (keeping Y constant)
∂Z/∂Y = X^2 (keeping X constant)
4) Z=X^3+Y^2+X^2 Y
∂Z/∂X = 3X^2+2XY
∂Z/∂Y = 2Y+X^2
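The idea of varying one variable while holding the other fixed can be checked numerically. The sketch below uses a central-difference approximation, which is our own choice of method rather than anything prescribed by the lesson, and compares it with the formulas of example 4 at the arbitrarily chosen point (2, 3).

```python
def partial(f, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f at `point`
    with respect to the variable in position `index`; the others stay fixed."""
    up, down = list(point), list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2 * h)


def Z(X, Y):
    return X**3 + Y**2 + X**2 * Y


X0, Y0 = 2.0, 3.0
print(partial(Z, (X0, Y0), 0), 3 * X0**2 + 2 * X0 * Y0)  # both close to 24
print(partial(Z, (X0, Y0), 1), 2 * Y0 + X0**2)           # both close to 10
```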
5) Suppose, Z=f(X,Y) and
We can find the partial derivatives of the function Z but the function is not differentiable at the point (0,0).
** Thus, we can say that if the partial derivatives of a function exist and are continuous in a neighbourhood of a point, then the function is differentiable at that point.
6) Suppose, Z=f(X,Y) and
We can find the partial derivatives of the function Z but the function is not differentiable at the point (0,0). Thus, the function is not differentiable.
7) Find if the function a is differentiable: a=b^2 c+5b^2 c^5 d-bc+5d
∂a/∂b = 2bc+10b c^5 d-c
∂a/∂c = b^2+25b^2 c^4 d-b
∂a/∂d = 5b^2 c^5+5
Thus, all the partial derivatives exist and are continuous. Also we can see that the function a is defined and continuous for all values of b, c and d. Thus, the function is differentiable.
8) Find if the function a is continuous:
a=
The function is not defined where the denominator is zero. We can see that the function a is defined for all values (X,Y) except the points lying on the circle X^2+Y^2=4. Thus, the function is not continuous on that circle.
6 Exercise
1) Prove that the function
Z=
has level curves centered at the origin. Also show that X^2+Y^2=6 is a level curve of the function Z.
2) Plot the graph for Z=
[Hint: we can trace it as absolute value functions and by staring at it we see that Z=|r|, thus it is
a cone.]
3) Plot the graph for Z=X2-Y2. [Hint: see section 4.2, example 6.]
4) Plot Z=
Hint: The contours can be plotted as follows:
We know that this is also an indifference curve. Thus we can plot the three-dimensional
figure from it.
5) Find the first partial derivatives of the following functions Z=f(X,Y):

6) Find the first partial derivatives of the following functions Z=f(X,Y):
(a) XY
(b) Log XY
(c) XY
(d)
7) How would you use the graph of Z=f(X,Y), to draw level curves of f?
8) Geometrically plot the graphs of the following functions:
(a) Z=5-X-Y
(b) Z= - X2 – Y2
7 References
K. Sydsaeter and P. Hammond, Mathematics for Economic Analysis, Pearson Educational Asia,
Delhi, 2002
Carl P. Simon, Lawrence Blume, Mathematics for Economists
Higher Order Differentiation and Its Applications
DC-1
Semester-II
Lesson: Higher Order Differentiation and Its Applications
College/Department: Department of Economic, P.G.D.A.V College,
University of Delhi
CONTENTS
2. Higher Order Differentiation
3. Partial Derivatives
3.1Higher order partial derivative
3.2Partial derivative with many variables
4. Quadratic Forms
5. Exercise
6. References

2. Examples of Higher Order Differentiation
3. Partial Differentiation
4. Examples of Partial Differentiation
5. Clairaut's Theorem/Young’s Theorem
2. Higher Order Differentiation
If f(x) is a differentiable function of x, then f'(x) or $\frac{dy}{dx}$ is the first derivative or first order derivative of y = f(x) w.r.t. ‘x’. Since the derivative of a function is also a function, another derivative can be found. The second order derivative, or second derivative, is the derivative of the first derivative of the function f(x). Its notations are:
$\frac{d^2 y}{dx^2}$ or $\frac{d^2 f}{dx^2}$ or y'' or f''(x)
Since f''(x) is also a function, its derivative can also be found, which is denoted as f'''(x). For higher order derivatives superscripts are used, i.e. $f^{(4)}(x)$ = fourth derivative, etc.
Example: f(x) = 4x^5 + 6x^3 + 2x + 1
f'(x) = 20x^4 + 18x^2 + 2
f''(x) = 80x^3 + 36x
f'''(x) = 240x^2 + 36
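Repeated differentiation of a polynomial is easy to sketch in Python by storing the coefficients in a list. The representation (position i holds the coefficient of x^i) and the name `differentiate` are our own assumptions for this illustration.

```python
def differentiate(coeffs):
    """Derivative of a polynomial given as a list where coeffs[i] multiplies x**i."""
    return [i * c for i, c in enumerate(coeffs)][1:]


# f(x) = 4x^5 + 6x^3 + 2x + 1, written from the constant term upwards
f = [1, 2, 0, 6, 0, 4]

f1 = differentiate(f)    # 20x^4 + 18x^2 + 2
f2 = differentiate(f1)   # 80x^3 + 36x
f3 = differentiate(f2)   # 240x^2 + 36
print(f1, f2, f3)
```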
3. Partial Derivatives
Given a function y = f(x), the derivative f'(x) represents the rate of change of the function as x changes. For a function of two variables, such as z = f(x,y), either variable may change, and the function may change at a different rate in each direction.
For a function of two independent variables, z = f(x,y), the partial derivative of ‘z’ with respect to x is found using the normal rules of differentiation. The only difference is that wherever the second independent variable ‘y’ appears, it is treated as a constant in every respect. Similarly, the partial derivative with respect to y is found by treating the variable x as a constant. The notations of partial differentiation are given below:
Notations of Partial Differentiation
Partial derivative of z w.r.t. x | $\frac{\partial z}{\partial x}$ | $f_x$
Partial derivative of z w.r.t. y | $\frac{\partial z}{\partial y}$ | $f_y$
Example: Z = x^4 y^2 – x^2 y^6
$\frac{\partial Z}{\partial x}$ = 4x^3 y^2 – 2x y^6
$\frac{\partial Z}{\partial y}$ = 2x^4 y – 6x^2 y^5
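Partial derivative can be defined as:
If z = f(x,y) is a function of two variables, then $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$, called the partial derivatives of z with respect to x and y respectively, are the derivative of z w.r.t. x keeping y constant and the derivative of z w.r.t. y keeping x constant. All the rules of differentiation apply when partial derivatives are calculated.
Symbolically, if f = f(x,y) then
$f_x = \frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}$
$f_y = \frac{\partial f}{\partial y} = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}$
provided these limits exist.
Example 1. z = f(x, y) = x^3 y + x^2 y^2 + xy + x + y^2
$\frac{\partial z}{\partial x}$ = 3x^2 y + 2x y^2 + y + 1
$\frac{\partial z}{\partial y}$ = x^3 + 2x^2 y + x + 2y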
3.1 Higher order partial derivative
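For a function z = f(x, y), $f_x$ and $f_y$ are the two first order partial derivatives with respect to x and y respectively. Since $f_x$ and $f_y$ are themselves functions of x and y, second order partial derivatives can also be found.
Second order partial derivatives taken with respect to more than one variable are called mixed partial derivatives; e.g. differentiating a function with respect to ‘x’ first and then ‘y’ gives a mixed partial derivative. The various notations for second order partial derivatives are given in the table:
Notations of second order partial derivatives
Partial derivatives of z | Notation 1 | Notation 2 | Notation 3
w.r.t. x twice | $\frac{\partial^2 z}{\partial x^2}$ | $\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial x}\right)$ | $f_{xx}$
w.r.t. y twice | $\frac{\partial^2 z}{\partial y^2}$ | $\frac{\partial}{\partial y}\left(\frac{\partial z}{\partial y}\right)$ | $f_{yy}$
w.r.t. x first then y | $\frac{\partial^2 z}{\partial y \partial x}$ | $\frac{\partial}{\partial y}\left(\frac{\partial z}{\partial x}\right)$ | $f_{xy}$
w.r.t. y first then x | $\frac{\partial^2 z}{\partial x \partial y}$ | $\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial y}\right)$ | $f_{yx}$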
A function of two variables has four possible second partial derivatives, obtained by differentiating the function w.r.t. ‘x’ twice, w.r.t. y twice, w.r.t. x first then y, and w.r.t. y first then x. Each derivative has a sign (+ or -); the interpretation of these signs is as follows.
Partial derivative | Sign | Interpretation
$f_x$ | + | Slope in the x direction is positive
$f_x$ | - | Slope in the x direction is negative
$f_{xx}$ | + | Slope in the x direction increases as x increases (y constant)
$f_{xx}$ | - | Slope in the x direction decreases as x increases (y constant)
$f_y$ | + | Slope in the y direction is positive
$f_y$ | - | Slope in the y direction is negative
$f_{yy}$ | + | Slope in the y direction increases as y increases (x constant)
$f_{yy}$ | - | Slope in the y direction decreases as y increases (x constant)
$f_{xy}$ | + | Slope in the x direction increases as y increases (x constant)
$f_{xy}$ | - | Slope in the x direction decreases as y increases (x constant)
$f_{yx}$ | + | Slope in the y direction increases as x increases (y constant)
$f_{yx}$ | - | Slope in the y direction decreases as x increases (y constant)
Example: z = x^0.5 y^0.5 – 60. Find $\frac{\partial^2 z}{\partial x^2}$ and interpret the result.
Sol. $\frac{\partial z}{\partial x}$ = 0.5 x^(-0.5) y^0.5
$\frac{\partial^2 z}{\partial x^2}$ = $\frac{\partial}{\partial x}$ (0.5 x^(-0.5) y^0.5)
= (0.5) (-0.5 x^(-1.5) y^0.5)
= - 0.25 x^(-1.5) y^0.5
Since x and y are positive, and a positive number raised to any power is positive, y^0.5 and x^(-1.5) are positive. The factor -0.25 therefore shows that the second order derivative of z with respect to x twice is negative, meaning that the slope in the x direction decreases as x increases when y is constant.
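The sign of this second derivative can be checked numerically. The sketch below uses a second-order central difference, which is our own choice of approximation, at the arbitrarily chosen point x = 4, y = 9, and compares it with the formula -0.25 x^(-1.5) y^0.5.

```python
def z(x, y):
    return x**0.5 * y**0.5 - 60


def second_partial_x(f, x, y, h=1e-3):
    """Central-difference estimate of the second partial derivative w.r.t. x."""
    return (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2


approx = second_partial_x(z, 4.0, 9.0)
exact = -0.25 * 4.0**-1.5 * 9.0**0.5   # -0.25 * 0.125 * 3 = -0.09375
print(approx, exact)
print(approx < 0)   # True: the slope in the x direction falls as x rises
```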
Example: $z = f(x, y) = x^3 y + x^2 y^2 + xy + x + y^2$
$f_x = 3x^2 y + 2xy^2 + y + 1$
$f_y = x^3 + 2x^2 y + x + 2y$
$f_{xx} = 6xy + 2y^2$
$f_{yy} = 2x^2 + 2$
$f_{xy} = 3x^2 + 4xy + 1$
$f_{yx} = 3x^2 + 4xy + 1$
$\therefore f_{xy} = f_{yx}$.
The two mixed second-order partial derivatives (also called cross partial derivatives) are equal
wherever $f_{xy}$ and $f_{yx}$ are continuous. This is the content of the following theorem, due to
Alexis Clairaut and also known as Young's theorem.
Theorem: Suppose f is defined on a disk D that contains the point (a, b). If the partial
derivatives $f_{xy}$ and $f_{yx}$ are both continuous on D, then
$f_{xy}(a, b) = f_{yx}(a, b)$.
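Example: Verify Young's theorem for $f(x, y) = x e^{-x^2 y^2}$.
Solution:
$f_x(x, y) = e^{-x^2 y^2} - 2x^2 y^2 e^{-x^2 y^2}$
$f_y(x, y) = -2x^3 y\, e^{-x^2 y^2}$
Now, compute the two mixed partial derivatives.
$f_{xy}(x, y) = -2x^2 y\, e^{-x^2 y^2} - 4x^2 y\, e^{-x^2 y^2} + 4x^4 y^3 e^{-x^2 y^2}$
$= (-6x^2 y + 4x^4 y^3)\, e^{-x^2 y^2}$
$f_{yx}(x, y) = (-6x^2 y + 4x^4 y^3)\, e^{-x^2 y^2}$
$\therefore f_{xy} = f_{yx}$
Hence verified.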
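The equality of the two cross partials can also be checked numerically. The sketch below assumes the function in the example is $f(x, y) = x e^{-x^2 y^2}$ (consistent with the derivatives shown) and differentiates the analytic first partials with a central difference; the helper names and the evaluation point (1, 0.5) are illustrative choices, not part of the lesson.

```python
import math

def f_x(x, y):
    # Analytic first partial of f(x, y) = x * exp(-x^2 y^2) with respect to x
    return (1 - 2 * x**2 * y**2) * math.exp(-x**2 * y**2)

def f_y(x, y):
    # Analytic first partial of f with respect to y
    return -2 * x**3 * y * math.exp(-x**2 * y**2)

def central_diff(g, t, h=1e-5):
    # Symmetric finite-difference approximation of g'(t)
    return (g(t + h) - g(t - h)) / (2 * h)

def mixed_partials(x, y):
    # f_xy: differentiate f_x with respect to y; f_yx: differentiate f_y with respect to x
    f_xy = central_diff(lambda v: f_x(x, v), y)
    f_yx = central_diff(lambda u: f_y(u, y), x)
    return f_xy, f_yx

def closed_form(x, y):
    # Closed-form cross partial derived in the example
    return (-6 * x**2 * y + 4 * x**4 * y**3) * math.exp(-x**2 * y**2)

f_xy, f_yx = mixed_partials(1.0, 0.5)
print(f_xy, f_yx, closed_form(1.0, 0.5))
```

The three printed numbers should agree up to finite-difference error, which is what Young's theorem predicts for a function with continuous second-order partials.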
3.2 Partial derivatives with many variables
If $z = f(x_1, x_2, \dots, x_n)$, then $\partial z/\partial x_i$
is the derivative of the function w.r.t. $x_i$ when all the other variables $x_j$ ($j \neq i$) are held
constant,
i.e. $f_{x_1} = \partial z/\partial x_1$
and $f_{x_2} = \partial z/\partial x_2$, and so on.
Suppose there is a function of three variables, v = f(x, y, z). Such a function has partial
derivatives w.r.t. x, y and z. Each is taken with respect to one of x, y and z while the other two
independent variables are treated as constants.
In general, for a function of n variables $z = f(x_1, x_2, \dots, x_n)$, the partial derivative of z
w.r.t. $x_i$, with all the other variables $x_j$ ($j \neq i$) held constant, is
$f_{x_i} = \frac{\partial z}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{h}$
provided the limit exists.
Example: $f(x, y, z) = x^2 + y^3 + z^4$
$f_x = \partial f/\partial x = 2x$
$f_y = 3y^2$
$f_z = 4z^3$
Example: Find $Z_{xxx}$, $Z_{xyx}$, $Z_{yyy}$, $Z_{yxy}$ of the function
$Z = 3x^2(5x + 7y)$
$Z_x = 3x^2(5) + (5x + 7y)(6x)$
$= 45x^2 + 42xy$
$Z_{xx} = 90x + 42y$
$Z_{xxx} = 90$.
$Z_{xy} = 0 + 42x$
$= 42x$
$Z_{xyx} = 42$
$Z_y = 3x^2(7) + (5x + 7y)(0)$
$= 21x^2$
$Z_{yy} = 0$
$Z_{yyy} = 0$.
$Z_{yx} = 42x$
$Z_{yxy} = 0$.
Example: Find $Z_{xxx}$, $Z_{xyy}$, $Z_{yyy}$, $Z_{xxy}$ of the function $Z = (9x - 4y)(12x + 2y)$
$Z_x = (9x - 4y)(12) + (12x + 2y)(9)$
$= 108x - 48y + 108x + 18y$
$= 216x - 30y$
$Z_{xx} = 216$
$Z_{xxy} = 0$
$Z_{xxx} = 0$
$Z_{xy} = -30$
$Z_{xyy} = 0$
$Z_y = (9x - 4y)(2) + (12x + 2y)(-4)$
$= 18x - 8y - 48x - 8y$
$= -30x - 16y$
$Z_{yy} = -16$
$Z_{yyy} = 0$
Example: Find $Z_{xx}$, $Z_{yy}$ of the function $Z = x/y$
$Z_x = 1/y$
$Z_{xx} = 0$
$Z_y = -x/y^2$
$= -x y^{-2}$
$Z_{yy} = -(-2xy^{-3}) = 2xy^{-3}$
Clairaut's theorem (Young's theorem) extends to functions of any number n of variables and to
their higher-order mixed partial derivatives. The only thing to remember is that the derivatives
being compared must differentiate with respect to each variable the same number of times.
For three variables, according to Clairaut's theorem,
$f_{xz}(x, y, z) = f_{zx}(x, y, z)$
provided that these derivatives are continuous.
The partial derivative is approximately equal to the change in the function per unit change in
$x_i$ when the other variables are held constant, i.e. for small h,
$f_i(x_1, \dots, x_n) \approx \frac{f(x_1, \dots, x_{i-1}, x_i + h, x_{i+1}, \dots, x_n) - f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n)}{h}$
There are n partial derivatives of first order. For each first-order partial derivative of
the function, there are n second-order derivatives, i.e.,
$\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right) = \frac{\partial^2 f}{\partial x_j\,\partial x_i} = f_{x_i x_j}$ (i = 1, ..., n; j = 1, ..., n)
So there are $n^2$ second-order partial derivatives in total. Arranged in an n × n matrix, they form
the Hessian matrix, which is symmetric because $f_{ij} = f_{ji}$ for all i and j (Clairaut's theorem)
whenever the second-order partial derivatives are continuous.
Example:
If the two demand functions for the two commodities are given by
x= y=
then the marginal demand functions are
= =
= =
Since $\partial x/\partial q \geq 0$ and $\partial y/\partial p \geq 0$, the two commodities are competitive.
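Example:
The demand functions for two related commodities are given by
$x = a e^{-pq}$ and $y = b e^{p-q}$, where $a \geq 0$ and $b \geq 0$.
Determine whether the commodities are competitive or complementary.
Solution: The two demand functions are
$x = a e^{-pq}$
$y = b e^{p-q}$
Their marginal demand functions are:
$\partial x/\partial p = -aq\,e^{-pq}$, $\partial y/\partial p = b\,e^{p-q}$
$\partial x/\partial q = -ap\,e^{-pq}$, $\partial y/\partial q = -b\,e^{p-q}$
Because $\partial x/\partial q \leq 0$ and $\partial y/\partial p \geq 0$, the given commodities are neither competitive nor
complementary.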
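Example
Consider two products, A and B. The demands for goods A and B are described by the following two
equations:
$q_a = \frac{200}{p_a \sqrt{p_b}}$
$q_b = \frac{c}{p_b \sqrt[3]{p_a}}$, where c > 0 is a constant.
Find $\partial q_a/\partial p_b$ and $\partial q_b/\partial p_a$ and, given the results, explain whether A and B are
complements or substitutes.
Solution
$\frac{\partial q_a}{\partial p_b} = \frac{\partial}{\partial p_b}\left(200\,p_a^{-1} p_b^{-1/2}\right)$
$= 200\left(-\tfrac{1}{2} p_b^{-3/2}\right)/p_a$
$= -100\,p_a^{-1} p_b^{-3/2}$
$\frac{\partial q_b}{\partial p_a} = \frac{\partial}{\partial p_a}\left(c\,p_a^{-1/3} p_b^{-1}\right)$
$= c\left(-\tfrac{1}{3} p_a^{-1/3 - 1}\right) p_b^{-1}$
$= -\tfrac{c}{3}\,p_a^{-4/3} p_b^{-1}$
We know that $p_a$ and $p_b$ are positive because prices can never be negative. Therefore:
$\frac{\partial q_a}{\partial p_b} = -\frac{100}{p_a p_b^{3/2}} < 0$
$\frac{\partial q_b}{\partial p_a} = -\frac{c}{3 p_a^{4/3} p_b} < 0$
Because both cross partial derivatives are negative, the two goods are complementary goods.
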
Example: The Stone-Geary utility function is written as $u = \log U = \beta_1 \log(q_1 - \gamma_1) + \beta_2 \log(q_2 - \gamma_2)$, where u is
the utility index, $q_i$ is the quantity of commodity i, $0 < \beta_i < 1$, $\gamma_i > 0$, $q_i - \gamma_i > 0$ and i = 1, 2.
a. Find the marginal utility of this function with respect to $q_1$ and determine its sign.
b. What is the significance of a positive marginal utility?
c. Find the second derivative of this function with respect to $q_1$. Does the utility function
exhibit diminishing marginal utility?
Solution: The utility function is $u = \log U = \beta_1 \log(q_1 - \gamma_1) + \beta_2 \log(q_2 - \gamma_2)$.
a. Marginal utility is given by $\frac{\partial u}{\partial q_1} = \frac{\beta_1}{q_1 - \gamma_1}$, which is greater than zero because both the
numerator and the denominator are positive.
b. Since marginal utility is positive, utility increases monotonically with an
increase in $q_1$.
c. $\frac{\partial^2 u}{\partial q_1^2} = -\frac{\beta_1}{(q_1 - \gamma_1)^2}$, which is less than zero. Since the second derivative is negative, the
utility function exhibits diminishing marginal utility.
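Example: Given the production function
$P(L, K) = 5L^{1/5} K^{1/3} + L$,
find the partial elasticity with respect to labor at (L, K) = (1024, 27).
Solution: $P(L, K) = 5L^{1/5} K^{1/3} + L$
$\epsilon_L = \frac{P'_L(L, K) \cdot L}{P(L, K)}$
$= \frac{(L^{-4/5} K^{1/3} + 1)\,L}{5L^{1/5} K^{1/3} + L}$
At (L, K) = (1024, 27) we have $L^{1/5} = 4$ and $K^{1/3} = 3$, so $P'_L = 3/256 + 1 = 259/256$ and
P = 60 + 1024 = 1084. Therefore $\epsilon_L = \frac{(259/256)(1024)}{1084} = \frac{259}{271} \approx 0.956$.
This means that if capital remains constant at K = 27 and labor increases by 1 percent from
L = 1024, then output increases by approximately 0.956 percent.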
Example: The utility function is given as
$U = X^{0.5} Y^{0.5}$.
Calculate the marginal rate of substitution between X and Y.
First, take the partial derivative of U with respect to X to get $MU_x$.
$MU_x = \partial U/\partial X = 0.5\,X^{-0.5} Y^{0.5}$.
Next, take the partial derivative with respect to Y to get $MU_y$.
$MU_y = \partial U/\partial Y = 0.5\,X^{0.5} Y^{-0.5}$.
Dividing $MU_x$ by $MU_y$ we get
$MRS = \frac{MU_x}{MU_y} = \frac{0.5\,X^{-0.5} Y^{0.5}}{0.5\,X^{0.5} Y^{-0.5}} = \frac{Y}{X}$
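Example: Given an isoquant
$Q = K^{1/6} L^{1/2}$
find the slope of the isoquant.
Solution: Slope of isoquant $= \frac{dK}{dL}$
$= -\frac{\partial Q/\partial L}{\partial Q/\partial K}$
$\partial Q/\partial L = \tfrac{1}{2} K^{1/6} L^{-1/2}$
$\partial Q/\partial K = \tfrac{1}{6} K^{-5/6} L^{1/2}$
$\frac{dK}{dL} = -\left(\tfrac{1}{2} K^{1/6} L^{-1/2}\right) / \left(\tfrac{1}{6} K^{-5/6} L^{1/2}\right)$
$= -3\,K^{1/6} K^{5/6} L^{-1/2} L^{-1/2}$
$= -3(K/L)$
Therefore, the slope of the isoquant is $-3K/L$.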
Example: Given the demand function Q - 90 + 2P = 0 and the average cost function
$AC = Q^2 - 39.5Q + 120 + 125/Q$,
calculate the level of output where:

(a) total revenue is maximum,
(b) marginal cost is minimum,
(c) profits is maximum.
Solution: (a) The demand function is Q - 90 + 2P = 0,
which can be written as 2P = 90 - Q, i.e.
P = 45 - 0.5Q
$TR = PQ = (45 - 0.5Q)Q$
$= 45Q - 0.5Q^2$
For maximizing TR, the first-order condition is:
$\frac{dTR}{dQ} = 45 - Q = 0$
Q = 45
and the second-order condition is $\frac{d^2 TR}{dQ^2} = -1 < 0$.
Therefore, at Q = 45, TR is maximized.

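(b) $AC = Q^2 - 39.5Q + 120 + 125/Q$
$TC = AC \cdot Q = (Q^2 - 39.5Q + 120 + 125/Q)Q$
$= Q^3 - 39.5Q^2 + 120Q + 125$
$MC = \frac{dTC}{dQ} = 3Q^2 - 79Q + 120$
MC is minimum when $\frac{dMC}{dQ} = 0$ and $\frac{d^2 MC}{dQ^2} > 0$:
$\frac{dMC}{dQ} = 6Q - 79 = 0$
Q = 13.167
and $\frac{d^2 MC}{dQ^2} = 6 > 0$.
Hence, at Q = 13.167, MC is minimum.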

(c) Profit $(\pi) = TR - TC$
$= 45Q - 0.5Q^2 - (Q^3 - 39.5Q^2 + 120Q + 125)$
$= -Q^3 + 39Q^2 - 75Q - 125$
For maximization of profit, the first-order condition is
$\frac{d\pi}{dQ} = -3Q^2 + 78Q - 75 = 0$
$(-3Q + 3)(Q - 25) = 0$
Q = 1 or Q = 25
and for the second-order condition,
$\frac{d^2\pi}{dQ^2} = -6Q + 78$
When Q = 1,
$\frac{d^2\pi}{dQ^2} = 72 > 0$.
When Q = 25,
$\frac{d^2\pi}{dQ^2} = -72 < 0$.
Therefore, profit is maximum when Q = 25.
Maximum $\pi = -(25)^3 + 39(25)^2 - 75(25) - 125 = 6750$.
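The first- and second-order conditions for part (c) can be reproduced with a few lines of code. This is only a sketch: the function names are illustrative, and it simply solves the quadratic first-order condition from the example and keeps the root at which the second derivative is negative.

```python
import math

def profit(q):
    # pi(Q) = TR - TC = -Q^3 + 39 Q^2 - 75 Q - 125 (from the example)
    return -q**3 + 39 * q**2 - 75 * q - 125

def d2_profit(q):
    # Second derivative of profit, used for the second-order condition
    return -6 * q + 78

# First-order condition: -3 Q^2 + 78 Q - 75 = 0, solved with the quadratic formula
a, b, c = -3.0, 78.0, -75.0
disc = math.sqrt(b * b - 4 * a * c)
stationary = sorted([(-b + disc) / (2 * a), (-b - disc) / (2 * a)])

# Keep only the stationary points where profit is locally concave
maxima = [q for q in stationary if d2_profit(q) < 0]

print(stationary)                      # candidate output levels
print(maxima, profit(maxima[0]))       # profit-maximising output and profit
```

The same pattern (solve the first-order condition, then filter by the sign of the second derivative) applies to parts (a) and (b) as well.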
Example: Two different demand functions are given:

$Q_1 = 21 - 0.1P_1$ and $Q_2 = 50 - 0.4P_2$
TC = 2000 + 10Q where $Q = Q_1 + Q_2$. What price will the firm charge (a) with discrimination and
(b) without discrimination between markets?
Solution: The demand function in the first market is $Q_1 = 21 - 0.1P_1$.
Therefore, $P_1 = 210 - 10Q_1$
and $TR_1 = P_1 Q_1 = (210 - 10Q_1)Q_1 = 210Q_1 - 10Q_1^2$
$MR_1 = \frac{dTR_1}{dQ_1} = 210 - 20Q_1$
Profit is maximum when MR = MC,
$MC = \frac{dTC}{dQ} = 10$
$MR_1 = MC$
$210 - 20Q_1 = 10$
$Q_1 = 10$
When $Q_1 = 10$, $P_1 = 210 - 10(10) = 110$.
The demand function in the second market is $Q_2 = 50 - 0.4P_2$,
hence $P_2 = 125 - 2.5Q_2$
$TR_2 = (125 - 2.5Q_2)Q_2 = 125Q_2 - 2.5Q_2^2$
$MR_2 = \frac{dTR_2}{dQ_2} = 125 - 5Q_2$
When $MR_2 = MC$,
$125 - 5Q_2 = 10$
$Q_2 = 23$
When $Q_2 = 23$, then $P_2 = 125 - 2.5(23) = 67.5$.
The discriminating monopolist charges a lower price in the second market, where demand is
relatively more elastic, and a higher price in the first market, where demand is relatively less
elastic.
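The worked solution covers case (a). As a rough sketch of how both cases can be computed, the code below applies MR = MC to each market separately and then to the pooled demand $Q = Q_1 + Q_2 = 71 - 0.5P$. The helper name `monopoly_linear` is hypothetical, and the pooled case assumes both markets are served at the common price.

```python
def monopoly_linear(a, b, mc):
    """Profit-maximising (Q, P) for linear demand Q = a - b*P and constant marginal cost mc."""
    q = (a - b * mc) / 2      # from MR = a/b - 2Q/b = mc
    p = (a - q) / b           # read the price off the inverse demand curve
    return q, p

MC, FIXED = 10.0, 2000.0

# (a) With discrimination: treat each market on its own
q1, p1 = monopoly_linear(21, 0.1, MC)
q2, p2 = monopoly_linear(50, 0.4, MC)
profit_discrimination = p1 * q1 + p2 * q2 - FIXED - MC * (q1 + q2)

# (b) Without discrimination: one price, pooled demand Q = 71 - 0.5 P
q_pooled, p_pooled = monopoly_linear(21 + 50, 0.1 + 0.4, MC)
profit_single_price = p_pooled * q_pooled - FIXED - MC * q_pooled

print(q1, p1, q2, p2, profit_discrimination)
print(q_pooled, p_pooled, profit_single_price)
```

Under these assumptions the single price lies between the two discriminatory prices, and profit is lower than with discrimination.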
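Example: A producer is a price-taker on both the markets for the input factors labor and capital and
the market for end products. The cost of one unit of labor is w = 2, the cost of one unit of
capital is r = 32, and the selling price of the end product is p = 32. The production
function of this producer is given by $Y(L, K) = L^{1/8} K^{1/2}$. Determine the maximum profit.
Solution:
The revenue function is $R(L, K) = p \cdot Y(L, K) = 32 L^{1/8} K^{1/2}$,
the cost function is
$C(L, K) = wL + rK = 2L + 32K$,
and hence the profit function becomes
$\pi(L, K) = 32 L^{1/8} K^{1/2} - 2L - 32K$
The partial derivatives of $\pi(L, K)$ are given by:
$\frac{\partial \pi}{\partial L} = 4L^{-7/8} K^{1/2} - 2$ and
$\frac{\partial \pi}{\partial K} = 16 L^{1/8} K^{-1/2} - 32$
The stationary points of the profit function are solutions of the following system:
$4L^{-7/8} K^{1/2} - 2 = 0$
$16 L^{1/8} K^{-1/2} - 32 = 0$
From the first equation, $K^{1/2} = \tfrac{1}{2} L^{7/8}$ and
therefore $K = \tfrac{1}{4} L^{14/8}$.
Substituting into the second equation, $L^{1/8} \left(\tfrac{1}{4} L^{14/8}\right)^{-1/2} = 2$, i.e. $2L^{-6/8} = 2$,
which gives L = 1 and therefore K = 1/4.
Hence, (L, K) = (1, 1/4) is the only stationary point. We use the criterion function (the
determinant of the Hessian, written D below) to investigate whether this point is a maximum.
$\frac{\partial^2 \pi}{\partial L^2} = -3\tfrac{1}{2}\,L^{-15/8} K^{1/2}$;
$\frac{\partial^2 \pi}{\partial K^2} = -8 L^{1/8} K^{-3/2}$ and
$\frac{\partial^2 \pi}{\partial L\,\partial K} = 2L^{-7/8} K^{-1/2}$, which implies that the criterion function is given by
$D(L, K) = \frac{\partial^2 \pi}{\partial L^2} \cdot \frac{\partial^2 \pi}{\partial K^2} - \left(\frac{\partial^2 \pi}{\partial L\,\partial K}\right)^2$
$= \left(-3\tfrac{1}{2}\,L^{-15/8} K^{1/2}\right)\left(-8 L^{1/8} K^{-3/2}\right) - \left(2L^{-7/8} K^{-1/2}\right)^2$
$= 28 L^{-14/8} K^{-1} - 4L^{-14/8} K^{-1}$
$= 24 L^{-14/8} K^{-1} > 0$
Hence, as $D(1, 1/4) = 96 > 0$ and $\frac{\partial^2 \pi}{\partial L^2}(1, 1/4) = -1.75 < 0$, it follows that profit has a maximum
at (L, K) = (1, 1/4), with value $\pi = 32(1)(1/2) - 2 - 8 = 6$.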
4. Quadratic Forms
A quadratic form in two variables is
$f(x, y) = ax^2 + 2bxy + cy^2$,
where a, b and c are constants. In matrix notation:
$f(x, y) = (x, y) \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$
$f_{xx} = 2a$, $f_{xy} = f_{yx} = 2b$ and $f_{yy} = 2c$ are the second-order partial derivatives of the function f(x, y).
Therefore, the Hessian of f is given by
$\begin{pmatrix} 2a & 2b \\ 2b & 2c \end{pmatrix}$
The quadratic form is said to be positive definite if f(x, y) > 0 for all (x, y) ≠ (0, 0), and
positive semidefinite if f(x, y) ≥ 0 for all values of (x, y). It is
negative definite if f(x, y) < 0 for all (x, y) ≠ (0, 0), and negative semidefinite if f(x, y) ≤ 0
for all values of (x, y). It is indefinite if there exist two pairs $(x^-, y^-)$ and $(x^+, y^+)$ such that
$f(x^-, y^-) < 0$ and $f(x^+, y^+) > 0$.
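Example: Express the quadratic forms below in matrix form and determine the definiteness of
each:
a) $f(x, y) = 4x^2 + 8xy + 5y^2$
b) $f(x, y) = -x^2 + xy - 3y^2$
Solution: a) $f(x, y) = (x, y) \begin{pmatrix} 4 & 4 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$
Therefore, the symmetric matrix is $\begin{pmatrix} 4 & 4 \\ 4 & 5 \end{pmatrix}$, whose leading entry 4 is positive and whose
determinant, 20 - 16 = 4, is positive. Hence, f(x, y) > 0 for all (x, y) ≠ (0, 0), and the
quadratic form is positive definite.
b) $f(x, y) = (x, y) \begin{pmatrix} -1 & 1/2 \\ 1/2 & -3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$
Therefore, the symmetric matrix is $\begin{pmatrix} -1 & 1/2 \\ 1/2 & -3 \end{pmatrix}$, whose leading entry -1 is negative and whose
determinant, 3 - 1/4 = 11/4, is positive. Hence, f(x, y) < 0 for all (x, y) ≠ (0, 0), and the
quadratic form is negative definite.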
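The sign test used in this example can be written as a small routine. The sketch below is one possible implementation for the two-variable case only; the function name is hypothetical, and it relies on the standard criterion based on the leading entry a and the determinant $ac - b^2$ of the symmetric matrix.

```python
def classify_quadratic_form(a, b, c):
    """Classify f(x, y) = a x^2 + 2 b x y + c y^2 using the matrix [[a, b], [b, c]]."""
    det = a * c - b * b
    if det > 0:
        # det > 0 forces a != 0, so the sign of a decides the case
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    # det == 0: the form is semidefinite; a and c cannot have opposite signs here.
    # The zero form (a = b = c = 0) is reported as positive semidefinite by convention.
    if a >= 0 and c >= 0:
        return "positive semidefinite"
    return "negative semidefinite"

# 4x^2 + 8xy + 5y^2 has 2b = 8, so b = 4; -x^2 + xy - 3y^2 has 2b = 1, so b = 0.5
print(classify_quadratic_form(4, 4, 5))
print(classify_quadratic_form(-1, 0.5, -3))
```

Note that the coefficient of xy must be halved before it is passed in, because the form is written with 2bxy.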
5. Exercise:
1. Find the second – order partial derivatives fxx, fyy and fxy for each of the following
functions:
(a) Z=
(b) $Z = (7x + 3y)^3$
(c) $Z = (x^3 + 5y)^5$
(d) $Z = (2x + 5y)e^y$
(e) $Z = \log(1 + x^2) + y^2$
(f) $Z = x^3 e^{2y}$
2. Consider the function: $f(x_1, x_2) = (3x_1^2 + 5x_1 + 1)(x_2 + 4)$.
a. Find $f_1$ and $f_2$.
b. Find $f_{11}$, $f_{22}$, $f_{12}$ and $f_{21}$.
3. Assume the demand for sugar is a function of income (Y), the price of sugar ($P_s$) and the
price of saccharine ($P_c$), a sugar substitute, as follows:
$Q_d = f(Y, P_c, P_s) = 0.05Y + 10P_c - 5P_s^2$.
a. Find the partial derivatives of this demand function.
b. Find the elasticity of demand with respect to income, $\frac{\partial Q_d}{\partial Y} \cdot \frac{Y}{Q_d}$, when Y = 10,000, $P_s$ = 5
and $P_c$ = 7.
c. Find the own-price elasticity of demand, $\frac{\partial Q_d}{\partial P_s} \cdot \frac{P_s}{Q_d}$, when Y = 10,000, $P_s$ = 5 and $P_c$ = 7.
d. Find the cross-price elasticity of demand, $\frac{\partial Q_d}{\partial P_c} \cdot \frac{P_c}{Q_d}$, when Y = 10,000, $P_s$ = 5 and $P_c$ = 7.
4. Show that fxz = fzx and fxzz = fzxz = fzzx from the following function:
F(x,y,z) = y
5. The demand functions of two related commodities are given by
$X_1 = p_1^{-1.7} p_2^{0.8}$
$X_2 = p_1^{0.5} p_2^{-0.2}$
What can you say about the two commodities $X_1$ and $X_2$? Also find all partial elasticities.
6. A firm produces two commodities: commodity X and commodity Y. The demand
functions are:
$p_1 = 8 - 2x$
$p_2 = 14 - y^2$
The combined cost of production of these units is given by C = 10 + 4x + 2y. What prices of
the two products maximize joint profit?
7. Consider a production function that takes the form $y = 10 L^{1/2} K^{1/2}$, and assume that capital
(K) is constant at $K_0 = 64$.
a. Find the marginal product of labor, $\frac{\partial y}{\partial L}$.
b. If labor were paid a real wage equivalent to the marginal product of labor, how many
units of labor would be employed when the going wage rate is 10?
c. What happens to the amount of labor demanded when the wage declines to 8?
d. How much labor would be demanded if the wage remains 8, but capital is increased to
100?
e. Find the cross partial derivative, $\frac{\partial^2 y}{\partial K\,\partial L}$.
8. Two different demand functions are given:
$Q_1 = 11 - 2p_1 - 2p_2$ and $Q_2 = 16 - 2p_1 - 3p_2$
TC = 10 + 4x + 2y. Determine the quantities that maximize the profit of the monopolist and also find
the maximum profit.
9. Two different demand functions of discriminating monopoly are given:
p1= 140 – 7q1 and p2 = 90 – 0.4 q2/2
$TC = 20 + 2q + 3q^2$ where $q = q_1 + q_2$. What price will the firm charge in the two markets to maximize
profit?
Solution:
1. a. fxx= ; fyy = and fxy = fyx =
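b. $f_{xx} = 294(7x + 3y)$; $f_{yy} = 54(7x + 3y)$ and $f_{xy} = f_{yx} = 126(7x + 3y)$
c. $f_{xx} = 30x\left[(x^3 + 5y)^4 + 6x^3 (x^3 + 5y)^3\right]$; $f_{yy} = 500(x^3 + 5y)^3$ and $f_{xy} = 300x^2 (x^3 + 5y)^3$
d. $f_{xy} = 2e^y$; $f_{xx} = 0$.
e. $f_{xx} = -\frac{2(x^2 - 1)}{(1 + x^2)^2}$; $f_{yy} = 2$ and $f_{xy} = 0 = f_{yx}$
f. $f_{xx} = 6x e^{2y}$; $f_{yy} = 4x^3 e^{2y}$ and $f_{xy} = 6x^2 e^{2y}$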

2. Derivatives
a. $f_1 = 6x_1 x_2 + 24x_1 + 5x_2 + 20$; $f_2 = 3x_1^2 + 5x_1 + 1$
b. $f_{11} = 6x_2 + 24$; $f_{22} = 0$; $f_{12} = 6x_1 + 5$; and $f_{21} = 6x_1 + 5$. Note that both cross partial derivatives
are equal, as they should be according to Young's theorem.
3. Answers:
a. $\frac{\partial Q_d}{\partial Y} = 0.05$; $\frac{\partial Q_d}{\partial P_c} = 10$; $\frac{\partial Q_d}{\partial P_s} = -10P_s$
b. 1.12
c. -0.56
d. 0.16
4. Apply Young's theorem.
5. Since $\partial X_1/\partial p_2$ and $\partial X_2/\partial p_1$ are both greater than zero, the commodities $X_1$ and $X_2$ are
competitive. The partial elasticities are e11 = -1.7; e21 = 0.8; e22 = -0.2 and e12 = 0.5.
6. p1 = 3.2 and p2 = 3.9.
7. Answers:
a. $\frac{\partial y}{\partial L} = 5\left(\frac{K}{L}\right)^{1/2} = \frac{40}{L^{1/2}}$
b. L = 16
c. L = 25; labor demand increases when the wage declines.
d. $\frac{\partial y}{\partial L} = 5\left(\frac{K}{L}\right)^{1/2} = \frac{50}{L^{1/2}} = 8$; i.e., L ≈ 39.
e. $2.5\,L^{-1/2} K^{-1/2}$
8. x = 1 and y = 2; π = 8
9. p1 = 110.52 and p2 = 85.52 and q = 13.17
6.References:
1. K. Sydaster and P. Hammond, Mathematics for Economic Analysis, Person Educational
Asia, Delhi, 2002.
2. M. Hoy et.al, Mathematics for Economics, PHI Learning Private Limited, Delhi, Second
Edition, 2001.
3. J.E. Draper and J.S. Klingman, Mathematical Analysis Bussiness and Economic
Applications, Harper & Row Publishers, New York, 1967.
4. Rosser, Mike, Basic Mathematics for Economists Second Edition, London, 2003.
22
Homogeneous and Homothetic Function
DC-1
Semester-II
Lesson: Homogeneous and Homothetic Function
College/Department: P.G.D.A.V College, University of Delhi
Contents
2. Tools of Comparative Static Analysis
2.1 Chain Rule
2.1.a Chain rule with Multivariable Function
2.1.b Chain rule with ‘n’ variables
2.2 Directional derivative
2.3 Implicit Differentiation
2.2.a Implicit Differentiation with three or more variables
2.2.b Implicit functional theorem
2.3 Homogenous and Homothetic Functions
2.3.a Homogenous Functions
2.3.b Homothetic Functions
2.3.c Partial derivatives of homogenous functions
2.3.d Euler’s theorem
3. Exercise
4. References
1. Chain Rule of Differentiation

2. Implicit function Differentiation
3. Implicit function Theorem
4. Homogeneous and Homothetic Functions.
5. Euler’s Theorem
2. Tools of Comparative Static Analysis
In economic analysis, a theory posits relationships between independent (exogenous) variables
and dependent (endogenous) variables. It is often hard to solve such a model explicitly, that is, to
transform the equations into ones that express the endogenous variables as functions of the
exogenous variables. When an exogenous variable changes, the endogenous variables also change;
the method of implicit differentiation is applied to determine this change. This
technique of finding the rates of change of endogenous variables as exogenous variables change is
known in economics as comparative statics.
2.1 Chain Rule:
One of the most important techniques of differentiation is the chain rule, the rule for
differentiating compositions of functions. A composite function is a function of one or several
variables that are themselves functions of other, more basic variables.
If a function has two arguments and both are functions of a common variable t, e.g.
$y = f(x_1(t), x_2(t))$
then according to the chain rule, the derivative of y w.r.t. t is
$\frac{dy}{dt} = \frac{\partial y}{\partial x_1} \frac{dx_1}{dt} + \frac{\partial y}{\partial x_2} \frac{dx_2}{dt}$.
Example: $y = 3x_1 + 5x_2$ with $x_1 = t^2$ and $x_2 = 4t^3$
Applying the chain rule gives
$\frac{dx_1}{dt} = 2t$; $\frac{dx_2}{dt} = 12t^2$
$\frac{\partial y}{\partial x_1} = 3$ and $\frac{\partial y}{\partial x_2} = 5$
$\frac{dy}{dt} = \frac{\partial y}{\partial x_1} \frac{dx_1}{dt} + \frac{\partial y}{\partial x_2} \frac{dx_2}{dt}$
$= 3(2t) + 5(12t^2)$
$= 6t + 60t^2$
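As a quick sanity check on this example, the chain-rule result can be compared with a direct numerical derivative of the composite function. The sketch below uses a central difference; the function names and the evaluation point t = 2 are illustrative choices.

```python
def y_of_t(t):
    # Composite function: y = 3*x1 + 5*x2 with x1 = t^2 and x2 = 4 t^3
    x1 = t**2
    x2 = 4 * t**3
    return 3 * x1 + 5 * x2

def dy_dt_chain(t):
    # Chain rule: (dy/dx1)(dx1/dt) + (dy/dx2)(dx2/dt)
    return 3 * (2 * t) + 5 * (12 * t**2)

def central_diff(g, t, h=1e-6):
    # Symmetric finite-difference approximation of g'(t)
    return (g(t + h) - g(t - h)) / (2 * h)

t0 = 2.0
print(dy_dt_chain(t0), central_diff(y_of_t, t0))
```

Both values should be close to $6t + 60t^2$ evaluated at the chosen point.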
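2.1.a Chain rule with Multivariable Function:
If x = x(u, v) and y = y(u, v) have first-order partial derivatives at the point (u, v), and
z = f(x, y) is differentiable at the point (x(u, v), y(u, v)), then
z = f(x(u, v), y(u, v)) has first-order partial derivatives at (u, v) given by:
$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial u}$
and
$\frac{\partial z}{\partial v} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial v}$
Example:
Let $z = e^{x^2 y}$ where $x(u, v) = (uv)^{1/2}$
and $y(u, v) = 1/v$; find $\frac{\partial z}{\partial u}$ and $\frac{\partial z}{\partial v}$.
$\frac{\partial z}{\partial x} = 2xy\,e^{x^2 y}$ and $\frac{\partial z}{\partial y} = x^2 e^{x^2 y}$
$\frac{\partial x}{\partial u} = \tfrac{1}{2} v^{1/2} u^{-1/2}$ and $\frac{\partial x}{\partial v} = \tfrac{1}{2} u^{1/2} v^{-1/2}$
$\frac{\partial y}{\partial u} = 0$ and $\frac{\partial y}{\partial v} = -1 \cdot v^{-1-1} = -v^{-2}$
$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial u}$
$= (2xy\,e^{x^2 y}) \left(\tfrac{1}{2} v^{1/2} u^{-1/2}\right) + x^2 e^{x^2 y} \cdot 0$
$= xy\,e^{x^2 y}\, v^{1/2} u^{-1/2}$
Putting in the values of x and y, so that $x^2 y = u$:
$= (uv)^{1/2} \cdot \tfrac{1}{v} \cdot v^{1/2} u^{-1/2} \cdot e^{u}$
$= e^{u}$.
$\frac{\partial z}{\partial v} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial v}$
$= (2xy\,e^{x^2 y}) \left(\tfrac{1}{2} u^{1/2} v^{-1/2}\right) + x^2 e^{x^2 y} (-v^{-2})$
$= e^{x^2 y} \left(xy\,u^{1/2} v^{-1/2} - x^2 v^{-2}\right)$
$= e^{u} \left(\frac{u}{v} - \frac{uv}{v^2}\right)$
$= e^{u} \left(\frac{u}{v} - \frac{u}{v}\right)$
$= e^{u} (0)$
$= 0$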
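2.1.b Chain rule with 'n' variables:
The chain rule also extends to a function of any number of variables, each of which is a function
of other variables:
$z = f(x_1, \dots, x_n)$ with
$x_1 = x_1(t_1, \dots, t_m)$
$x_2 = x_2(t_1, \dots, t_m)$
...
$x_n = x_n(t_1, \dots, t_m)$; then
$\frac{\partial z}{\partial t_j} = \frac{\partial z}{\partial x_1} \frac{\partial x_1}{\partial t_j} + \frac{\partial z}{\partial x_2} \frac{\partial x_2}{\partial t_j} + \dots + \frac{\partial z}{\partial x_n} \frac{\partial x_n}{\partial t_j}$ (j = 1, ..., m)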
2.2 Directional Derivatives
For a function z = f(x, y), the partial derivative with respect to x gives the rate of change of f in the
x direction and the partial derivative with respect to y gives the rate of change of f in the y
direction. How do we compute the rate of change of f in an arbitrary direction? The rate of
change of a function of several variables in the direction u is called the directional derivative in
the direction u. Here u is assumed to be a unit vector.
Let z = f(x, y) have partial derivatives $f'_1(x, y)$ and $f'_2(x, y)$, and choose a particular point $(x_0, y_0)$ in the
domain. Any nonzero vector (h, k) is then a direction in which we can move away from $(x_0, y_0)$
in a straight line to points of the form
$(x(t), y(t)) = (x_0 + th, y_0 + tk)$
Given the point $(x_0, y_0)$ and the direction (h, k) ≠ (0, 0), define the directional function g by
$g(t) = f(x_0 + th, y_0 + tk)$ ..... (1)
By using the chain rule, the derivative of this directional function can be calculated as
$g'(t) = f'_1(x, y) \frac{dx}{dt} + f'_2(x, y) \frac{dy}{dt}$
$= f'_1(x_0 + th, y_0 + tk)\,h + f'_2(x_0 + th, y_0 + tk)\,k$ ..... (2)
If t = 0 then
$g'(0) = f'_1(x_0, y_0)\,h + f'_2(x_0, y_0)\,k$ ..... (3)
When the vector (h, k) has length 1, g'(0) is called
the directional derivative of f in the direction (h, k) at $(x_0, y_0)$. It is denoted by $D_{h,k} f(x_0, y_0)$. Hence,
the directional derivative of f(x, y) at $(x_0, y_0)$ in the direction of the unit vector (h, k) (where $h^2 + k^2 = 1$)
is
$D_{h,k} f(x_0, y_0) = f'_1(x_0, y_0)\,h + f'_2(x_0, y_0)\,k$
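A move of one unit of length away from $(x_0, y_0)$ in the direction (h, k) changes the value of f by
approximately $D_{h,k} f(x_0, y_0)$. The vector $(f'_1(x_0, y_0), f'_2(x_0, y_0))$ is called the gradient of the
function f(x, y) at $(x_0, y_0)$. The directional derivative is therefore the
scalar product of the gradient with the vector (h, k).
Now, differentiating (2) with respect to t, we get the second derivative of the directional function g,
i.e.,
$g''(t) = \frac{d}{dt} f'_1(x, y)\,h + \frac{d}{dt} f'_2(x, y)\,k$ ..... (4)
where $x = x_0 + th$ and $y = y_0 + tk$. Applying the chain rule again gives:
$\frac{d}{dt} f'_1(x, y) = f''_{11}(x, y) \frac{dx}{dt} + f''_{12}(x, y) \frac{dy}{dt} = f''_{11}(x, y)\,h + f''_{12}(x, y)\,k$
$\frac{d}{dt} f'_2(x, y) = f''_{21}(x, y) \frac{dx}{dt} + f''_{22}(x, y) \frac{dy}{dt} = f''_{21}(x, y)\,h + f''_{22}(x, y)\,k$
Suppose $f''_{12} = f''_{21}$; then equation (4) becomes:
$g''(t) = f''_{11}(x, y)\,h^2 + 2 f''_{12}(x, y)\,hk + f''_{22}(x, y)\,k^2$
where again $x = x_0 + th$ and $y = y_0 + tk$. Setting t = 0 and assuming (h, k) has length 1, this gives
the second directional derivative at $(x_0, y_0)$:
$D^2_{h,k} f(x_0, y_0) = f''_{11}(x_0, y_0)\,h^2 + 2 f''_{12}(x_0, y_0)\,hk + f''_{22}(x_0, y_0)\,k^2$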
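Example: Let f(x, y) = xy.
Compute the first and second directional derivatives of f at $(x_0, y_0)$ in the directions:
(a) $(h, k) = (1/\sqrt{2},\ 1/\sqrt{2})$ and
(b) $(h, k) = (1/\sqrt{2},\ -1/\sqrt{2})$.
Solution: We have
$f'_1(x, y) = y$, $f'_2(x, y) = x$, $f''_{11}(x, y) = 0$,
$f''_{12}(x, y) = f''_{21}(x, y) = 1$, $f''_{22}(x, y) = 0$.
Thus, if $(h, k) = (1/\sqrt{2},\ 1/\sqrt{2})$, then
$D_{h,k} f(x_0, y_0) = \frac{1}{\sqrt{2}} y_0 + \frac{1}{\sqrt{2}} x_0 = \frac{1}{\sqrt{2}}(x_0 + y_0)$
and
$D^2_{h,k} f(x_0, y_0) = 0\left(\frac{1}{\sqrt{2}}\right)^2 + 2 \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} + 0\left(\frac{1}{\sqrt{2}}\right)^2 = 1$
If $(h, k) = (1/\sqrt{2},\ -1/\sqrt{2})$, then
$D_{h,k} f(x_0, y_0) = \frac{1}{\sqrt{2}} y_0 - \frac{1}{\sqrt{2}} x_0 = \frac{1}{\sqrt{2}}(y_0 - x_0)$
and
$D^2_{h,k} f(x_0, y_0) = 2 \cdot \frac{1}{\sqrt{2}} \cdot \left(-\frac{1}{\sqrt{2}}\right) = -1$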
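The directional function g(t) makes these derivatives easy to approximate numerically. The following sketch differentiates $g(t) = f(x_0 + th, y_0 + tk)$ at t = 0 by finite differences for f(x, y) = xy; the point (2, 3) and the helper name are assumptions made for illustration.

```python
import math

def f(x, y):
    return x * y

def directional_derivatives(x0, y0, h, k, step=1e-4):
    # g(t) = f(x0 + t h, y0 + t k) is the directional function from the text
    g = lambda t: f(x0 + t * h, y0 + t * k)
    first = (g(step) - g(-step)) / (2 * step)             # approximates g'(0)
    second = (g(step) - 2 * g(0.0) + g(-step)) / step**2  # approximates g''(0)
    return first, second

x0, y0 = 2.0, 3.0
s = 1 / math.sqrt(2)

d1_a, d2_a = directional_derivatives(x0, y0, s, s)    # direction (a)
d1_b, d2_b = directional_derivatives(x0, y0, s, -s)   # direction (b)

print(d1_a, d2_a)   # compare with (x0 + y0)/sqrt(2) and 1
print(d1_b, d2_b)   # compare with (y0 - x0)/sqrt(2) and -1
```

Because g is a quadratic in t for this particular f, the finite differences agree with the analytic values up to rounding error.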
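2.3 Implicit Differentiation
Implicit differentiation is an application of the chain rule for finding the derivative of a function
that is defined implicitly.
Suppose that x and y are related by F(x, y) = 0, where y = f(x) is a
differentiable function of x. To find $\frac{dy}{dx}$ using the chain rule,
consider the function
$z = F(x, y) = F(x, f(x))$
then
$\frac{dz}{dx} = F_x(x, y) \frac{dx}{dx} + F_y(x, y) \frac{dy}{dx}$
Because z = F(x, y) = 0 for all x, $\frac{dz}{dx} = 0$, so
$F_x(x, y) \frac{dx}{dx} + F_y(x, y) \frac{dy}{dx} = 0$
Now, if $F_y(x, y) \neq 0$, and since $\frac{dx}{dx} = 1$, then
$\frac{dy}{dx} = -\frac{F_x(x, y)}{F_y(x, y)}$ where $F_x(x, y) = \partial z/\partial x$ and $F_y(x, y) = \partial z/\partial y$.
Example:
Find dy/dx given
$y^3 + y^2 - 5y - x^2 + 4 = 0$.
Solution:
Define the function
$F(x, y) = y^3 + y^2 - 5y - x^2 + 4$.
$F_x(x, y) = \frac{\partial F(x, y)}{\partial x} = -2x$.
$F_y(x, y) = \frac{\partial F(x, y)}{\partial y} = 3y^2 + 2y - 5$
Hence $\frac{dy}{dx} = -\frac{F_x}{F_y} = \frac{2x}{3y^2 + 2y - 5}$, provided $3y^2 + 2y - 5 \neq 0$.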
2.3.a Implicit Differentiation with three or more variables:
Assume that there is an implicit function of three variables, i.e., F(x, y, v) = 0.
Taking the total differential gives $F_x\,dx + F_y\,dy + F_v\,dv = 0$.
To obtain a derivative, hold one of the variables constant. Suppose v is constant; then dv = 0 and
$0 = F_x\,dx + F_y\,dy$. Rearranging gives the same expression as in the two-variable
case. Because one variable was held constant, the derivative is a partial derivative, so the
notation changes accordingly:
$\frac{\partial y}{\partial x} = -\frac{\partial F/\partial x}{\partial F/\partial y}$
$\frac{\partial y}{\partial x} = -\frac{F_x}{F_y}$.
In general, the partial derivatives of z defined implicitly by $F(x_1, x_2, \dots, x_n, z) = 0$ are given by
$\frac{\partial z}{\partial x_i} = -\frac{\partial F/\partial x_i}{\partial F/\partial z}$ (i = 1, 2, ..., n), assuming $\partial F/\partial z \neq 0$.
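Example:
$x - 2y - 3z + z^2 = -2$.
Let $F(x, y, z) = x - 2y - 3z + z^2$ and c = -2.
$F_x = \partial F/\partial x = 1$, $F_z = 2z - 3$
$F_y = -2$.
$\frac{\partial z}{\partial x} = z'_x = -\frac{F_x}{F_z}$ and $z'_y = -\frac{F_y}{F_z}$
[$F_z \neq 0$ is required, so we assume that $z \neq 3/2$]
$z'_x = -\frac{1}{2z - 3}$ and $z'_y = \frac{2}{2z - 3}$.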
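Example: Given the demand function
$Q_2 = 4850 - 5P_2 + 1.5P_1 + 0.1Y$
where Y = 10,000, $P_2$ = 200 and $P_1$ = 100, find the income elasticity of demand and the cross
elasticity of demand with respect to the price of the first commodity.
Solution:
(a) Income elasticity of demand is given by:
$e_y = \frac{\partial Q_2}{\partial Y} \Big/ \frac{Q_2}{Y}$
$= \frac{\partial Q_2}{\partial Y} \left(\frac{Y}{Q_2}\right)$
Given the demand function $Q_2 = 4850 - 5P_2 + 1.5P_1 + 0.1Y$,
$\frac{\partial Q_2}{\partial Y} = 0.1$ ..... (1)
$Q_2 = 4850 - 5(200) + 1.5(100) + 0.1(10000) = 5000$ ..... (2)
Putting in the values from (1) and (2):
$e_y = 0.1(10000/5000) = 0.2$
Since $0 < e_y < 1$, demand for the good is income inelastic.
(b) Cross elasticity of demand is given by:
$e_c = \frac{\partial Q_2}{\partial P_1} \left(\frac{P_1}{Q_2}\right)$
Given $Q_2 = 4850 - 5P_2 + 1.5P_1 + 0.1Y$,
$\frac{\partial Q_2}{\partial P_1} = 1.5$
$e_c = 1.5 \left(\frac{100}{5000}\right) = 0.03$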
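The two elasticities follow the same recipe: a partial derivative multiplied by the ratio of the variable to the quantity demanded. A minimal sketch of that recipe, using the numbers of this example and illustrative function names, is:

```python
def demand_q2(p1, p2, income):
    # Demand function from the example: Q2 = 4850 - 5 P2 + 1.5 P1 + 0.1 Y
    return 4850 - 5 * p2 + 1.5 * p1 + 0.1 * income

def point_elasticity(marginal_effect, variable, quantity):
    # Elasticity = (dQ/dv) * (v / Q)
    return marginal_effect * variable / quantity

q2 = demand_q2(p1=100, p2=200, income=10_000)

# The partial derivatives of a linear demand function are its coefficients
income_elasticity = point_elasticity(0.1, 10_000, q2)
cross_elasticity = point_elasticity(1.5, 100, q2)

print(q2, income_elasticity, cross_elasticity)
```

The same helper gives the own-price elasticity if it is called with the coefficient -5 and the price $P_2$.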
2.3.b Implicit function theorem (for 2 variables): Let F(x, y) = 0 be an implicit relation, where F has
continuous first derivatives, is defined in some neighborhood of a point $(x_0, y_0)$, and satisfies
$F(x_0, y_0) = 0$. If $F_y \neq 0$ at this point, then there is a function y = f(x) defined in some
neighborhood of $x = x_0$, corresponding to the relationship defined by F(x, y) = 0, such that:
(i) $y_0 = f(x_0)$ and
(ii) $f'(x_0) = -F_x/F_y$.
Statement (for n variables): Let $F(x_1, x_2, \dots, x_n, y) = 0$ be an implicit relation, where F has continuous first
derivatives, is defined in some neighborhood of a point $(x_1^0, x_2^0, \dots, x_n^0, y_0)$, and is satisfied at this point.
If $F_y \neq 0$ at this point, then there is a function $y = f(x_1, x_2, \dots, x_n)$ defined in some neighborhood of
$x^0 = (x_1^0, x_2^0, \dots, x_n^0)$ such that
(i) $y_0 = f(x^0)$
(ii) $f_i(x^0) = -F_{x_i}/F_y$.
Example: Consider the Cobb-Douglas production function $50 K^{0.3} L^{0.7} = Q$, where Q is a given level of
output, K is the amount of capital and L is the amount of labor. The isoquant associated with the
function shows the combinations of capital and labor that yield a constant level of output.
a. Use the Implicit Function Theorem to derive an equation for the slope of an isoquant
associated with the production function.
b. When K = 6 and L = 2, what is the slope of a line tangent to this isoquant? What is the
slope of the line when K = 3 and L = 14?
c. Find the MRTS for both cases in part (b).
Solution: Given the production function $50 K^{0.3} L^{0.7} = Q$:
a. Slope of isoquant $= \frac{dK}{dL} = -\frac{\partial Q/\partial L}{\partial Q/\partial K} = -\frac{35 K^{0.3} L^{-0.3}}{15 K^{-0.7} L^{0.7}}$
$= -\frac{7}{3} \cdot \frac{K}{L}$
b. When K = 6 and L = 2 the slope becomes:
Slope of isoquant $= -\frac{7}{3} \cdot \frac{6}{2} = -7$
When K = 3 and L = 14, then
Slope $= -\frac{7}{3} \cdot \frac{3}{14} = -\frac{1}{2}$.
c. i) MRTS = |-7| = 7
ii) MRTS = |-1/2| = 1/2
Note: The marginal rate of technical substitution (MRTS) is the rate at which the two production
inputs can be substituted for each other while output is held constant. It is the absolute value of the slope of the
isoquant.
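The slope formula $dK/dL = -F_L/F_K$ can be checked without doing any algebra by approximating the two marginal products numerically. The sketch below does this for the production function of the example; the function names and the step size are illustrative.

```python
def output(K, L):
    # Cobb-Douglas production function from the example
    return 50 * K**0.3 * L**0.7

def isoquant_slope(K, L, h=1e-6):
    # Implicit differentiation: dK/dL = -(dQ/dL) / (dQ/dK), with both
    # marginal products approximated by central differences
    F_K = (output(K + h, L) - output(K - h, L)) / (2 * h)
    F_L = (output(K, L + h) - output(K, L - h)) / (2 * h)
    return -F_L / F_K

slope_a = isoquant_slope(6, 2)
slope_b = isoquant_slope(3, 14)

print(slope_a, abs(slope_a))   # slope and MRTS at (K, L) = (6, 2)
print(slope_b, abs(slope_b))   # slope and MRTS at (K, L) = (3, 14)
```

The numerical slopes should match $-(7/3)(K/L)$ at both points.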
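Example: Compute the elasticity of substitution $\sigma_{K,L}$ for the Cobb-Douglas function $F(K, L) = AK^a L^b$.
Solution: The marginal rate of substitution between K and L is
$R_{K,L} = \frac{F'_L}{F'_K} = \frac{bAK^a L^{b-1}}{aAK^{a-1} L^b} = \frac{b}{a} \cdot \frac{K}{L}$
Hence $\frac{K}{L} = \frac{a}{b} R_{K,L}$, whose elasticity with respect to $R_{K,L}$ is 1. Therefore $\sigma_{K,L} = 1$.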
Example: The implicit function U = AB shows what combinations of apples (A) and bananas
(B) provide the level of utility U. Find the derivative of the implicit function to determine the
MRS of apples for bananas ($MRS_{A,B}$).
Solution: Given the utility function:
U = AB
Slope of IC $= \frac{dB}{dA} = -\frac{\partial U/\partial A}{\partial U/\partial B} = -\frac{B}{A}$
$MRS_{A,B} = \frac{B}{A}$
Note: The absolute value of the slope of the indifference curve is the marginal rate of substitution
(MRS), which measures the rate at which one good can be substituted for another while
maintaining the same level of utility.
2.3 Homogeneous and Homothetic Functions

2.3.a. Homogeneous Functions
A function is called homogeneous of degree k if, when each of its arguments is
multiplied by any number t > 0, the value of the function is multiplied by $t^k$. For instance, a
function is homogeneous of degree 1 if, when all its arguments are multiplied by any number t > 0,
the value of the function is multiplied by the same number t.
That is, $f(x_1, \dots, x_n)$ is homogeneous of degree k if for all t > 0
$f(tx_1, \dots, tx_n) = t^k f(x_1, \dots, x_n)$
To illustrate the concept, take the example Q = f(K, L), where K, L and Q
are variables. When the independent variables (K, L) change, the dependent
variable changes too. In other words, if K and L are both multiplied by some factor t, then Q also changes by
some factor. If, for t = 2 (a doubling of K and L), Q also doubles, then the function is
homogeneous of degree 1.
Effect on Q when K and L are both doubled | Economist's view | Mathematician's view
Q is exactly doubled | CRTS | Function is homogeneous of degree = 1
Q is more than doubled | IRTS | Function is homogeneous of degree > 1
Q is less than doubled | DRTS | Function is homogeneous of degree < 1
2.3.b Homothetic Functions:
Let $f(x_1, x_2, \dots, x_n)$ be a function of n variables defined on a domain K. Then f is called homothetic if
$x, y \in K,\ f(x) = f(y),\ t > 0 \Rightarrow f(tx) = f(ty)$
For instance, the utility function
u(x, y) = xy
is a homogeneous function of degree 2. The monotonic transformations
$g_1(z) = z + 1$;
$g_2(z) = z^2 + z$;
$g_3(z) = \log z$
generate the following homothetic (but not homogeneous) functions:
$v_1(x, y) = xy + 1$;
$v_2(x, y) = x^2 y^2 + xy$;
$v_3(x, y) = \log x + \log y$.
Example
For the function $f(x_1, x_2) = A x_1^a x_2^b$, test the homogeneity of the function.
Solution:
$f(tx_1, tx_2) = A(tx_1)^a (tx_2)^b$
$= A t^{a+b} x_1^a x_2^b$
$= t^{a+b} f(x_1, x_2)$,
so that f is homogeneous of degree a + b.
Example: Given the function $f(x, y, z) = x^5 y^2 z^3$, check whether the function is homogeneous or not.
Multiplying each argument by some factor α:
$f(\alpha x, \alpha y, \alpha z) = (\alpha x)^5 (\alpha y)^2 (\alpha z)^3$
$= \alpha^{10} x^5 y^2 z^3$
$= \alpha^{10} f(x, y, z)$
Hence, this function is homogeneous of degree 10.
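A homogeneity claim of this kind can be spot-checked numerically: if $f(tx) = t^k f(x)$, then $k = \log(f(tx)/f(x)) / \log t$. The sketch below assumes the function of the example is $f(x, y, z) = x^5 y^2 z^3$ (as its expansion indicates) and evaluates it at an arbitrary positive point; the helper name and the test point are illustrative.

```python
import math

def f(x, y, z):
    return x**5 * y**2 * z**3

def estimated_degree(func, point, t=3.0):
    # For a homogeneous function, f(t x) / f(x) = t^k, so k = log(ratio) / log(t).
    # This requires func to be positive at both points.
    scaled = [t * v for v in point]
    return math.log(func(*scaled) / func(*point)) / math.log(t)

degree = estimated_degree(f, (1.5, 2.0, 0.5))
print(degree)
```

A single point cannot prove homogeneity, but repeating the check for several points and several values of t is a useful way to catch algebra mistakes.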
2.3.c Partial derivatives of homogeneous functions
Let f be a differentiable function of n variables that is homogeneous of degree k. Then each of its
partial derivatives $f'_i$ (for i = 1, ..., n) is homogeneous of degree k − 1.
The homogeneity of f means that
$f(tx_1, \dots, tx_n) = t^k f(x_1, \dots, x_n)$ for all $(x_1, \dots, x_n)$ and all t > 0.
Now differentiate both sides of this equation with respect to $x_i$, to get
$t f'_i(tx_1, \dots, tx_n) = t^k f'_i(x_1, \dots, x_n)$,
and then divide both sides by t to get
$f'_i(tx_1, \dots, tx_n) = t^{k-1} f'_i(x_1, \dots, x_n)$,
so that $f'_i$ is homogeneous of degree k − 1.
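2.3.d Euler's theorem
If the function z = f(x, y) is homogeneous of degree n, then according to Euler's theorem:
$x \frac{\partial f}{\partial x} + y \frac{\partial f}{\partial y} = n f(x, y)$
If $z = f(x_1, x_2, x_3, \dots, x_n)$ is homogeneous of degree k, then according to this theorem:
$x_1 \frac{\partial f}{\partial x_1} + x_2 \frac{\partial f}{\partial x_2} + x_3 \frac{\partial f}{\partial x_3} + \dots + x_n \frac{\partial f}{\partial x_n} = k f(x_1, x_2, x_3, \dots, x_n)$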
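Example:
Use Euler's theorem to determine the degree of homogeneity of the following function:
1) $f(x, y) = 2x^2 + xy - y^2$
$\frac{\partial f}{\partial x} = f_x(x, y) = 4x + y$
$\frac{\partial f}{\partial y} = f_y(x, y) = x - 2y$
According to Euler's theorem:
$x \frac{\partial f}{\partial x} + y \frac{\partial f}{\partial y} = n f(x, y)$
$= x(4x + y) + y(x - 2y)$
$= 4x^2 + xy + xy - 2y^2$
$= 4x^2 + 2xy - 2y^2$
$= 2(2x^2 + xy - y^2)$
The degree of homogeneity is 2.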
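Example:
Use Euler's theorem to determine the degree of homogeneity of the following function:
$f(L, K) = A L^{\alpha} K^{\beta}$
$\frac{\partial f}{\partial L} = \alpha A L^{\alpha - 1} K^{\beta}$
$\frac{\partial f}{\partial K} = \beta A L^{\alpha} K^{\beta - 1}$
By Euler's theorem
$L \frac{\partial f}{\partial L} + K \frac{\partial f}{\partial K} = n f(L, K)$
$= L(\alpha A L^{\alpha - 1} K^{\beta}) + K(\beta A L^{\alpha} K^{\beta - 1})$
$= \alpha A L^{\alpha} K^{\beta} + \beta A L^{\alpha} K^{\beta}$
$= (\alpha + \beta)(A L^{\alpha} K^{\beta})$
$= (\alpha + \beta) f(L, K)$
The degree of homogeneity is α + β.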
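Euler's theorem also gives a numerical way to read off the degree of homogeneity: compute $L f_L + K f_K$ and divide by f. The sketch below does this for a Cobb-Douglas function with parameter values chosen purely for illustration (A = 2, α = 0.3, β = 0.5 are assumptions, not values from the lesson), approximating the partial derivatives by central differences.

```python
def cobb_douglas(L, K, A=2.0, alpha=0.3, beta=0.5):
    # Illustrative parameter values; any positive A, alpha, beta would do
    return A * L**alpha * K**beta

def partial(func, point, index, h=1e-6):
    # Central-difference approximation of the partial derivative
    # with respect to the argument at position `index`
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)

point = (4.0, 9.0)   # (L, K)

# Left-hand side of Euler's theorem: L * f_L + K * f_K
euler_sum = sum(point[i] * partial(cobb_douglas, point, i) for i in range(2))
implied_degree = euler_sum / cobb_douglas(*point)

print(euler_sum, implied_degree)   # implied degree should be close to alpha + beta
```

With the assumed parameters the implied degree is close to 0.8, the sum of the two exponents.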
Example: Suppose that $f(x_1, \dots, x_n)$ is homogeneous of degree r. Show that each of the
following functions $h(x_1, \dots, x_n)$ is homogeneous, and find the degree of homogeneity.
a. $h(x_1, \dots, x_n) = f(x_1^m, \dots, x_n^m)$ for some number m.
b. $h(x_1, \dots, x_n) = [f(x_1, \dots, x_n)]^p$ for some number p.
Solution:
a. We know that $f(x_1, \dots, x_n)$ is homogeneous of degree r, therefore
$h(tx_1, \dots, tx_n) = f((tx_1)^m, \dots, (tx_n)^m) = f(t^m x_1^m, \dots, t^m x_n^m)$
$= (t^m)^r f(x_1^m, \dots, x_n^m)$
$= t^{mr} h(x_1, \dots, x_n)$,
hence h is homogeneous of degree mr.
b. We have $h(tx_1, \dots, tx_n) = [f(tx_1, \dots, tx_n)]^p$
$= [t^r f(x_1, \dots, x_n)]^p$
$= t^{rp} [f(x_1, \dots, x_n)]^p$
$= t^{rp} h(x_1, \dots, x_n)$,
Therefore, h is homogeneous of degree rp.
Example: Solve the following:
a. Is the function $(x^3 - y^3)/(x^{1/2} + y^{1/2})$ homogeneous of any degree? (If so, which
degree?)
b. Is the function $x^3 y^3 + x^{1/2}$ homogeneous of any degree? (If so, which degree?)
c. A consumer's (differentiable) demand function for some good is $f(p_1, \dots, p_n, w)$,
where $p_i$ is the price of the ith good, and w is the consumer's wealth. This function
f is homogeneous of degree 0. Is there any necessary relationship between
$\sum_{i=1}^{n} p_i f'_i(p_1, \dots, p_n, w)$ and $w f'_{n+1}(p_1, \dots, p_n, w)$?
Solution: a. For the function $f(x, y) = (x^3 - y^3)/(x^{1/2} + y^{1/2})$,
$f(tx, ty) = ((tx)^3 - (ty)^3)/((tx)^{1/2} + (ty)^{1/2}) = t^{5/2} (x^3 - y^3)/(x^{1/2} + y^{1/2})$
Therefore, the function is homogeneous of degree 5/2.
b. For the function $f(x, y) = x^3 y^3 + x^{1/2}$, $f(tx, ty) = (tx)^3 (ty)^3 + (tx)^{1/2} = t^6 x^3 y^3 + t^{1/2} x^{1/2}$,
which cannot be written as $t^k f(x, y)$ for a single k. Hence, the function is not homogeneous of any degree.
To see this formally, suppose to the contrary that it is homogeneous of degree k. Then for some value of k we
have $(tx)^3 (ty)^3 + (tx)^{1/2} = t^k (x^3 y^3 + x^{1/2})$ for all t > 0 and all (x, y). In particular, taking t = 4
we have $4096 x^3 y^3 + 2x^{1/2} = 4^k (x^3 y^3 + x^{1/2})$ for all (x, y), and hence $2 = 4^k$ (taking (x, y)
= (1, 0)) and $4098 = 2(4^k)$ (taking (x, y) = (1, 1)), which are inconsistent.
c. The function $f(p_1, \dots, p_n, w)$ is homogeneous of degree 0,
so according to Euler's theorem we have
$\sum_{i=1}^{n} p_i f'_i(p_1, \dots, p_n, w) + w f'_{n+1}(p_1, \dots, p_n, w) = 0$. (Note that f has n + 1 arguments.)
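Example: Consider the production function $Q = AK^{\alpha} L^{\beta}$.
a. Using Euler's theorem, prove that this production function exhibits constant returns to
scale when α + β = 1.
b. What condition on α + β is necessary for increasing returns to scale? For decreasing
returns to scale?
Solution: Given the Cobb-Douglas production function:
$Q = AK^{\alpha} L^{\beta}$
$\frac{\partial Q}{\partial K} = \alpha A K^{\alpha - 1} L^{\beta}$
$\frac{\partial Q}{\partial L} = \beta A K^{\alpha} L^{\beta - 1}$
According to Euler's theorem:
$K \frac{\partial Q}{\partial K} + L \frac{\partial Q}{\partial L} = K \cdot \alpha A K^{\alpha - 1} L^{\beta} + L \cdot \beta A K^{\alpha} L^{\beta - 1}$
$= (\alpha + \beta)(A K^{\alpha} L^{\beta})$
$= (\alpha + \beta) Q$
a. If α + β = 1, then $K \frac{\partial Q}{\partial K} + L \frac{\partial Q}{\partial L} = Q$, so the function is homogeneous of degree 1. If K and L
are doubled to 2K and 2L, then output also doubles; there are constant returns to scale in production.
b. If α + β is not equal to 1, then $K \frac{\partial Q}{\partial K} + L \frac{\partial Q}{\partial L} = (\alpha + \beta) Q$ and the function is homogeneous of
degree α + β. Doubling K and L multiplies output by $2^{\alpha + \beta}$. If α + β > 1, output more than doubles, i.e.,
there are increasing returns to scale. If α + β < 1, output less than doubles, i.e., there are
decreasing returns to scale in production.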
Note: A proportional increase in all the values of inputs in a production function increases the
scale of production. If there are constant returns to scale, then output will increase equi-
proportionally to the increase in all inputs. If there are increasing returns to scale, an increase in
all inputs will lead to a more than proportionate increase in output. If there are decreasing
returns to scale, then output will increase less than proportionately with an increase in all inputs.
Example: Consider the following Cobb-Douglas production function, which is homogeneous of
degree 1 in capital and labor: $Q = 50K^{0.4}L^{0.6}$. The value of the output (Q) includes the payment
made to labor, i.e., the wages paid to labor (wL), which equals $\frac{\partial Q}{\partial L}L$ in a competitive
labor market. Also, the value of the output includes the payment made to the capital suppliers
(rK), which equals $\frac{\partial Q}{\partial K}K$. Show that the sum of the total factor payments (wL + rK) equals
the value of the output, i.e., wL + rK = Q, such that $wL + rK = \alpha Q + (1-\alpha)Q$, where $\alpha = 0.6$.
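Solution: Given the production function
$Q = 50K^{0.4}L^{0.6}$
$\frac{\partial Q}{\partial K} = 0.4(50K^{0.4-1}L^{0.6})$
$\frac{\partial Q}{\partial L} = 0.6(50K^{0.4}L^{0.6-1})$
$K\frac{\partial Q}{\partial K} + L\frac{\partial Q}{\partial L} = K \cdot 0.4(50K^{0.4-1}L^{0.6}) + L \cdot 0.6(50K^{0.4}L^{0.6-1})$
Given $rK = \frac{\partial Q}{\partial K}K$ and $wL = \frac{\partial Q}{\partial L}L$.
Therefore,
$rK + wL = K\frac{\partial Q}{\partial K} + L\frac{\partial Q}{\partial L} = K \cdot 0.4(50K^{0.4-1}L^{0.6}) + L \cdot 0.6(50K^{0.4}L^{0.6-1})$
$= 0.4Q + 0.6Q = (1-\alpha)Q + \alpha Q$
$= \alpha Q + (1-\alpha)Q = Q$ (given $\alpha = 0.6$)
Hence proved.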
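Example: Find the elasticity of substitution for the following production function:
$z = A(aK^{-\rho} + bL^{-\rho})^{-m/\rho}$, where A, a, b, m and ρ are constants, with ρ ≠ 0 and ρ > -1.
Solution: Partially differentiating the function $z = A(aK^{-\rho} + bL^{-\rho})^{-m/\rho}$ with respect to L and K
respectively,
$z'_L = A(-m/\rho)(aK^{-\rho} + bL^{-\rho})^{(-m/\rho)-1}\, b(-\rho)L^{-\rho-1}$
$z'_K = A(-m/\rho)(aK^{-\rho} + bL^{-\rho})^{(-m/\rho)-1}\, a(-\rho)K^{-\rho-1}$
Therefore,
$MRTS_{K,L} = R_{K,L} = \frac{z'_L}{z'_K} = \frac{b}{a}\left(\frac{K}{L}\right)^{\rho+1}$
so that
$\frac{K}{L} = \left(\frac{a}{b}\right)^{1/(\rho+1)} (R_{K,L})^{1/(\rho+1)}$
Hence,
$\sigma_{K,L} = El_{R_{K,L}}\left(\frac{K}{L}\right) = \frac{1}{\rho+1}$.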
Example: Without solving the equation, show that $2x^2 + 5xy + y^2 = 19$ defines an implicit function
y(x) for which y(2) = 1, and find dy/dx when x = 2. Express the answer in geometrical terms.
Solution: Putting x = 2 and y = 1 in the equation $2x^2 + 5xy + y^2 = 19$, we see that it is satisfied:
$2(2)^2 + 5(2)(1) + (1)^2 = 19$.
Write $F(x, y) = 2x^2 + 5xy + y^2$. Since $F'_y(2, 1) = 5(2) + 2(1) = 12 \neq 0$, the equation defines y as a
function of x near this point. Using implicit differentiation, we get
$\frac{dy}{dx} = -\frac{F'_x}{F'_y} = -\frac{4x + 5y}{5x + 2y} = -\frac{13}{12}$
when (x, y) = (2, 1). In geometrical terms, this means that the slope of the contour $2x^2 + 5xy + y^2 = c$
which passes through the point (2, 1) is -13/12 at that point.
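The slope can be reproduced with exact rational arithmetic. This is a minimal sketch; the helper names F_x, F_y and implicit_slope are illustrative assumptions, not notation from the lesson.

```python
from fractions import Fraction


def F_x(x, y):
    """Partial derivative of F(x, y) = 2x^2 + 5xy + y^2 with respect to x."""
    return 4 * x + 5 * y


def F_y(x, y):
    """Partial derivative of F(x, y) = 2x^2 + 5xy + y^2 with respect to y."""
    return 5 * x + 2 * y


def implicit_slope(x, y):
    """dy/dx = -F_x / F_y along the contour F(x, y) = c (requires F_y != 0)."""
    return -Fraction(F_x(x, y), F_y(x, y))


if __name__ == "__main__":
    x0, y0 = 2, 1
    print(2 * x0**2 + 5 * x0 * y0 + y0**2)  # value of F at the point
    print(implicit_slope(x0, y0))           # slope of the contour at the point
```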
Example: The function g is defined by
g(x, y) = f (x, y) − aln(x + y),
where a is a constant and f satisfies the condition
x f 'x(x,y) + y f 'y(x,y) = a for all (x, y).
Show that g is homogeneous of degree 0.
Solution: Given g(x, y) = f (x, y) − a ln(x + y), the partial derivatives of g give
xg'x(x,y) + yg'y(x,y) = x( f 'x(x,y) − a/[x + y]) + y( f 'y(x,y) − a/[x + y])
= x f 'x(x,y) − ax/[x + y] + y f 'y(x,y) − ay/[x + y]
= x f 'x(x,y) + y f 'y(x,y) − ax/[x + y] − ay/[x + y]
Given x f 'x(x,y) + y f 'y(x,y) = a, the equation becomes
xg'x(x,y) + yg'y(x,y) = a − a(x/[x + y] + y/[x + y])
= a − a(1)
= 0
for all (x,y). Thus by Euler's theorem g is homogeneous of degree 0.
Example: The twice-differentiable function f (x, y) is homogeneous of degree k, and its second
derivatives are continuous. Show that
x2 f "11(x, y) + 2xy f "12(x, y) + y2 f "22(x, y) = k(k − 1) f (x, y) for all (x, y).
Solution: We know that f is homogeneous of degree k which means that f '1 and f '2 are
homogeneous of degree k − 1. Thus by Euler's theorem applied to f '1 and to f '2 we have,
x f "11(x, y) + y f "12(x, y) = (k − 1) f '1(x, y) for all (x, y)

x f "21(x, y) + y f "22(x, y) = (k − 1) f '2(x, y) for all (x, y).
Multiply the first equation by x and the second equation by y:
x2 f "11(x, y) +x y f "12(x, y) = (k − 1) x f '1(x, y) for all (x, y)

x y f "21(x, y) + y2 f "22(x, y) = (k − 1) y f '2(x, y) for all (x, y).
Now add the above two equations:
x2 f "11(x, y) + 2xy f "12(x, y) + y2 f "22(x, y) = (k − 1)[x f '1(x, y) + y f '2(x, y)] for all (x, y)
(given that f "12(x, y) = f "21(x, y), by Young's theorem).
Finally, the term in brackets on the right-hand side of this equation is equal to k f (x,y) by
Euler's theorem, yielding the required result.
Example: A firm uses two inputs to produce a single output. Its production function f is
homogeneous of degree 1. An implication of the homogeneity of f , which you are not asked to
prove, is that the partial derivatives f 'x and f 'y with respect to the two inputs are homogeneous
of degree zero. Use Euler's theorem to find an expression for the cross partial derivative
f "xy(x, y) in terms of x, y, and f "xx(x, y).
Solution: Given that f 'x is homogeneous of degree 0, Euler's theorem applied to f 'x gives:
x f "xx(x, y) + y f "xy(x, y) = 0,
so that f "xy(x, y) = −(x/y) f "xx(x, y).
3. Exercise:
1. Given Q=440-8P +0.05 Y, where P=15 and Y=12,000. Find the income and price
elasticity of demand.
2. Given Q1= 110-P1+0.75 P2-0.25 P3+0.0075 Y. At P1=10, P2=20, P3=40, and Y=10,000,
Q1=170. Find the different cross elasticities of demand.
3. Determine whether each function is homogeneous and, if so, of what degree.

f(x,y) =
f(x,y,w)= 3x2y -
4.Test the degree of homogeneity of a function given below:
a ) z = 10x + 5y
b) z = x2 + 5xy + 12 y2
c) z= x0.3 + y0.4
d) z = 10 x5 + 10x2y3 +y5
5. Assume the demand for sugar is a function of income (Y), the price of sugar (Ps) and the
price of saccharine (Pc), a sugar substitute, as follows:
$Q_d = f(Y, P_c, P_s) = 0.05Y + 10P_c - 5P_s^2$.
a. Find the partial derivatives of this demand function.
b. Find the elasticity of demand with respect to income $\left(\frac{\partial Q_d}{\partial Y}\frac{Y}{Q_d}\right)$ when Y = 10,000, Ps = 5 and Pc = 7.
c. Find the own-price elasticity of demand $\left(\frac{\partial Q_d}{\partial P_s}\frac{P_s}{Q_d}\right)$ when Y = 10,000, Ps = 5 and Pc = 7.
d. Find the cross-price elasticity of demand $\left(\frac{\partial Q_d}{\partial P_c}\frac{P_c}{Q_d}\right)$ when Y = 10,000, Ps = 5 and Pc = 7.
6. Consider the production function y = f(x1,x2) = x1 x2 defined over the domain x1 > 0 and
x2 > 0. Also, consider the functions g(y) = ln(y) and j(y) = y2.
a. Is f(x1,x2) a homogeneous function? If so, of what degree?

b. Is g(y) a homothetic function? Is g(y) a homogeneous function in the arguments x1 and
x2? If so, what is its degree?
c. Is j(y) a homothetic function? Is j(y) a homogeneous function in the arguments x1 and
x2? If so, what is its degree?
7. Consider the production function y= f(x1,x2)= .
a. Determine whether the production function is homogeneous. If so, of what degree?
b. Find out the partial derivatives of the function and show that they are also homogeneous.
c. Show that $x_1 f_1(sx_1, sx_2) + x_2 f_2(sx_1, sx_2) = k s^{k-1} f(x_1, x_2)$.
Solution:
1. Income elasticity of demand=0.652 and price elasticity of demand= -0.13.

2. ep1 = -0.0889; ep2 =0.133 and ep3 = -0.0889.
3.a. homogeneous of degree 1
b.not homogeneous
4. a. homogenous of degree 1
b. homogenous of degree 2
c. not homogenous
5. Answers:
a. $\frac{\partial Q_d}{\partial Y} = 0.05$; $\frac{\partial Q_d}{\partial P_c} = 10$; $\frac{\partial Q_d}{\partial P_s} = -10P_s$
b. 1.12
c. - 0.56
d. 0.16
6. Answers:
a. homogeneous of degree 2
b. homothetic; not homogeneous in x1 and x2
c. homothetic; homogeneous of degree 4 in x1 and x2
7. Answers:
a. homogeneous of degree 7/12 (i.e., k = 7/12)
b. f1(x1,x2) is homogeneous of degree k-1, i.e., -5/12; similarly, take the partial
derivative of the production function with respect to x2 and show that
f2(x1,x2) is homogeneous of degree -5/12.
c. Apply Euler's Theorem.
4. References:
1. K. Sydsaeter and P. Hammond, Mathematics for Economic Analysis, Pearson Educational
Asia, Delhi, 2002.
2. M. Hoy et al., Mathematics for Economics, PHI Learning Private Limited, Delhi, Second
Edition, 2001.
3. J.E. Draper and J.S. Klingman, Mathematical Analysis: Business and Economic
Applications, Harper & Row Publishers, New York, 1967.
4. Rosser, Mike, Basic Mathematics for Economists, Second Edition, London, 2003.
DC-1
SEM-II
Lesson: CONVEXITY AND CONCAVITY OF FUNCTIONS
Lesson Developer: S.K. TANEJA
College: Ram Lal Anand College
University of Delhi
Institute of Life Long Learning, University of Delhi

CONVEXITY AND CONCAVITY OF FUNCTIONS
Table of Contents
2. Introduction
3. Derivative Test for Concavity and Convexity
4. Second Derivative and Concavity and Convexity
4.1 Total Differential Method
4.2 Definitions
4.3 Use of Hessian Matrix for the determination of Convexity and Concavity
4.4 Graphical Representation

4.5 Assumption
4.6 Theorem
5. Second derivative test for concavity and convexity.
6. Quasi-concave and Quasi-convex Functions
7. Quasi-convex function
8. Properties of Quasi-concave and Quasi-convex functions
9. Functions of multiple variables
10. Differentiable function
11. Exercises
12. References

1. Explain the concept of concavity and convexity.
2. Explain the concept of quasi-concavity and quasi-convexity.
3. Second Derivative Test for Concavity and Convexity.
4. Use of Bordered Hessian Determinant
2. Introduction
A function f is concave if and only if, for any pair of distinct points P and R in the domain of f
and for 0 < θ < 1,
$f(\theta P + (1-\theta)R) \ge \theta f(P) + (1-\theta) f(R)$
where $P = (x_1^0, x_2^0)$ and $R = (x_1^1, x_2^1)$.
The definition can be extended to strict concavity by changing the weak inequality ≥ to
the strict inequality >.
A function f is convex if and only if, for any pair of distinct points P and R in the domain of f and
for 0 < θ < 1,
$f(\theta P + (1-\theta)R) \le \theta f(P) + (1-\theta) f(R)$
The right-hand side is the height of the line segment AB and the left-hand side is the height of
the arc AB.
Figure 1

Till now we have been discussing concavity and convexity of functions of one variable
only. The conditions for concavity and convexity, strict and non-strict can be defined for
functions of many variables. We shall discuss the concept of concavity and convexity for
a two variable function.
z = f(x1, x2)
The function f(x1, x2) is concave (convex) if and only if, for any pair of distinct points A and
B on its graph (a surface), the line segment AB lies either on or below (above) the surface.
Strict concavity (convexity) requires the line segment AB to lie entirely below (above) the
surface, except at the points A and B. The surface of a strictly concave function is typically
dome-shaped, and the surface of a strictly convex function is typically bowl-shaped. For
non-strictly concave and convex functions the line segment AB is allowed to lie on the surface
itself, so some portion of the surface, or even the entire surface, may be flat rather than curved.
Figure 2
3. Derivative Test for Concavity and Convexity
In the case of functions of two or more variables, it becomes difficult to use a
diagrammatic or algebraic method to determine the concavity or convexity of a
function, because applying the algebraic definition requires a lot of algebraic
manipulation. A way out is to use derivatives, if the function is differentiable.
A differentiable function f(x) = f(x1, x2,..., xn) is concave if and only if, for any given point
$P = (x_1^0, x_2^0, \dots, x_n^0)$ and any other point $R = (x_1^1, x_2^1, \dots, x_n^1)$ in its convex domain,
$f(R) \le f(P) + \sum_{i=1}^{n} f_i(P)(x_i^1 - x_i^0)$
In the case of a function of two variables, given $P = (x_1^0, x_2^0)$ and $R = (x_1^1, x_2^1)$, this can be
written as:
$f(R) \le f(P) + f_1(P)(x_1^1 - x_1^0) + f_2(P)(x_2^1 - x_2^0)$
In the case of a convex function the inequality is reversed.
Geometrically it means that for a concave function the tangent plane at the point P on the
graph of the function lies on or above the graph of the function.
In the case of a strictly convex function the graph of the function lies strictly above all the
tangent planes (or hyperplanes), except at the point of tangency.
Example
$z = x_1^2 + x_2^2$
The function is convex if, for all X = (x1, x2) and Y = (y1, y2),
$f(Y) - f(X) \ge \sum_i f_i(X)(y_i - x_i)$
that is,
$(y_1^2 + y_2^2) - (x_1^2 + x_2^2) \ge 2x_1(y_1 - x_1) + 2x_2(y_2 - x_2)$ (where $\frac{\partial f}{\partial x_1} = 2x_1$ and $\frac{\partial f}{\partial x_2} = 2x_2$)
$= 2x_1y_1 - 2x_1^2 + 2x_2y_2 - 2x_2^2$
Shifting the right-hand side to the left-hand side we get
$y_1^2 + y_2^2 - x_1^2 - x_2^2 - 2x_1y_1 + 2x_1^2 - 2x_2y_2 + 2x_2^2 \ge 0$
$\Rightarrow y_1^2 + x_1^2 - 2x_1y_1 + y_2^2 + x_2^2 - 2x_2y_2 \ge 0$
$\Rightarrow (y_1 - x_1)^2 + (y_2 - x_2)^2 \ge 0$
A sum of squares is non-negative whatever the values of (x1, x2) and
(y1, y2). This proves that the function is convex.
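The first-derivative inequality can also be spot-checked on sampled points. The sketch below is an illustration under stated assumptions: the sampling range, the seed and the function names are arbitrary choices, and a random check supports the algebraic proof rather than replacing it.

```python
import random


def f(x):
    """f(x1, x2) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2


def grad_f(x):
    """Gradient (f1, f2) = (2*x1, 2*x2)."""
    return (2 * x[0], 2 * x[1])


def gradient_gap(x, y):
    """f(y) - f(x) - grad f(x).(y - x); non-negative for every pair when f is convex."""
    g = grad_f(x)
    return f(y) - f(x) - (g[0] * (y[0] - x[0]) + g[1] * (y[1] - x[1]))


if __name__ == "__main__":
    random.seed(0)
    pairs = [((random.uniform(-5, 5), random.uniform(-5, 5)),
              (random.uniform(-5, 5), random.uniform(-5, 5))) for _ in range(1000)]
    # The smallest gap over the sample should not be negative (up to rounding).
    print(min(gradient_gap(x, y) for x, y in pairs))
```

For this function the gap equals (y1 - x1)^2 + (y2 - x2)^2, which is the sum of squares obtained in the derivation.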
4. Second Derivative and Concavity and Convexity
Till now we have discussed the curvature properties of a function using algebra
or the first derivative. The concavity and convexity of a function are usually discussed using
the second derivative. The second derivative shows how the first derivative changes. In the
case of a function of one variable we saw that if f''>0 the function is convex,
which means that for f'>0 the function value increases more rapidly as x increases, while for
f'<0 the function value falls less quickly. For f''<0 the function is concave, which means
that for f'>0 the function value increases less quickly as x increases, while for f'<0 the
function value falls more quickly.
We cannot use this method directly to determine concavity and convexity for functions of two
variables (or n variables). The second partial derivatives f11 and f22 alone are not enough,
because there are infinitely many directions in which one can move away from a given point.

Example
$z = x_1^2 + x_2^2 - 20x_1x_2$
f1 = 2x1 – 20x2 and f11 = 2
f2 = 2x2 – 20x1 and f22 = 2 and f12 = –20
Although f11 and f22 are positive, the function is not convex in all directions. The cross
partial derivative also plays a role in determining the curvature of the function.
4.1 Total Differential Method
In order to determine the concavity (convexity) of the functions of two variables (this
approach can be extended to n-variables also) we shall use the method of total
differential.
Let y = f(x).
The first-order differential at the point x = x0 is dy = f'(x0) dx.
dy is a function of both x and dx. Let us regard dx as a given constant; dx is an infinitely
small change in x. Now we find the total differential of dy, which we can write as
$d^2y = d(dy) = \frac{d(dy)}{dx}\,dx$
$= \frac{d[f'(x)\,dx]}{dx}\,dx$
$= f''(x)\,dx\,dx + f'(x)\frac{d(dx)}{dx}\,dx$
$= f''(x)\,dx^2 + 0$ (since dx is constant)
$= f''(x)\,dx^2$
This is called the second total differential of f(x). Since the term dx2 = (dx)2 is strictly
positive for any value of dx ≠ 0, it follows that d2y has the same sign as f''(x). Therefore the
determination of convexity and concavity, which relies on the sign of f''(x), can be
presented using the sign of d2y. A function is convex if f''(x) ≥ 0 and concave if f''(x) ≤ 0,
so
d2y = f''(x) dx2 ≥ 0 for convex functions
d2y = f''(x) dx2 ≤ 0 for concave functions
The same conditions relating to the sign of d2y to concavity/convexity apply to functions
of n-variables. Here we shall explain this method for two variables.
y = f(x1, x2)

Total differential:
$dy = \frac{\partial f}{\partial x_1}dx_1 + \frac{\partial f}{\partial x_2}dx_2$
$= f_1\,dx_1 + f_2\,dx_2$
The second-order total differential is the total differential of dy:
$d(dy) = d^2y = \frac{\partial(dy)}{\partial x_1}dx_1 + \frac{\partial(dy)}{\partial x_2}dx_2$
$= \frac{\partial(f_1dx_1 + f_2dx_2)}{\partial x_1}dx_1 + \frac{\partial(f_1dx_1 + f_2dx_2)}{\partial x_2}dx_2$
$= f_{11}\,dx_1^2 + f_{21}\,dx_1dx_2 + f_{12}\,dx_1dx_2 + f_{22}\,dx_2^2$
$= f_{11}\,dx_1^2 + 2f_{12}\,dx_1dx_2 + f_{22}\,dx_2^2$ (since f12 = f21)
The expression makes it clear that d2y depends on the cross partial derivative f12 as well as
f11 and f22.
Suppose the function y = f(x1, x2) is twice continuously differentiable. If d2y>0 whenever at
least one of dx1 or dx2 is non-zero, the function is strictly convex. If d2y<0 whenever at least
one of dx1 or dx2 is non-zero, the function is strictly concave.
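To see why the cross partial matters, the quadratic form for d2y can be evaluated in several directions for the earlier example z = x1^2 + x2^2 - 20 x1 x2, where f11 = 2, f22 = 2 and f12 = -20. The function name below is an illustrative assumption.

```python
def second_differential(f11, f12, f22, dx1, dx2):
    """d2y = f11*dx1^2 + 2*f12*dx1*dx2 + f22*dx2^2 for given increments (dx1, dx2)."""
    return f11 * dx1 ** 2 + 2 * f12 * dx1 * dx2 + f22 * dx2 ** 2


if __name__ == "__main__":
    f11, f12, f22 = 2, -20, 2
    for direction in [(1, 0), (0, 1), (1, 1), (1, -1)]:
        print(direction, second_differential(f11, f12, f22, *direction))
```

Along the axes d2y is positive, but along the direction (1, 1) it is negative, so the sign of d2y is not the same in every direction and the function is neither convex nor concave.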
4.2 Definitions
Def: A twice continuously differentiable function y = f(x1, x2) is concave if and only if
d2y is everywhere negative semidefinite, that is,
$f_{11} \le 0,\ f_{22} \le 0,\ f_{11}f_{22} - f_{12}^2 \ge 0$
Def: A twice continuously differentiable function y = f(x1, x2) is convex if and only if d2y
is everywhere positive semidefinite, that is,
$f_{11} \ge 0,\ f_{22} \ge 0,\ f_{11}f_{22} - f_{12}^2 \ge 0$
If the second-order total differential satisfies d2y < 0 (d2y > 0) for all (dx1, dx2) ≠ (0, 0), then
the function is strictly concave (strictly convex).
The method of determining the sign of d2y directly can involve a lot of algebraic
manipulation even when the function is function of two variables. In an earlier topic
dealing with maxima and minima we have used quadratic forms and their properties to
determine maxima-minima. Here also we can use the same method to determine the
sign of d2y.
$d^2y = f_{11}\,dx_1^2 + 2f_{12}\,dx_1dx_2 + f_{22}\,dx_2^2$

We can write the right-hand side as
$\begin{bmatrix} dx_1 & dx_2 \end{bmatrix} \begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \end{bmatrix}$
We know that f12 = f21 from Young's theorem. It follows that the 2×2 matrix is symmetric.
This matrix, whose elements are the second-order partial derivatives and cross partial
derivatives, is called the Hessian matrix and is denoted by H. The Hessian matrix can be used
to determine the concavity and convexity of the function.
4.3 Use of Hessian Matrix for the determination of Convexity and Concavity
Def: For any function y = f(x1, x2, ...., xn) = f(X), where X = (x1, x2, ...., xn), which is twice
differentiable with Hessian H, the function f is strictly concave on Rn if H is negative
definite for all X in Rn. That is, letting dX = [dx1 dx2 ... dxn]^T be the column vector of
increments,
d2y = dX^T H dX < 0 for all dX ≠ 0.
The function f is strictly convex on Rn if H is positive definite for all X ∈ Rn,
that is,
d2y = dX^T H dX > 0 for all dX ≠ 0.
(Definiteness is sufficient but not necessary for strictness: f(x) = –x4 is strictly concave
although f''(0) = 0.)
The Hessian H is positive definite if and only if all the leading principal minors of the matrix H
are positive.
For example, if y = f(x1, x2, x3), then
$H = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}$
and the leading principal minors are
$|H_1| = f_{11}, \quad |H_2| = \begin{vmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{vmatrix}, \quad |H_3| = |H|$
If |H1|>0, |H2|>0, |H3|>0, then H is positive definite and f(x1, x2, x3) is strictly convex.
If |H1|<0, |H2|>0, |H3|<0,
then H is negative definite, and f(x1, x2, x3) is strictly concave.
We now revise the condition for concavity and convexity.

Suppose H is the Hessian matrix associated with a twice differentiable function y = f(X) =
f(x1, x2, ...., xn), X ∈ Rn. Then:
H is positive definite on Rn if and only if its leading principal minors are all positive: |H1|>0,
|H2|>0, .... |Hn| = |H|>0 for X ∈ Rn.
This means that d2y>0 and so f is strictly convex.
H is negative definite on Rn if and only if its leading principal minors alternate in sign,
beginning with a negative value for |H1|:
|H1|<0, |H2|>0, ..., with |Hn| = |H| > 0 if n is even and |Hn| = |H| < 0 if n is odd.
This means that d2y<0 and f is strictly concave.
Note: A leading principal minor of order r of the Hessian matrix is found by suppressing the
last n–r rows and columns.
Example: In the case of a 3×3 matrix, |H1| is found by suppressing the 2nd and 3rd rows and
columns:
|H1| = f11
The leading principal minor of order 2, |H2|, is found by suppressing the third row and third
column.
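The leading-principal-minor test is mechanical and can be sketched in a few lines of Python. This is a minimal illustration that assumes the Hessian is constant and given as a list of rows; the function names and the sample matrices are assumptions made for the example.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def leading_principal_minors(h):
    """|H1|, |H2|, ..., |Hn|: determinants of the top-left k x k blocks."""
    return [det([row[:k] for row in h[:k]]) for k in range(1, len(h) + 1)]


def classify(h):
    """Apply the sign rules for positive and negative definiteness."""
    minors = leading_principal_minors(h)
    if all(d > 0 for d in minors):
        return "positive definite"
    if all((d < 0) if k % 2 == 1 else (d > 0) for k, d in enumerate(minors, start=1)):
        return "negative definite"
    return "neither (by this test)"


if __name__ == "__main__":
    print(classify([[2, 0], [0, 2]]))                          # Hessian of x1^2 + x2^2
    print(classify([[2, -20], [-20, 2]]))                      # Hessian of x1^2 + x2^2 - 20*x1*x2
    print(classify([[-2, 1, 0], [1, -2, 1], [0, 1, -2]]))      # an assumed 3x3 example
```

When the Hessian depends on X, the same test has to hold at every point of the domain, not just at one point.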
So far we have given the conditions for strict concavity and strict convexity. There are
functions which are not strictly concave/convex. They are concave or convex.
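A twice differentiable function y = f(x1, x2...xn) is concave if and only if d2y is everywhere
negative semidefinite.
In terms of the Hessian matrix this means that H is negative semidefinite on Rn, which holds
if and only if all its principal minors alternate in sign, beginning with a negative or zero value
for order k = 1:
$|\tilde{H}_1| \le 0,\ |\tilde{H}_2| \ge 0,\ \dots,\ |\tilde{H}_n| = |H| \ge 0$ if n is even, $\le 0$ if n is odd,
where $|\tilde{H}_k|$ stands for every principal minor of order k.
A twice differentiable function y = f(x1, x2, ... xn) is convex if and only if d2y is
everywhere positive semidefinite.
In terms of H this means that all its principal minors are positive or zero:
$|\tilde{H}_1| \ge 0,\ |\tilde{H}_2| \ge 0,\ \dots,\ |\tilde{H}_n| = |H| \ge 0$
Note: Principal minors of order k are found by suppressing any n–k rows and the
same-numbered columns of H.
Example:
$H = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}$
Principal minors of order 1: f11, f22, f33
Principal minors of order 2: $\begin{vmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{vmatrix}$, $\begin{vmatrix} f_{11} & f_{13} \\ f_{31} & f_{33} \end{vmatrix}$, $\begin{vmatrix} f_{22} & f_{23} \\ f_{32} & f_{33} \end{vmatrix}$
Principal minor of order 3: |H|
For a 3×3 Hessian matrix there are seven principal minors.
For an n×n matrix there are $2^n - 1$ principal minors.
In the case of a 2×2 matrix there are three principal minors:
f11, f22 and $\begin{vmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{vmatrix}$
For a concave function of two variables:
$f_{11} \le 0,\ f_{22} \le 0,\ \begin{vmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{vmatrix} \ge 0$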
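The semidefinite tests need every principal minor, not only the leading ones. The sketch below enumerates them for a matrix given as a list of rows; the function names and the 2×2 example matrix are assumptions chosen to show why the leading minors alone are not enough.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def principal_minors(h):
    """Map each index set (rows and same-numbered columns kept) to its minor."""
    n = len(h)
    out = {}
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            out[idx] = det([[h[i][j] for j in idx] for i in idx])
    return out


def is_positive_semidefinite(h):
    """All principal minors are >= 0."""
    return all(d >= 0 for d in principal_minors(h).values())


def is_negative_semidefinite(h):
    """Principal minors of odd order are <= 0 and those of even order are >= 0."""
    return all((d <= 0) if len(idx) % 2 == 1 else (d >= 0)
               for idx, d in principal_minors(h).items())


if __name__ == "__main__":
    h = [[0, 0], [0, -1]]
    print(principal_minors(h))
    print(is_positive_semidefinite(h), is_negative_semidefinite(h))
```

In this example both leading principal minors are zero, yet the matrix is not positive semidefinite because the first-order principal minor f22 is negative.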
4.4 Graphical Representation
Let us draw a straight line.
If we choose any two points b and d on the line and connect them by a line segment, the
segment connecting b and d also lies on the straight line which we drew in the
beginning: all the points on the segment connecting b and d lie on the original line.
Figure 3(a) Figure 3(b)
Now look at the circle. Choose any two points x1 and x2 on or in the circle and connect them
by a straight line x1x2. This straight line also lies within the circle.

Now consider Fig. 3(b). In this case, if we connect the two points x1 and x2 by a
straight line, we find that there are many points on the segment x1x2 which do not lie within
the figure.
We are now in a position to define the property shared by the straight line and the circle
shown above.
A set is convex if the line segment joining any two points of the set lies entirely within the set.
The straight line, and the circle together with the area within it, are examples of
convex sets.
Figure 4
In fig. 4, we have drawn the first quadrant of the Euclidean space. If we take two points
a1 and a2 in the figure and connect them by a straight line, then the entire line segment lies
within the quadrant. For example, the point $z = \frac{1}{3}a_1 + \frac{2}{3}a_2$ lies on the
straight line connecting the two points a1 and a2. Any point on the straight line segment
connecting a1 and a2 can be expressed as:
θa1 + (1–θ)a2 where 0 ≤ θ ≤ 1 [or θ ∈ [0, 1]]
Such a point is called a convex combination of the pair of points a1 and a2.
Def: A set S is convex if for every pair of points x1 ∈ S and x2 ∈ S, the point
x̄ = θx1 + (1–θ)x2 is also an element of S, for every value of θ with 0 ≤ θ ≤ 1.
A set containing only one point is a convex set. The null set is also considered a convex
set.
Figure 5

A solid circle is a convex set.
Algebraically, $x^2 + y^2 \le r^2$ defines a convex set.
The inequality describes the circle plus the points inside it.
The circle is not hollow. All the three figures given above are examples of convex sets.
The solid figures given below are not convex sets.
Figure 6
In these figures the boundary is indented, i.e., it re-enters the figure (and there is also a
hole). This is the cause of non-convexity. To qualify as a convex set, the set of points in the
figure must contain no holes, and its boundary must not be indented anywhere.
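The definition of a convex set can be tested on examples by sampling convex combinations along a segment. The sketch below is only an illustration: the disk and the ring with a hole (annulus), their radii, the end points and the function names are all assumptions chosen for the example, and finding no violation on a sample does not by itself prove convexity.

```python
def in_disk(p, r=1.0):
    """Membership test for the solid disk x^2 + y^2 <= r^2."""
    return p[0] ** 2 + p[1] ** 2 <= r ** 2


def in_annulus(p, inner=0.5, outer=1.0):
    """Membership test for a disk with a hole: inner^2 <= x^2 + y^2 <= outer^2."""
    return inner ** 2 <= p[0] ** 2 + p[1] ** 2 <= outer ** 2


def convex_combination(p, q, theta):
    """The point theta*p + (1 - theta)*q."""
    return (theta * p[0] + (1 - theta) * q[0], theta * p[1] + (1 - theta) * q[1])


def segment_stays_inside(member, p, q, steps=100):
    """True if every sampled convex combination of p and q satisfies the membership test."""
    return all(member(convex_combination(p, q, k / steps)) for k in range(steps + 1))


if __name__ == "__main__":
    a, b = (-0.8, 0.0), (0.8, 0.0)
    print(segment_stays_inside(in_disk, a, b))      # segment stays in the solid disk
    print(segment_stays_inside(in_annulus, a, b))   # segment crosses the hole
```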
The geometric definition of convexity also applies to points in 3-space as well as
n-space. A solid cube is a convex set. A hollow cylinder is non-convex.
A function which gives rise to a hill over its entire domain is a concave function. A
function which gives rise to a valley over its entire domain is a convex function.
If the hill (or valley) does not contain any flat portion then the function is said to be a
strictly concave (strictly convex) function. If the hill (or valley) also contains flat
portions, the function is concave (convex) but not strictly so.
4.5 Assumption
The domain of the function is a convex set. This assumption is necessary because we
use convex combinations of x1 and x2 in the domain D to test whether f is a concave or
convex function.

Figure 7
In fig. 7, let x ≥ 0 be the domain of the function. This domain is a convex set. Take
two values of x in the domain, x1 and x2. The associated values of the function are f(x1)
and f(x2); connect these two points by a straight line AB. The graph of the function is also
given in fig. 7, where it is shown by the arc AB.
The straight line lies below the arc AB. The value of the function at x̄ between x1 and x2
is f(x̄) = c. This is higher than the point d on the line AB directly above the value x̄. It
is clear that f(x̄) > d. This property expresses the strict concavity of the function.
The value x̄ between x1 and x2 can be written as
x̄ = θx1 + (1 – θ)x2 where θ ∈ [0, 1] or 0 ≤ θ ≤ 1
The point d on the line AB can be expressed as
d = θf(x1) + (1 – θ) f(x2), 0 ≤ θ ≤ 1
The point c on the arc AB is
c = f(θx1 + (1 – θ)x2)
The function is strictly concave if
c = f(θx1 + (1 – θ)x2) > θf(x1) + (1 – θ) f(x2) for 0 < θ < 1 and x1 ≠ x2
In simple words, if we take any two points in the domain, where the domain is a convex
set, then every convex combination of these points is also in the domain of the function.
4.6 Theorem
The function f is strictly concave if
f(θx1 + (1 – θ)x2) > θf(x1) + (1 – θ) f(x2) for θ ∈ (0, 1) and x1 ≠ x2 in D, where D is an
interval, which is a convex set.
The function f is concave if f(θx1 + (1 – θ)x2) ≥ θf(x1) + (1 – θ) f(x2) for 0 ≤ θ ≤ 1.

Figure 8
In fig. 8, the line AB lies entirely above the graph of the function except at points A and B,
so f(x) is a convex function. A convex function bends below the line joining the points f(x1)
and f(x2) (AB).
A function f(x) is strictly convex if
f(θx1 + (1 – θ)x2) < θf(x1) + (1 – θ) f(x2) where θ ∈ (0, 1) and x1 ≠ x2
It is a convex function if
f(θx1 + (1 – θ)x2) ≤ θf(x1) + (1 – θ) f(x2) where θ ∈ [0, 1]
A linear function is both convex and concave because it satisfies the conditions of
both a convex and a concave function.
Figure 9a Figure 9b
Figure9c

Another way of defining concave and convex functions is as discussed below.
Def: A function f is concave on the convex set D,
where D = [a, b], if {(x, y) | y ≤ f(x), x ∈ D} is a convex set.
For a convex function the inequality is reversed. In fig 9(a) the function is convex and in
fig 9(b) the function is concave. In fig 9(a), look at all the points above the graph of the
function and below the straight line k parallel to the x-axis. This set of points satisfies the
above definition (with the inequality reversed) and is a convex set. Similarly the shaded area
in fig 9(b) is a convex set. The function depicted in fig 9(a) is a convex function on the
convex domain [a, b].
Now observe fig 9(c). There is a tangent at the point x0. The tangent line at any point of a
concave function lies on or above the graph of the function. For a convex function
the tangent line at any point x0 lies on or below the graph of the function.
It means that if f is concave and differentiable, then
f(x) ≤ f(x0) + f'(x0) (x – x0) (1)
for any x0 and any other point x.
The right hand side is actually the equation of the tangent at x0 on the function. If we
move slightly away from point x0 on either side of x0 the tangent line at point x0 lies
above the graph of the function.
For convex function the inequality is reversed.
f(x) ≥ f(x0) + f'(x0) (x – x0) (2)
In the case of convex function (if the function is differentiable) the tangent line at (x0,
f(x0)) will lie below the graph of the function except at point (x0, f(x0))
5. Second derivative test for concavity and convexity.
If the function is differentiable twice then we can use the second derivative to test the
concavity and convexity.
Figure 10a Figure 10b

In fig. 10(a) tangent at any point on the graph of the function lies below the graph of the
function. In fig. 10(b) the tangent line on any point on the graph of the function lies
entirely above the graph of the function. We can use the second derivative to determine
the concavity and convexity of the function. We know that the sign of the first derivative
on an interval (a, b) tells us whether the function is increasing or decreasing. We also
know that the first derivative at a point on (a, b) give us the slope of the tangent at that
point.
f'(x) > 0 on an interval (a, b) means that the function is increasing on (a, b)
f'(x) < 0 on an interval (a, b) means that the function is decreasing on (a, b)
f''(x) (the second derivative) is the derivative of f'(x)
therefore f''(x) > 0 iff f'(x) is increasing an (a, b)
f''(x) < 0 iff f'(x) is decreasing on (a, b)
If f''(x) > 0 it means the slope of the tangent is increasing as we move from left to right
on the graph. In the fig (a) the slope of the tangent is increasing when we move from x1
to x2, where x1 < x2. This happens when the function is convex.
If f''(x) < 0 on (a, b) then tangent becomes flatter when we move to x0 from the left. It
means the slope of the tangent is decreasing as we move from left to right on the graph.
In other words when we move from x1 to x2 where x1 < x2 the slope of the tangent is
decreasing.
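We conclude that f is strictly concave on an interval I if f''(x) < 0 for all x in the
interior of I.
Similarly, f is strictly convex on an interval I if f''(x) > 0 for all x in the interior of I. The
converse need not hold: f(x) = x4 is strictly convex although f''(0) = 0. The weak versions are
exact: f is concave (convex) on I if and only if f''(x) ≤ 0 (f''(x) ≥ 0) throughout the interior of I.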
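When a closed form for f'' is inconvenient, its sign can be estimated with a central second difference. This is a rough numerical sketch; the step size h and the two test functions are assumptions, and a finite-difference estimate at a few points only suggests, and does not prove, concavity or convexity on an interval.

```python
import math


def second_difference(f, x, h=1e-4):
    """Central estimate of f''(x): (f(x + h) - 2 f(x) + f(x - h)) / h^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


if __name__ == "__main__":
    # ln(x) has f''(x) = -1/x^2 < 0 for x > 0, so it is strictly concave there.
    print(second_difference(math.log, 2.0))
    # exp(x) has f''(x) = exp(x) > 0 everywhere, so it is strictly convex.
    print(second_difference(math.exp, 0.0))
```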
Example
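Show that $x_1^2 + x_2^2$ is convex on R2.
The function is convex if, for all X = (x1, x2) and Y = (y1, y2),
$(y_1^2 + y_2^2) - (x_1^2 + x_2^2) \ge 2x_1(y_1 - x_1) + 2x_2(y_2 - x_2)$
$= 2x_1y_1 - 2x_1^2 + 2x_2y_2 - 2x_2^2$
Moving everything to the left-hand side,
$y_1^2 + y_2^2 - x_1^2 - x_2^2 + 2x_1^2 + 2x_2^2 - 2x_1y_1 - 2x_2y_2 \ge 0$
$y_1^2 + y_2^2 + x_1^2 + x_2^2 - 2x_1y_1 - 2x_2y_2 \ge 0$
$(y_1 - x_1)^2 + (y_2 - x_2)^2 \ge 0$
This is true for all (x1, x2) and (y1, y2) in R2.

We can also use the second derivative test to check the convexity:
f1 = 2x1, f2 = 2x2, f11 = 2, f12 = 0, f22 = 2
f11 = 2 > 0, f22 = 2 > 0 and f11 f22 – (f12)2 = 4 – 0 = 4 > 0, so the function is (strictly) convex. Proved.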
6. Quasi-concave and Quasi-convex Functions
While discussing concave and convex functions we saw that if the function is concave
(convex) there is no need to check the second-order condition to determine whether the
function achieves a maximum (minimum). When we are dealing with a problem of
constrained optimization, it is again possible to dispense with the second-order condition.
In constrained optimization, quasi-concavity of the function obviates the need for the
second-order condition for a maximum. In a similar manner, quasi-convexity
removes the need for the second-order condition when we are trying to find the
minimum of the function.
In the beginning we shall discuss the concept of quasi-concavity and quasi-convexity in

the case of function of single variable. We are using fig. 11(a).
1) Let f be a function of x [y = f(x)]
2) x and y are non-negative [x, y ≥ 0]
This means that we are in the first quadrant.
3) The domain of the function is convex set.
4) Choose two distinct point xi and xj (or x1 and x2) such that xi < xj in the convex
domain of the function.
5) The function f(x) forms an arc AB between xi and xj such that f(xi) = A and f(xj) = B
Fig 11(a) Fig 11(b)
In fig. 11(a) point B is higher than A. In other words f(xj) > f(xi). The function
is strictly quasi-concave if all other points on the arc AB are higher than point A.

In fig. 11(b):
1) The domain of the function is the first quadrant (xi, xj ≥ 0).
The domain is a convex set.
2) Two distinct points are xi and xj such that xi < xj.
3) The function f(x) forms an arc CD between xi and xj such that f(xi) = C and f(xj) =
D. In fig. 11(b), f(xj) < f(xi). The function is strictly quasi-convex if all other points on
the arc are lower than f(xi), the higher of the two end points.
Now we shall give an algebraic definition of quasi-concavity and quasi-convexity.
Let f be a function of x. Then for any two distinct points xi and xj in the convex domain of
the function such that xi < xj, and for 0 < θ < 1, the function is strictly quasi-concave
if the following condition is satisfied:
f(xj) ≥ f(xi) ⇒ f[θxi + (1–θ)xj] > f(xi)
If we replace the strict inequality on the right with a weak inequality then the function is
quasi-concave.
The weak inequality allows for some horizontal straight line segment on
the arc AB.
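The algebraic definition is equivalent to requiring f[θxi + (1–θ)xj] ≥ min{f(xi), f(xj)}, which is easy to test on a grid. The sketch below is an illustration under stated assumptions: the grid, the θ values, the tolerance and the two test functions are arbitrary choices, and passing a grid check is only evidence of quasi-concavity, not a proof.

```python
def is_quasiconcave_on_grid(f, grid, thetas=(0.25, 0.5, 0.75), tol=1e-12):
    """Check f(theta*xi + (1 - theta)*xj) >= min(f(xi), f(xj)) for all grid pairs."""
    for xi in grid:
        for xj in grid:
            for theta in thetas:
                mid = theta * xi + (1 - theta) * xj
                if f(mid) < min(f(xi), f(xj)) - tol:
                    return False
    return True


if __name__ == "__main__":
    grid = [k / 10 for k in range(21)]  # points 0.0, 0.1, ..., 2.0
    # x^3 is increasing on x >= 0, hence quasi-concave there although it is convex.
    print(is_quasiconcave_on_grid(lambda x: x ** 3, grid))
    # (x - 1)^2 dips below both end values between 0 and 2, so it is not quasi-concave.
    print(is_quasiconcave_on_grid(lambda x: (x - 1) ** 2, grid))
```

The first case illustrates that quasi-concavity is weaker than concavity: every monotonic function of one variable is both quasi-concave and quasi-convex.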
7. Quasi-convex function
Suppose f is a function of x. Then for any two distinct points xi and xj in its convex domain
and for 0 < θ < 1, the function is strictly quasi-convex if the following condition is satisfied:
f(xi) > f(xj) ⇒ f(xi) > f[θxi + (1 – θ)xj]
If we replace the strict inequalities with weak inequalities, a function satisfying the condition
is quasi-convex:
f(xi) ≥ f(xj) ⇒ f(xi) ≥ f[θxi + (1 – θ)xj]
Differentiable functions
If f is a differentiable function of x, then it is strictly quasi-concave if, for
two points xi and xj in its convex domain such that xi < xj, the following condition
is satisfied:
f(xj) > f(xi) ⇒ f'(xi) (xj – xi) > 0
where f'(xi) = df/dx evaluated at x = xi.
The function is quasi-concave if
f(xj) ≥ f(xi) ⇒ f'(xi) (xj – xi) ≥ 0
The function is strictly quasi-convex if
f(xi) > f(xj) ⇒ f'(xi) (xj – xi) < 0

Quasi-convex if
f(xi) ≥ f(xj) ⇒ f'(xi) (xj – xi) ≤ 0
8. Properties of Quasi-concave and Quasi-convex functions
1) A linear function is both quasi-convex and quasi-concave.
2) All concave (convex) functions (strict or non-strict) are quasi-concave
(quasi-convex). But the converse is not true.
3) If f(x) is quasi-convex (strict or non-strict) then –f(x) is quasi-concave.
9. Functions of multiple variables
Let z = f(x, y) be a function of two variables.
The function is defined on the non-negative quadrant of R2 (x, y ≥ 0).
Let u = (x1, y1) and v = (x2, y2) be two distinct points in the domain, and let 0 < θ < 1.
Then f(x, y) is
f(v) > f(u) ⇒ f[θu + (1 – θ)v] > f(u) strictly quasi-concave
f(v) ≥ f(u) ⇒ f[θu + (1 – θ)v] ≥ f(u) quasi-concave
f(v) < f(u) ⇒ f[θu + (1 – θ)v] < f(u) strictly quasi-convex
f(v) ≤ f(u) ⇒ f[θu + (1 – θ)v] ≤ f(u) quasi-convex
10. Differentiable function
Suppose a function z = f(x1, x2, ..., xn) is twice continuously differentiable. The quasi-
concavity and quasi-convexity of the function can be checked with the help of the first and
second partial derivatives of the function, arranged as a bordered determinant:
$$
|B| = \begin{vmatrix}
0 & f_1 & f_2 & \cdots & f_n \\
f_1 & f_{11} & f_{12} & \cdots & f_{1n} \\
f_2 & f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_n & f_{n1} & f_{n2} & \cdots & f_{nn}
\end{vmatrix}
$$
This bordered determinant looks similar to the bordered Hessian determinant |H| which we
use when we deal with constrained maxima (minima). In the bordered Hessian |H|, the first
row (column) consists of 0 and the first derivatives of the constraint. Here, the first row
(column) of |B| consists of 0 and the first derivatives of the function f(x1, x2, ..., xn)
itself, because quasi-concavity (quasi-convexity) depends exclusively on the partial
derivatives of the function itself. We use |B| along with its leading principal minors:
$$
|B_1| = \begin{vmatrix} 0 & f_1 \\ f_1 & f_{11} \end{vmatrix}, \quad
|B_2| = \begin{vmatrix} 0 & f_1 & f_2 \\ f_1 & f_{11} & f_{12} \\ f_2 & f_{21} & f_{22} \end{vmatrix}, \quad
\ldots, \quad |B_n| = |B|
$$
Let the domain of the function be the non-negative orthant (that is, x1, x2, x3, ..., xn ≥ 0).
For z = f(x1, x2, ..., xn) to be quasi-concave on the non-negative orthant, it is necessary that
|B1| ≤ 0, |B2| ≥ 0, ..., |Bn| = |B| ≤ 0 if n is odd
                                     ≥ 0 if n is even
A sufficient condition for the function to be strictly quasi-concave on the non-negative
orthant is
|B1| < 0, |B2| > 0, ..., |Bn| = |B| < 0 if n is odd
                                     > 0 if n is even
For the function to be quasi-convex, it is necessary that
|B1| ≤ 0, |B2| ≤ 0, |B3| ≤ 0, ..., |Bn| = |B| ≤ 0
A sufficient condition for the function to be strictly quasi-convex is
|B1| < 0, |B2| < 0, ..., |Bn| = |B| < 0.
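As a small illustration of the bordered determinant test, the sketch below evaluates |B1| and |B2| for
f(x, y) = xy at one point of the positive orthant. The function f(x, y) = xy and the point (2, 3)
are choices made here for illustration and are not taken from the lesson; the helper names are
hypothetical.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def bordered_minors_xy(x, y):
    """Leading principal minors |B1|, |B2| of the bordered determinant for f(x, y) = x*y."""
    f1, f2 = y, x                      # first partial derivatives of x*y
    f11, f12, f22 = 0.0, 1.0, 0.0      # second partial derivatives of x*y
    b = [[0.0, f1, f2],
         [f1, f11, f12],
         [f2, f12, f22]]
    b1 = det([row[:2] for row in b[:2]])
    b2 = det(b)
    return b1, b2


B1, B2 = bordered_minors_xy(2.0, 3.0)
print(B1, B2)
```

Here |B1| = -y^2 and |B2| = 2xy, so at any point with x, y > 0 the signs are |B1| < 0 and |B2| > 0,
which is the pattern of the sufficient condition for n = 2.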
11. Exercises
1) Are the following functions quasiconcave? Which of them are also concave?
a) f(x, y) = x^(1/2) + y^(1/4) for all x, y ≥ 0
b) f(x, y) = x^(1/4) + y^(3/4), x, y ≥ 0
c) f(x, y) = x^2 y^3
d) f(x, y) = x y^2
2) Which of these function defined on are quasiconvex which are also convex
a) +
b) 3 +4
c) 2x1 + 3x2 –
3) Are the following functions concave or convex?
a) z = – (x + x )
Is it a concave function?
b) z = x^α y^β. Show that it is concave
if 0 < α < 1, 0 < β < 1 and 1 – (α + β) > 0
c) –3x^2 + 2xy – y^2 + 3x – 4y + 1 (Ans: concave)
d) x^4 + x^2 + y^2 + y^4 – 3x – 8y (Ans: convex)
e) f(x, y) = –xy (Ans: neither concave nor convex)
f) x – y – x^2
g) –6x^2 + (2a + 4)xy – y^2 + 4ay
What values of a will make the function concave?
h) f(x, y) = –2x^2 – y^2 + 4x + 4y – 3 for all (x, y)
has a maximum at (x, y) = (1, 2)
4) Is this function concave or convex?
a) (x1 + x2) /
defined on R2++
Show that it is concave but not strictly concave.
b) 3x1 + x defined on all R2
Show that it is neither strictly concave nor strictly convex.
12. References
Allen, R.G.D., Mathematical Analysis for Economists, London: Macmillan and Co. Ltd.
Sydsaeter, Knut and Peter J. Hammond, Mathematics for Economic Analysis, Prentice Hall.
Simon, Carl P. and Lawrence Blume, Mathematics for Economists, London: W. W. Norton & Co.

Constrained Optimization
DC-1
Semester-II
Paper-IV: Mathematical Methods in Economics-II
Lesson: Constrained Optimization
Lesson Developer: Rakhi Arora & Vaishali Kapoor
College/Department: Department of Economics, University of Delhi

Table of Contents
2. Introduction
3. Geometrically analyzing constraint
4. Algebraically setting constraint into optimization equations
5. Lagrange multiplier method
a. Economic interpretation of Lagrange multiplier
b. Lagrange multiplier as benefit-cost ratio
6. Sufficient conditions for constrained optimization
7. Envelope results
8. Exercises
9. References
Learning outcomes:
1. Explain the importance and relevance of constraint.

2. Differentiate between free and constrained optimum.
3. Solve problems of constrained optimization in economics.
4. Define parameters.
5. Interpret lagrangian multiplier.

Introduction
In the last chapter, we covered optimization of objective functions of two or more choice
variables. That optimization was unconstrained: in the case of the discriminating monopolist,
for example, there was no restriction or limit on the level of output to be produced. But there
could have been constraints. Given the level of technique and machinery, the maximum total
output that can be produced may be, say, 1000 units. In such a case, the optimum changes if the
unconstrained optimum calls for more output than this constraint allows.
In this chapter, we will cover optimization with equality constraints. The optimum found here
is the constrained optimum, which is likely to differ from the free optimum.
This chapter is divided into sections. In the first section, we will analyze the geometric
properties of a constraint.
1. Geometrically analyzing constraint
The primary purpose of imposing a constraint is to give due cognizance to certain limiting
factors present in the optimization problem under discussion. In the last chapter, we saw hills
and valleys in 2D and domes and bowls in 3D, and found relative extrema in all such cases. But
there were no constraints.
Let us take the example of a consumer who consumes only one good and has the utility function:
U(x) = -(x-2)^2 + 4
Its free optimum (maximum) is at x = 2. But if the government imposes a restriction that no one
can consume more than 1 unit of x, then the constrained optimum is at x = 1. This is shown in
Fig. 1 below.
Fig.1
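The difference between the free and the constrained optimum in this example can be checked
numerically. The sketch below is only an illustration: it assumes consumption is non-negative
(a lower bound of 0 that the text does not state) and uses a simple grid search with a
hypothetical helper name.

```python
def utility(x):
    """U(x) = -(x - 2)^2 + 4 from the example above."""
    return -(x - 2) ** 2 + 4


def constrained_argmax(cap, steps=1000):
    """Grid search for the maximiser of utility on [0, cap] (0 is an assumed lower bound)."""
    grid = [cap * i / steps for i in range(steps + 1)]
    return max(grid, key=utility)


free_x = constrained_argmax(5.0)    # cap is loose, so it does not bind
capped_x = constrained_argmax(1.0)  # cap of 1 unit binds

print(free_x, utility(free_x))
print(capped_x, utility(capped_x))
```

With the cap at 1 unit the search stops at the boundary, and the utility reached there is lower
than at the free optimum.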

Let us consider another example in which the utility of a consumer depends on two goods, x1 and
x2, with the utility function:
U = x1x2 + 2x1
Taking the partial derivatives of U, one finds that the marginal utilities (x2 + 2 for the first
good and x1 for the second) are positive, and each increases with the quantity of the other
good. Hence an unconstrained optimum would call for purchasing infinite amounts of the goods.
But a consumer faces a constraint known as the budget constraint. If the price of x1 is Rs. 4,
the price of x2 is Rs. 2 and the income of the consumer is Rs. 100, then the constraint becomes:
4x1 + 2x2 = 100
This constraint narrows down the choice of x1 and x2, and one can find the optimum x1 and x2.
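The optimum for this budget example is not worked out in the text. A minimal sketch, assuming
the constraint is used to substitute x2 = (100 - 4x1)/2 into U, so that U becomes a quadratic in
x1 whose vertex can be read off directly (the function name is hypothetical):

```python
from fractions import Fraction


def solve_budget_example(p1=4, p2=2, income=100):
    """Maximise U = x1*x2 + 2*x1 subject to p1*x1 + p2*x2 = income by substitution."""
    # With x2 = (income - p1*x1)/p2, U = a*x1**2 + b*x1 where:
    a = Fraction(-p1, p2)
    b = Fraction(income, p2) + 2
    x1 = -b / (2 * a)                 # vertex of the downward-opening parabola
    x2 = (income - p1 * x1) / p2
    return x1, x2, x1 * x2 + 2 * x1


x1_opt, x2_opt, u_opt = solve_budget_example()
print(x1_opt, x2_opt, u_opt)
```

Under these assumptions the consumer spends Rs. 52 on the first good and Rs. 48 on the second.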
Fig.2
If one considers a general function z = f(x, y) and assumes it appears like a dome in 3D, then
the free extremum is the peak of the dome, but the constrained extremum is the peak of the
inverted U-shaped curve on the surface lying directly above the constraint. In Fig. 2, MN is the
constraint line, indicating that the sum of x and y cannot go beyond this line. The constrained
maximum is then at point B.

2. Algebraically setting constraint into optimization equations
All the points, viz. C, D, E, F, G and H, are feasible (in fact, the entire section of the
surface on the right-hand side of the constraint is the feasible section). In Fig. 3, various
level curves of the function z = f(x, y) are drawn. It is a 2D projection of the 3D dome. MN is
the constraint on the sum of x and y, say of the form g(x, y) = c.
Fig.3
A’ in Fig. 3 corresponds to A in Fig. 2 and is the unconstrained maximum. B’ (corresponding to B
in Fig. 2) is the constrained maximum. In Fig. 3, at B’, the slope of the level curve
f(x, y) = z1 is equal to the slope of the constraint g(x, y) = c.
Free & constrained maximum
A constrained maximum can be expected to have a lower value than the free
maximum. It may happen that the free optimum also satisfies the constraint,
in which case the constraint is not binding and the two coincide. If the
constraint is binding, which it generally is, the free maximum is higher than
the constrained maximum. In no case can the constrained maximum exceed the
free maximum.

Algebraically setting constraint
The condition for an optimum of f(x, y) = z requires that the slope of the level curve
f(x, y) = z1 be equal to the slope of the constraint g(x, y) = c, which can be expressed as
follows:
$$ -\frac{g_x(x, y)}{g_y(x, y)} = -\frac{f_x(x, y)}{f_y(x, y)} $$
Example1:
Suppose one wishes to maximize f(x, y) = xy subject to 2x + y = m. The constraint can also be
written as y = m - 2x. Then the objective function becomes f(x, y(x)) = x(m - 2x), which is a
function of one variable. So, for its optimization,
$$ \frac{df}{dx} = 0 \;\Rightarrow\; m - 4x = 0 \;\Rightarrow\; 4x = m \;\Rightarrow\; x = \frac{m}{4} $$
$$ \text{and} \quad y = m - \frac{2m}{4} = \frac{m}{2} $$
The same result can also be derived from the slope-equality condition as follows:
$$ \frac{f_x(x, y)}{f_y(x, y)} = \frac{g_x(x, y)}{g_y(x, y)} = \frac{2}{1} \;\Rightarrow\; \frac{y}{x} = \frac{2}{1} \;\Rightarrow\; y = 2x $$
Putting y = 2x into 2x + y = m yields
x = m/4 and y = m/2.

Lagrange Multiplier method
Suppose (x0, y0) solves the problem of optimizing f(x, y) = z subject to the constraint
g(x, y) = c. Then we know that at (x0, y0):
$$ \frac{f_x(x, y)}{f_y(x, y)} = \frac{g_x(x, y)}{g_y(x, y)} \;\Rightarrow\; \frac{f_x(x, y)}{g_x(x, y)} = \frac{f_y(x, y)}{g_y(x, y)} $$
At (x0, y0) the above ratios have a common value. This common value is known as the Lagrange
multiplier, λ, and the above equation becomes:
$$ f_x(x, y) - \lambda g_x(x, y) = 0 $$
$$ f_y(x, y) - \lambda g_y(x, y) = 0 $$
Now let us define the Lagrangean function L by:
$$ L(x, y) = f(x, y) - \lambda (g(x, y) - c) $$
The partial derivatives of L(x, y) with respect to x and y are
$L_x(x, y) = f_x(x, y) - \lambda g_x(x, y)$ and $L_y(x, y) = f_y(x, y) - \lambda g_y(x, y)$, respectively.
Equate these partial derivatives to zero and solve the resulting equations, together with the
constraint g(x, y) = c, for the optimum values of x, y and λ.
Lagrangean Function: a better technique
The advantage of the Lagrangean function over the slope-equality method is that it
can handle more than two variables and more than one constraint (as we will see in
the coming examples).

Example 1 contd.:-
The Lagrangean is $L(x, y) = xy - \lambda(2x + y - m)$, and the first order conditions are:
$$ L_x(x, y) = y - 2\lambda = 0 \quad (1) $$
$$ L_y(x, y) = x - \lambda = 0 \quad (2) $$
$$ 2x + y - m = 0 \quad (3) $$
Solving (1) and (2), we get
y = 2x
Putting y = 2x in equation (3), we get x = m/4 and y = m/2. The result obtained here is the
same as that obtained by the previous techniques. Hence, any of these techniques is equally
applicable. Also λ = x from equation (2), so λ = m/4. Notice that x, y and λ are all functions
of m. Here m is referred to as a parameter. The optimal value of f(x, y) is also a function of
m, as it equals (m/4)(m/2), i.e. m^2/8.
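Because the three first order conditions of Example 1 are linear in x, y and λ, they can also
be solved mechanically. The sketch below uses Cramer's rule with exact fractions; the choice of
Cramer's rule and the helper names are assumptions made for illustration, not the lesson's own
method.

```python
from fractions import Fraction


def det3(a):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def cramer3(a, b):
    """Solve the 3x3 linear system a @ z = b by Cramer's rule."""
    d = det3(a)
    solution = []
    for k in range(3):
        # Replace column k of a by the right-hand side b
        ak = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        solution.append(Fraction(det3(ak), d))
    return solution


def solve_example1(m):
    """Unknowns ordered (x, y, lam): y - 2*lam = 0, x - lam = 0, 2x + y = m."""
    a = [[0, 1, -2],
         [1, 0, -1],
         [2, 1, 0]]
    b = [0, 0, m]
    return cramer3(a, b)


x, y, lam = solve_example1(100)
print(x, y, lam, x * y)
```

For m = 100 this reproduces x = m/4, y = m/2 and λ = m/4, with optimal value m^2/8.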
Economic Interpretation of lagrange multiplier.
Suppose the objective is to maximize f(x, y) subject to g(x, y) = c, and that x* and y* are the
values of x and y that solve this problem. In general, x* and y* depend on c (the parameter of
this model). We assume that x* = x*(c) and y* = y*(c) are differentiable functions of c. The
associated value f* of f(x, y) is then also a function of c:
$$ f^*(c) = f(x^*(c), y^*(c)) $$
Here f*(c) is called the optimal value function. λ is also a function of the parameter c.
Taking the differential of the above equation, we get:
$$ df^*(c) = df(x^*, y^*) = f_x(x^*, y^*)\,dx^* + f_y(x^*, y^*)\,dy^* $$
We also know that $f_x(x^*, y^*) = \lambda g_x(x^*, y^*)$ and $f_y(x^*, y^*) = \lambda g_y(x^*, y^*)$, so
$$ df^*(c) = \lambda g_x(x^*, y^*)\,dx^* + \lambda g_y(x^*, y^*)\,dy^* $$
Taking the total differential of the constraint $g(x^*(c), y^*(c)) = c$ yields
$$ g_x(x^*, y^*)\,dx^* + g_y(x^*, y^*)\,dy^* = dc $$
So it implies $df^*(c) = \lambda\,dc$.
In particular, if dc is a small change in c, then
$$ f^*(c + dc) - f^*(c) \approx \lambda(c)\,dc $$
$$ \text{Also} \quad \frac{df^*(c)}{dc} = \lambda(c) $$
Thus, the Lagrange multiplier is the rate at which the optimal value of the objective function
changes with respect to changes in the parameter c.
In economic applications, c often denotes the available stock of a resource, which acts as a
constraint on the utility or profit function f(x, y). λ then becomes the shadow price of the
resource, as it indicates how utility or profit changes when dc more units of the resource are
provided.
Lagrangian multiplier as benefit-cost ratio
Consider again the objective of maximizing f(x, y) subject to g(x, y) = c. Then, we know that
$$ \lambda = \frac{f_x}{g_x} = \frac{f_y}{g_y} $$
In other words, at the maximum point the ratio of f_i to g_i is the same for every choice
variable i (x and y here). The numerators f_i are the marginal contributions of each choice
variable to the function f: they show the marginal benefit that one more unit of x or y has for
the function to be maximized, f(x, y). The denominators g_i are the marginal costs of each
choice variable: they reflect the added burden on the constraint of using slightly more of x
(or y).

Example 1 contd:-
The objective was to maximize f(x, y) = xy subject to 2x + y = m. As solved earlier, x(m) = m/4,
y(m) = m/2, and λ(m) = m/4. So the value function is f*(m) = (m/4)(m/2) = m^2/8, and
$$ \frac{df^*(m)}{dm} = \frac{m}{4} = \lambda(m) $$
Suppose m is 100, so f*(100) = 100^2/8 = 1250. If m increases to 101, then the new optimized
value is f*(101) = 101^2/8 = 1275.125, so f*(101) – f*(100) = 25.125.
From the above result, we also know that:
$$ \frac{df^*(m)}{dm} = \lambda(m) = \lambda(100) = 25 $$
which is a good approximation to the actual change in the optimal value function.
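The comparison between the multiplier and the actual change in the optimal value can be
reproduced exactly. This is a small sketch that uses the closed-form solution x = m/4, y = m/2
from Example 1; the function names are hypothetical.

```python
from fractions import Fraction


def value_function(m):
    """Optimal value of x*y subject to 2x + y = m, using x = m/4 and y = m/2."""
    m = Fraction(m)
    return (m / 4) * (m / 2)


def multiplier(m):
    """Lagrange multiplier lambda(m) = m/4 for the same problem."""
    return Fraction(m) / 4


actual_change = value_function(101) - value_function(100)
predicted_change = multiplier(100) * 1   # lambda times dm, with dm = 1

print(float(actual_change), float(predicted_change))
```

The gap between the two numbers comes from the curvature of the value function: the multiplier
gives only the first-order change.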
Example 2 :
Sufficient Conditions
The conditions that we have studied so far are necessary conditions but not sufficient. To make
this clear, let us consider the following example:
$$ \max f(x, y) = 2x + 3y \quad \text{subject to} \quad \sqrt{x} + \sqrt{y} = 5 $$
The Lagrangean is $L(x, y) = 2x + 3y - \lambda(\sqrt{x} + \sqrt{y} - 5)$. So the three first order
conditions become:
$$ L_x(x, y) = 2 - \lambda \frac{1}{2\sqrt{x}} = 0 $$
$$ L_y(x, y) = 3 - \lambda \frac{1}{2\sqrt{y}} = 0 $$
$$ \sqrt{x} + \sqrt{y} = 5 $$
Solving the first two equations yields y = 4x/9. Putting this value in the third equation
yields x = 9 and y = 4. This is indicated by point P in Fig. 4. But evidently (9, 4) does not
solve the problem of maximizing f(x, y). Rather, the solution to this problem is Q = (0, 25),
where the constraint is satisfied and 2x + 3y takes the value 75 (instead of 30 at point P).
Figure 4
So this establishes that the first order conditions, though necessary, are not
sufficient.
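To see concretely why the stationary point fails, one can evaluate 2x + 3y at a few points
that satisfy the constraint sqrt(x) + sqrt(y) = 5. The sample points below are chosen here for
illustration (perfect squares, so the arithmetic is exact); the helper name is hypothetical.

```python
import math


def objective_on_constraint(x):
    """Value of 2x + 3y when sqrt(x) + sqrt(y) = 5, for 0 <= x <= 25."""
    y = (5 - math.sqrt(x)) ** 2
    return 2 * x + 3 * y


values = {x: objective_on_constraint(x) for x in (0, 4, 9, 16, 25)}
for x, value in values.items():
    print(x, value)
```

Among these points the stationary point x = 9 gives the smallest value, while the end point
x = 0 (that is, Q = (0, 25)) gives the largest.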
Consider the same problem of maximising z = f(x, y) subject to g(x, y) = c.
Let (x0, y0) be a stationary point. By the implicit function theorem, the equation
g(x, y) = c defines y as a differentiable function of x in some neighbourhood of (x0, y0). Let
this be denoted by y = h(x); then
$$ y' = h'(x) = -\frac{g_x(x, y)}{g_y(x, y)} $$
The problem of maximising f(x, y) is reduced to maximising z = f(x, h(x)), i.e. with
respect to the single variable x.
Then
$$ \frac{dz}{dx} = f_x(x, y) + f_y(x, y)\,y' = f_x(x, y) - f_y(x, y)\frac{g_x(x, y)}{g_y(x, y)} $$
The necessary condition becomes $\frac{dz}{dx} = 0$.
The sufficient condition for a maximum of z is that the second order derivative of z with
respect to x be less than zero.
$$ \frac{d^2z}{dx^2} = f_{xx} + f_{xy}y' + (f_{yx} + f_{yy}y')y' - f_y \frac{(g_{xx} + g_{xy}y')g_y - (g_{yx} + g_{yy}y')g_x}{(g_y)^2} $$
Using $y' = -g_x/g_y$ and $f_y = \lambda g_y$, this becomes
$$ \frac{d^2z}{dx^2} = \frac{1}{(g_y)^2}\left[(f_{xx} - \lambda g_{xx})(g_y)^2 - 2(f_{xy} - \lambda g_{xy})g_x g_y + (f_{yy} - \lambda g_{yy})(g_x)^2\right] $$
The expression in brackets equals $-D(x, y)$, where $D(x, y)$ is the determinant
$$ D(x, y) = \begin{vmatrix} 0 & g_x & g_y \\ g_x & f_{xx} - \lambda g_{xx} & f_{xy} - \lambda g_{xy} \\ g_y & f_{xy} - \lambda g_{xy} & f_{yy} - \lambda g_{yy} \end{vmatrix} $$
so that
$$ \frac{d^2z}{dx^2} = -\frac{1}{(g_y)^2} D(x, y) $$
A sufficient condition for (x0, y0) to solve the constrained problem is that (x0, y0) satisfies
the first order conditions and, moreover, that the bordered Hessian D(x0, y0) given above is > 0
in the maximization case and < 0 in the minimization case.
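The sign test on the bordered Hessian can be sketched for the two problems already met in this
lesson: maximizing xy subject to 2x + y = m, and the problem 2x + 3y subject to
sqrt(x) + sqrt(y) = 5 at its stationary point (9, 4). The derivative values passed in are worked
out by hand in the comments; the function name is hypothetical.

```python
from fractions import Fraction


def bordered_hessian(gx, gy, lxx, lxy, lyy):
    """Determinant |0 gx gy; gx lxx lxy; gy lxy lyy|, where l.. stands for f.. - lam * g.. ."""
    return -gx * (gx * lyy - gy * lxy) + gy * (gx * lxy - gy * lxx)


# f = x*y, g = 2x + y: gx = 2, gy = 1, f_xy = 1, all other second derivatives are zero.
d_example1 = bordered_hessian(2, 1, 0, 1, 0)

# f = 2x + 3y, g = sqrt(x) + sqrt(y) at (9, 4) with lam = 4*sqrt(9) = 12:
#   gx = 1/6, gy = 1/4, g_xx = -1/108, g_yy = -1/32, g_xy = 0, f has no second derivatives.
lam = 12
d_sqrt = bordered_hessian(Fraction(1, 6), Fraction(1, 4),
                          -lam * Fraction(-1, 108), 0, -lam * Fraction(-1, 32))

print(d_example1, d_sqrt)
```

The first determinant is positive, consistent with a constrained maximum. The second is
negative, which indicates that (9, 4) is a constrained minimum of 2x + 3y rather than a maximum.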
Envelope Results
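Optimization problems in economics usually involve functions that depend on a number of
parameters, like prices, income levels, taxes etc. These are parameters because they act as
constants during optimization, but they vary with the economic situation. For example, in the
case of utility maximization, income is fixed and we find the optimal values of the x_i that
maximize utility. But if income changes in the next period, then the optimal solution also
changes. To see how the optimal solution changes when parameters change, we use the Envelope
Theorem.
Consider the problem
$$ \max_{\mathbf{x}} f(\mathbf{x}, \mathbf{r}) \quad \text{subject to} \quad g_j(\mathbf{x}, \mathbf{r}) = 0, \quad j = 1, \ldots, m $$
where r = (r1, ..., rk) is the vector of parameters and x = (x1, ..., xn) is the vector of choice
variables.
The optimization gives the solution $x_1^*(\mathbf{r}), \ldots, x_n^*(\mathbf{r})$ and the optimal
value $f^*(\mathbf{r})$, where $f^*(\mathbf{r})$ is the optimal value function for this problem:
$$ f^*(\mathbf{r}) = f(\mathbf{x}^*(\mathbf{r}), \mathbf{r}) = f(x_1^*(\mathbf{r}), \ldots, x_n^*(\mathbf{r}), \mathbf{r}) $$
Suppose now we wish to study how the optimal value function changes when the h-th parameter
r_h changes. One way is to take the new value of r_h, set up the Lagrangean function again and
recompute the value of f(x, r). To avoid this tedious process, we study directly how the optimal
value function changes as r_h changes:
$$ \frac{\partial f^*(\mathbf{r})}{\partial r_h} = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial x_i} \cdot \frac{\partial x_i^*(\mathbf{r})}{\partial r_h} + \frac{\partial f(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial r_h} $$
The above equation implies that $f^*(\mathbf{r})$ changes on two accounts: first, a change in r_h
changes the vector r and so changes f(x, r) directly; second, r_h changes all the functions
$x_i^*(\mathbf{r})$ and hence indirectly changes $f(\mathbf{x}^*(\mathbf{r}), \mathbf{r})$. Let the
Lagrangean be
$$ L(\mathbf{x}, \mathbf{r}) = f(\mathbf{x}, \mathbf{r}) - \sum_{j=1}^{m} \lambda_j g_j(\mathbf{x}, \mathbf{r}) $$
The first order condition for the given problem is as follows:
$$ \frac{\partial f(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial x_i} = \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial x_i} \quad \text{for all } i = 1, \ldots, n $$
So
$$ \frac{\partial f^*(\mathbf{r})}{\partial r_h} = \sum_{i=1}^{n} \left[ \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial x_i} \right] \frac{\partial x_i^*(\mathbf{r})}{\partial r_h} + \frac{\partial f(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial r_h} $$
$$ = \sum_{j=1}^{m} \lambda_j \left[ \sum_{i=1}^{n} \frac{\partial g_j(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial x_i} \cdot \frac{\partial x_i^*(\mathbf{r})}{\partial r_h} \right] + \frac{\partial f(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial r_h} $$
Differentiating the identity $g_j(\mathbf{x}^*(\mathbf{r}), \mathbf{r}) = 0$ with respect to r_h yields
$$ \sum_{i=1}^{n} \frac{\partial g_j(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial x_i} \cdot \frac{\partial x_i^*(\mathbf{r})}{\partial r_h} + \frac{\partial g_j(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial r_h} = 0 $$
which holds for all j = 1, ..., m.
So
$$ \frac{\partial f^*(\mathbf{r})}{\partial r_h} = -\sum_{j=1}^{m} \lambda_j \frac{\partial g_j(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial r_h} + \frac{\partial f(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial r_h} = \frac{\partial L(\mathbf{x}^*(\mathbf{r}), \mathbf{r})}{\partial r_h} $$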
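Example 2 : Utility maximization. Consider the problem of maximizing
$U(x, y) = 100 - e^{-x} - e^{-y}$ subject to the constraint $px + qy = m$.
The Lagrangean function is $L(x, y) = 100 - e^{-x} - e^{-y} - \lambda(px + qy - m)$.
The first order conditions are as follows:
$$ L_x = e^{-x} - \lambda p = 0 \quad (1) $$
$$ L_y = e^{-y} - \lambda q = 0 \quad (2) $$
$$ px + qy - m = 0 \quad (3) $$
Solving the first two equations yields
$$ \frac{e^{-x}}{e^{-y}} = \frac{p}{q} \;\Rightarrow\; e^{-x} = \frac{p}{q} e^{-y} $$
Taking logarithms on both sides,
$$ -x = -y + \ln\left(\frac{p}{q}\right) \;\Rightarrow\; x = y - \ln\left(\frac{p}{q}\right) $$
Putting this value of x in (3),
$$ p\left[y^* - \ln\left(\frac{p}{q}\right)\right] + q y^* = m \;\Rightarrow\; y^* = \frac{p \ln\left(\frac{p}{q}\right) + m}{p + q} $$
$$ x^* = \frac{p \ln\left(\frac{p}{q}\right) + m}{p + q} - \ln\left(\frac{p}{q}\right) = \frac{m - q \ln\left(\frac{p}{q}\right)}{p + q} $$
Here x and y become functions of the parameters: the price of x (p), the price of y (q) and
income (m).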
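The demand functions of this example can be checked numerically by plugging them back into the
budget constraint and into the condition that the ratio of marginal utilities equals the price
ratio. The sketch assumes the utility function U = 100 - exp(-x) - exp(-y) and illustrative
parameter values p = 2, q = 1, m = 10 that are not from the lesson.

```python
import math


def demands(p, q, m):
    """Closed-form demands x*, y* for U = 100 - exp(-x) - exp(-y) with budget p*x + q*y = m."""
    log_ratio = math.log(p / q)
    y = (m + p * log_ratio) / (p + q)
    x = y - log_ratio
    return x, y


p, q, m = 2.0, 1.0, 10.0          # illustrative parameter values
x, y = demands(p, q, m)

budget_gap = p * x + q * y - m                   # should be (numerically) zero
foc_gap = math.exp(-x) / p - math.exp(-y) / q    # both terms equal lambda at the optimum

print(x, y, budget_gap, foc_gap)
```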
Example 3 : Cost minimization
A firm uses capital K and labour L to produce a single output Q, and its production function is
$Q = F(K, L) = K^{1/2} L^{1/4}$. The prices of labour and capital are w and r respectively.
So the problem is
$$ \min C = rK + wL \quad \text{subject to} \quad K^{1/2} L^{1/4} = Q $$
$$ \mathcal{L}(K, L) = rK + wL - \lambda (K^{1/2} L^{1/4} - Q) $$
The necessary conditions are as follows:
$$ \mathcal{L}_K = r - \frac{1}{2} \lambda K^{-1/2} L^{1/4} = 0 \quad (1) $$
$$ \mathcal{L}_L = w - \frac{1}{4} \lambda K^{1/2} L^{-3/4} = 0 \quad (2) $$
$$ K^{1/2} L^{1/4} - Q = 0 \quad (3) $$
Solving (1) and (2), we get
$$ \lambda = 2r K^{1/2} L^{-1/4} = 4w K^{-1/2} L^{3/4} \;\Rightarrow\; L = \left(\frac{r}{2w}\right) K $$
Putting this into (3) we get
$$ K^{3/4} = 2^{1/4} r^{-1/4} w^{1/4} Q $$
$$ K^* = 2^{1/3} r^{-1/3} w^{1/3} Q^{4/3} $$
$$ L^* = \left(\frac{r}{2w}\right) K^* = 2^{-2/3} r^{2/3} w^{-2/3} Q^{4/3} $$
The corresponding minimal cost is
$$ C^* = rK^* + wL^* = 3 \cdot 2^{-2/3} r^{2/3} w^{1/3} Q^{4/3} $$
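The closed-form input demands of Example 3 can be sanity-checked by confirming that they produce
the required output and that their cost matches the minimal cost formula. The parameter values
r = 4, w = 1, Q = 8 are illustrative assumptions, and the function name is hypothetical.

```python
def cost_min(r, w, q):
    """Cost-minimising K, L and cost for the production function K**0.5 * L**0.25 = q."""
    k = 2 ** (1 / 3) * r ** (-1 / 3) * w ** (1 / 3) * q ** (4 / 3)
    l = r / (2 * w) * k          # from the ratio of the first order conditions
    return k, l, r * k + w * l


r, w, Q = 4.0, 1.0, 8.0          # illustrative parameter values
K, L, C = cost_min(r, w, Q)

output = K ** 0.5 * L ** 0.25    # should reproduce Q
closed_form = 3 * 2 ** (-2 / 3) * r ** (2 / 3) * w ** (1 / 3) * Q ** (4 / 3)

print(K, L, C, output, closed_form)
```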
Example 4:
Suppose a firm produces TV sets at two different locations: x1 units are produced at the first
location and x2 at the second location. The joint cost function is given by
$$ C = 0.1x_1^2 + 0.2x_2^2 + 0.2x_1x_2 + 180x_1 + 60x_2 + 25000 $$
If the firm has to supply an order of 1000 units, then with this as the constraint, x1* and x2*
are chosen so as to minimize cost.
$$ L(x_1, x_2) = 0.1x_1^2 + 0.2x_2^2 + 0.2x_1x_2 + 180x_1 + 60x_2 + 25000 - \lambda(x_1 + x_2 - 1000) $$
$$ L_1 = 0.2x_1 + 0.2x_2 + 180 - \lambda = 0 \quad (1) $$
$$ L_2 = 0.4x_2 + 0.2x_1 + 60 - \lambda = 0 \quad (2) $$
$$ x_1 + x_2 - 1000 = 0 \quad (3) $$
Solving the above three equations yields:
$$ x_1^* = 400, \qquad x_2^* = 600 $$
So the firm should produce 400 TV sets at the first location and 600 at the second location.
Let us check the sufficiency condition also:
$$ D(x_1, x_2) = \begin{vmatrix} 0 & 1 & 1 \\ 1 & 0.2 & 0.2 \\ 1 & 0.2 & 0.4 \end{vmatrix} = -0.2 < 0 $$
Hence the stationary point (400, 600) is a minimum, and the minimum cost is 2,69,000.
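The allocation of the order between the two locations can be cross-checked by brute force over
whole units. The sketch assumes an order of 1000 units and positive signs on the linear cost
terms (180x1 + 60x2), which is the reading consistent with the stated answer of 400 and 600
units and a cost of 2,69,000; the cost is scaled by 10 so that only integers are involved.

```python
def ten_times_cost(x1, x2):
    """10 * C(x1, x2), kept in integers to avoid rounding (signs of linear terms assumed +)."""
    return x1 ** 2 + 2 * x2 ** 2 + 2 * x1 * x2 + 1800 * x1 + 600 * x2 + 250000


order = 1000  # assumed order size, x1 + x2 = 1000

# Try every whole-unit split of the order and keep the cheapest one
best_x1 = min(range(order + 1), key=lambda x1: ten_times_cost(x1, order - x1))
best_x2 = order - best_x1
min_cost = ten_times_cost(best_x1, best_x2) / 10

print(best_x1, best_x2, min_cost)
```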
Example 3 : (Contd..) Envelope results
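Suppose now that we wish to know how C* changes when r changes.
$$ \frac{\partial C^*}{\partial r} = \frac{\partial}{\partial r}\left(3 \cdot 2^{-2/3} r^{2/3} w^{1/3} Q^{4/3}\right) = \frac{2}{3} \cdot 3 \cdot 2^{-2/3} r^{-1/3} w^{1/3} Q^{4/3} = 2^{1/3} r^{-1/3} w^{1/3} Q^{4/3} = K^* $$
The same result follows directly from the envelope theorem, which states that
$$ \frac{\partial C^*(r, w, Q)}{\partial r} = \frac{\partial \mathcal{L}(K, L, r, w, Q)}{\partial r} = K^* $$
Similarly, $\frac{\partial C^*}{\partial w} = L^*$.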
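The envelope result for the cost function can be illustrated with a finite-difference
derivative: the numerical slope of C* with respect to r (or w) should match K* (or L*). The
parameter values and the step size are illustrative assumptions, and the function names are
hypothetical.

```python
def min_cost(r, w, q):
    """C* = 3 * 2**(-2/3) * r**(2/3) * w**(1/3) * q**(4/3) for K**0.5 * L**0.25 = q."""
    return 3 * 2 ** (-2 / 3) * r ** (2 / 3) * w ** (1 / 3) * q ** (4 / 3)


def k_star(r, w, q):
    return 2 ** (1 / 3) * r ** (-1 / 3) * w ** (1 / 3) * q ** (4 / 3)


def l_star(r, w, q):
    return 2 ** (-2 / 3) * r ** (2 / 3) * w ** (-2 / 3) * q ** (4 / 3)


r, w, Q = 4.0, 1.0, 8.0   # illustrative parameter values
h = 1e-6                  # step for the central difference

dC_dr = (min_cost(r + h, w, Q) - min_cost(r - h, w, Q)) / (2 * h)
dC_dw = (min_cost(r, w + h, Q) - min_cost(r, w - h, Q)) / (2 * h)

print(dC_dr, k_star(r, w, Q))
print(dC_dw, l_star(r, w, Q))
```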
Exercises
Q1. The utility of a consumer depends on two goods, x and y, and the utility function is given
by U = (x+2)(y+1). If the prices of x and y are Rs. 2 and Rs. 5 respectively and income is
Rs. 51, find the optimal levels of x and y purchased by the consumer and the indirect utility
function.
[Hint: The indirect utility function is utility as a function of the parameters.]
Q2. The production function of a firm is given by X= and the prices of capital and labour are
fixed at Rs. r and Rs. w, respectively.
i) Find the cost minimizing combination of capital and labour.
ii) Derive the cost function of the firm.
Q3. The incomes of an individual in the current and the next year are Rs. 500 and Rs. 792
respectively. His utility function of the two consumption expenditures x and y is U= . If the
market interest rate is 10% p.a., determine the optimum consumption expenditures and the amount
the consumer should borrow or lend in the current year.
[Hint: The constraint is income over the two periods, but consumption can differ from the
income of a period because the consumer can borrow or lend in the market.]
Q4. A monopolist has the following demand functions for his products X and Y:
x = 72 - 0.5px and y = 120 - py. The combined cost is C = x^2 + xy + y^2 + 35 and the maximum
total output is 40 units.
Find
i) The profit maximizing level of output
ii) The price of each product
iii) The total profit
Q5. Find the optimal input mix and its cost when a producer chooses an output corresponding to
the isoquant k^2 l = 16 and the respective prices of capital and labour are Re. 1 and Rs. 2.
Also find the expansion path.
[Hint: The expansion path shows the change in optimal values when the output parameter changes.]
References
K. Sydsaeter and P. Hammond, Mathematics for Economic Analysis, Pearson Educational Asia, Delhi, 2002

Unconstrained Optimization
DC-1
Semester-II
Paper-IV: Mathematical Methods in Economics-II
Lesson: Unconstrained Optimization
Lesson Developer: Rakhi Arora & Vaishali Kapoor

University of Delhi

Table of Contents
2. Introduction
3. First order and second order conditions revisited
4. Optimization in case of two variables
a. Geometrical characteristics
b. Differential conditions for optimization
5. Quadratic forms
a. Determinantal test for sign definiteness
b. Three variable quadratic forms
c. Extending it to n-variables case
6. Applications of optimization techniques in economics
a. Profit maximization by a multiproduct firm
b. Price discrimination
c. Duopoly
d. Profit maximizing level of inputs
7. Exercises
8. References
Learning outcomes:

1. Explain concept of optimum value.

2. Calculate optimum given different functions.
3. Apply calculus of optimization in economic analysis.
Introduction
The 'minimization' of a function (such as the cost of producing an output)
and the 'maximization' of a function (for example, a profit function, a consumer's
utility function or a country's economic growth) are jointly known as
'optimization', i.e. the 'quest for the best'. So far, you have studied optimization
problems in which only one independent variable affected the function to be
optimized. But in the real world, there are often two or more variables (in general,
n variables) affecting our objective function.
When two or more choice variables affect the objective function, optimisation
techniques need to be studied through the lens of the total differential instead
of the derivative (used in the case of one choice variable). So, to create an
analogy, we will first study optimisation of an objective function of one choice
variable with the help of differentials. The geometric characteristics of a
one-variable objective function will also be analysed.
In the first section of this chapter, we analyse the geometric characteristics and
calculus of optimisation in the case of one variable. In the second section, we
deal with optimisation in the two-variable case. In the third section, quadratic
forms of the total differential are covered and sufficient conditions for the
n-variable case are derived. In the last section, applications of optimization
techniques in economics are discussed.
The problem of optimization with a single choice variable is revisited here, but in
terms of differentials.
First Order Condition

Consider z = f(x). At a minimum (or maximum) point, say point A, the necessary
condition for an extremum is f'(x) = 0, or equivalently dz = 0 as x varies, since
$$ dz = f'(x)\,dx $$
From the derivative form we know that f'(x) must be zero: $f'(x) = \frac{dz}{dx} = 0$. This
is equivalent to saying that dz = 0 as x varies.
Second Order Condition
Consider Fig. 1: at each of the points A, B and C, dz = 0. Point A is a
maximum, B is a minimum, but C is neither. So dz = 0 is a necessary condition
but not a sufficient one.
Fig. 1
For A to be a maximum, dz = 0 at A and dz < 0 as we move away in either direction,
i.e. dz should be decreasing as we move through A. It is equivalent to saying
that d(dz) < 0, i.e. d^2z < 0.
At a minimum, as at B, dz is increasing as we move through B, i.e. d^2z > 0.
This gives the sufficient condition for optimization. At point C, dz is increasing
when we move to the right and decreasing when we move to the left.
Let us calculate d^2z:
$$ d^2z = d(dz) = d[f'(x)\,dx] = f''(x)\,dx^2 $$
So the sufficient condition becomes:
For a maximum of z : d^2z < 0
For a minimum of z : d^2z > 0
I. Optimization in case of two variables
Now, the stage is set to extend the analysis of optimization to two variables:
1. Geometrical Characteristics
Techniques of optimization and geometrical characteristics can be extended by
analogy from one variable to two variables. For a function of one variable, an
extreme value is represented by the peak of a hill or the bottom of a valley in a
two-dimensional graph, as in Fig. 1. When z becomes a function of x and y
(z = f(x, y)) and the graph is plotted in 3D space, these hills and valleys
appear as domes and bowls, as in Fig. 2 and Fig. 3.
Fig. 2 Fig. 3
When a function of one variable is graphed in 2D space, the tangent at the points
where it attains a minimum or maximum is parallel to the x-axis, i.e. the axis of
the choice variable. At point A in Fig. 1, f(x) attains a maximum and the tangent is
parallel to the x-axis. Likewise, in Figure 2, at point A' the tangent Tx is parallel
to the x-axis in the xz plane and Ty is parallel to the y-axis in the yz plane.
Alternatively, as in Figure 4, we could have a plane tangent to the surface at
point A'. The 'hill' in 2D is equivalent to a dome in 3D.
Fig. 4
Similarly, at the minimum at point B in Fig. 1, the tangent is parallel to the
x-axis. At point B' in Figure 3, the tangent Tx is parallel to the x-axis in the xz
plane and Ty is parallel to the y-axis in the yz plane. The valley in 2D (for a
minimum) is transformed into a bowl in 3D.
In Figure 1, at point C, the tangent is parallel to the x-axis but C is neither a
maximum nor a minimum. It is rather a point where the curvature of the curve
changes, i.e. an inflection point. Similarly, in Figures 5 and 6, points C' and C"
respectively are not optimum points even though the tangents Tx and Ty are parallel
to the x-axis in the xz plane and the y-axis in the yz plane respectively.

Fig. 5. Fig. 6
Point C' in Figure 5 is a minimum when viewed against the background of the yz
plane and a maximum when viewed against the background of the xz plane. A point
with such a dual personality is known as a 'saddle' point.
At C" in Fig. 6, the surface gets twisted, so it is an inflection point like
point C in Fig. 1.
2. Differential Conditions for Optimization
From here on we consider the functional form z = f(x, y). The first order
condition for an extremum is dz = 0. For z = f(x, y), dz is the total
differential given by
$$ dz = f_x\,dx + f_y\,dy $$
and dz = 0 implies that an extremum point must be a stationary point: z must be
stationary for arbitrary infinitesimal changes of the two variables x and y. This
is tantamount to requiring dz = 0 for any dx and dy, not both zero.
For dz to be zero for all such dx and dy, it is necessary that the partial
derivatives f_x and f_y both be zero. So, the first order condition becomes:
$$ f_x = f_y = 0 $$
This condition was satisfied by point A' in Fig. 2, B' in Fig. 3, C' in Fig. 5 and
C" in Fig. 6. But at points C' and C" the function did not attain an extreme
value, so this condition, though necessary, is not sufficient.
As in the one-variable case, the second order total differential determines
whether a stationary point is an extremum or not. For the function z = f(x, y),
d^2z is calculated as follows:
$$ d^2z = d(dz) = \frac{\partial (dz)}{\partial x}\,dx + \frac{\partial (dz)}{\partial y}\,dy $$
$$ = \frac{\partial}{\partial x}(f_x\,dx + f_y\,dy)\,dx + \frac{\partial}{\partial y}(f_x\,dx + f_y\,dy)\,dy $$
$$ = f_{xx}\,dx^2 + f_{yx}\,dy\,dx + f_{xy}\,dx\,dy + f_{yy}\,dy^2 $$
By Young's theorem we know that cross partial derivatives are identical, i.e.
$f_{yx} = f_{xy}$. So $d^2z = f_{xx}\,dx^2 + 2f_{xy}\,dx\,dy + f_{yy}\,dy^2$.
A sufficient condition for a maximum of z is d^2z < 0 for any values of dx and
dy, not both zero. Likewise, a sufficient condition for a minimum of z is
d^2z > 0 for any values of dx and dy, not both zero.
$$ d^2z \begin{cases} < 0 & \text{if } f_{xx} < 0,\ f_{yy} < 0 \text{ and } f_{xx} f_{yy} > f_{xy}^2 \\ > 0 & \text{if } f_{xx} > 0,\ f_{yy} > 0 \text{ and } f_{xx} f_{yy} > f_{xy}^2 \end{cases} $$
Example 1. Consider the function $z = e^{2x} - 2x + 2y^2 + 3$. To find the extreme
values of z, the following are the relevant partial derivatives:
$$ f_x = 2e^{2x} - 2, \qquad f_{xx} = 4e^{2x} $$
$$ f_y = 4y, \qquad f_{yy} = 4, \qquad f_{xy} = 0 $$
The first order necessary conditions become:
$$ f_x = 0 \;\Rightarrow\; 2e^{2x} - 2 = 0 \;\Rightarrow\; e^{2x} = 1 \;\Rightarrow\; 2x = 0 \;\Rightarrow\; x = 0 $$
$$ \text{and} \quad f_y = 0 \;\Rightarrow\; 4y = 0 \;\Rightarrow\; y = 0 $$
So the stationary point for z is (0, 0). To know whether z attains a minimum or a
maximum at (0, 0), let us check the second order conditions.
$$ f_{xx}(0, 0) = 4, \qquad f_{yy}(0, 0) = 4, \qquad f_{xy}(0, 0) = 0 $$
$$ f_{xx} f_{yy} = 16 $$
$$ f_{xx} > 0, \quad f_{yy} > 0 \quad \text{and} \quad f_{xx} f_{yy} > f_{xy}^2 $$
Hence z attains a minimum at the point (0, 0).
z at (0, 0) is e^0 - 0 + 0 + 3 = 4, so the minimum point is denoted by (0, 0, 4).
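The second order test used in Example 1 can be written as a small routine and applied to the
stationary point (0, 0). This is a sketch under the sufficient conditions stated in this
lesson; any case they do not cover is simply reported as inconclusive, and the function names
are hypothetical.

```python
import math


def z(x, y):
    """z = e^(2x) - 2x + 2y^2 + 3 from Example 1."""
    return math.exp(2 * x) - 2 * x + 2 * y ** 2 + 3


def classify(fxx, fyy, fxy):
    """Apply the sufficient second order conditions for a function of two variables."""
    if fxx * fyy > fxy ** 2:
        if fxx > 0 and fyy > 0:
            return "minimum"
        if fxx < 0 and fyy < 0:
            return "maximum"
    return "inconclusive by this test"


# Second order partials of z evaluated at the stationary point (0, 0)
fxx, fyy, fxy = 4 * math.exp(0), 4.0, 0.0
result = classify(fxx, fyy, fxy)

print(result, z(0.0, 0.0))
```

Evaluating z at nearby points such as (0.1, 0.1) gives values above z(0, 0), in line with the
classification.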
III. Quadratic Forms
The expression for d^2z is an example of a quadratic form. The second order
condition for an extremum can also be studied from the perspective of quadratic
forms. Consider the following:
Let u = dx, v = dy, a = f_xx, b = f_yy and h = f_xy
so that $d^2z = f_{xx}\,dx^2 + 2f_{xy}\,dx\,dy + f_{yy}\,dy^2$
becomes $d^2z = au^2 + 2huv + bv^2$
Here d^2z is a quadratic form in u and v. The quantities u and v (which are dx and
dy) are treated as the variables, while the second order partial derivatives,
evaluated at the stationary point, are treated as constants. This differs from the
earlier treatment of the second order condition, where the second order partial
derivatives were the quantities of interest.
Here we are interested in the sign that d^2z assumes at an extremum: d^2z needs to
be negative for a maximum and positive for a minimum. The value of the second
order total differential at a point depends on the values of the partial
derivatives there and on the values of dx and dy. Since dx and dy can vary freely,
at an extremum the second order partial derivatives must be such that d^2z has a
definite sign.
a) Determinantal Test for Sign Definiteness
For the two-variable case,
$$ d^2z = au^2 + 2huv + bv^2 $$
the signs of the first and third terms are independent of the values of the
variables u and v, since these appear as squares. Thus the signs of these terms
depend on the signs of a and b alone. But the middle term could change the sign
of d^2z.
The sign becomes transparent if the entire polynomial is rewritten so that u and v
appear only in squares. Let us complete the squares:
$$ d^2z = au^2 + 2huv + \frac{h^2}{a}v^2 + bv^2 - \frac{h^2}{a}v^2 $$
$$ = a\left(u^2 + \frac{2h}{a}uv + \frac{h^2}{a^2}v^2\right) + \left(b - \frac{h^2}{a}\right)v^2 $$
$$ = a\left(u + \frac{h}{a}v\right)^2 + \frac{ab - h^2}{a}v^2 $$
Now, d^2z is positive definite iff a > 0, which makes the first term positive, and
$ab - h^2 > 0$, which then makes the second term positive as well.
Also, d^2z is negative definite iff a < 0 and, along with it, $ab - h^2 > 0$.
Hence $ab - h^2 > 0$ is a prerequisite for either extremum. It is equivalent
to saying $ab > h^2$. Hence ab should be positive, which is so when a and
b have identical signs.
For positive definiteness of d^2z (minimum of z), a (f_xx) should be positive,
b (f_yy) should be positive and $ab - h^2$ ($f_{xx}f_{yy} - f_{xy}^2$) should be greater
than zero. For negative definiteness of d^2z (maximum of z), a (f_xx) and b (f_yy)
should be negative and $ab - h^2$ ($f_{xx}f_{yy} - f_{xy}^2$) should be greater than zero.
This is the same as derived earlier.
Another way of expressing d^2z is as follows:
$$ d^2z = \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} a & h \\ h & b \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} $$
Let us denote the matrix $\begin{bmatrix} a & h \\ h & b \end{bmatrix}$ by D. Then
$$ d^2z \text{ is } \begin{cases} \text{positive definite iff } |a| > 0 \\ \text{negative definite iff } |a| < 0 \end{cases} \text{ and } |D| > 0 $$
where |a| is a subdeterminant of |D|. It is also known as the first principal minor
of |D|.
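The completed-square form and the determinantal test for the two-variable quadratic form can be
checked with exact arithmetic. The coefficient values used below are arbitrary illustrations,
and the function names are hypothetical.

```python
from fractions import Fraction


def quadratic_form(a, h, b, u, v):
    """a*u^2 + 2*h*u*v + b*v^2."""
    return a * u * u + 2 * h * u * v + b * v * v


def completed_square(a, h, b, u, v):
    """a*(u + (h/a)*v)^2 + ((a*b - h^2)/a)*v^2, valid when a != 0."""
    a, h, b = Fraction(a), Fraction(h), Fraction(b)
    return a * (u + h / a * v) ** 2 + (a * b - h * h) / a * v * v


def sign_definiteness(a, h, b):
    """Determinantal test: first principal minor a and determinant a*b - h^2."""
    d1, d2 = a, a * b - h * h
    if d1 > 0 and d2 > 0:
        return "positive definite"
    if d1 < 0 and d2 > 0:
        return "negative definite"
    return "not definite by this test"


print(quadratic_form(2, 1, 3, 5, -7), completed_square(2, 1, 3, 5, -7))
print(sign_definiteness(2, 1, 3), sign_definiteness(-2, 1, -3), sign_definiteness(1, 2, 1))
```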
b) Three-Variable Quadratic Forms

In case z is function of three variable x, y and w,
d 2 z  f xx (dx2 )  f xy dxdy  f xwdxdw  f yx dydx  f yy d y2
 f yw dydw  f wx dwdx  f wy dwdy  f ww dw2
 f xx f xy f xw   dx 
 [dx dy dw]  f yx f xy f yw   dy 
 f wx f wy f ww   dw
f xx f xy
| D1 |  f xx ; | D2 |  and
f yx f yy
 f xx f xy f xw 
| D3 |   f yx f yy f yw 
 f wx f wy f ww 
where | Di | denotes ith principal minor of | D | .
Transforming $d^2z$ into a form where the squares have been completed, $d^2z$ can be written as follows:
$$d^2z = f_{xx}\left(dx + \frac{f_{xy}}{f_{xx}}\,dy + \frac{f_{xw}}{f_{xx}}\,dw\right)^2 + \frac{f_{xx}f_{yy} - f_{xy}^2}{f_{xx}}\left(dy + \frac{f_{xx}f_{yw} - f_{xy}f_{xw}}{f_{xx}f_{yy} - f_{xy}^2}\,dw\right)^2 + \frac{f_{xx}f_{yy}f_{ww} - f_{xx}f_{yw}^2 - f_{yy}f_{xw}^2 - f_{ww}f_{xy}^2 + 2f_{xy}f_{xw}f_{yw}}{f_{xx}f_{yy} - f_{xy}^2}\,(dw)^2$$
This is equivalent to:
$$d^2z = |D_1|\left(dx + \frac{f_{xy}}{f_{xx}}\,dy + \frac{f_{xw}}{f_{xx}}\,dw\right)^2 + \frac{|D_2|}{|D_1|}\left(dy + \frac{f_{xx}f_{yw} - f_{xy}f_{xw}}{f_{xx}f_{yy} - f_{xy}^2}\,dw\right)^2 + \frac{|D_3|}{|D_2|}\,(dw)^2$$
Hence, $d^2z$ would be positive definite if
$$|D_1| > 0, \quad |D_2| > 0 \quad \text{and} \quad |D_3| > 0$$
So, for a minimum of $z$ all principal minors must be positive. For negative definiteness of $d^2z$:
$|D_1| < 0$
$|D_2| > 0$ [given that $|D_1| < 0$ already]
$|D_3| < 0$ [given that $|D_2| > 0$ already]
Thus for a maximum of $z$ the principal minors must alternate in sign in the specified manner.
c) Extending it to the n-variable case
Let $z$ be a function of $(x_1, x_2, \ldots, x_n)$. Then
$$d^2z = \begin{bmatrix} dx_1 & dx_2 & \cdots & dx_n \end{bmatrix} \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ \vdots & \vdots & & \vdots \\ f_{n1} & \cdots & \cdots & f_{nn} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \\ \vdots \\ dx_n \end{bmatrix}$$
For a minimum of $z$ (positive definiteness of $d^2z$), all principal minors $|D_1|, |D_2|, \ldots, |D_n|$ should be positive. For a maximum of $z$ (negative definiteness of $d^2z$), the principal minors should alternate in sign: if $k$ is even then $|D_k|$ should be positive, and if $k$ is odd then $|D_k|$ should be negative. It is equivalent to saying that $(-1)^k |D_k| > 0$.
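The sign test on the principal minors can be checked mechanically. The following is a minimal sketch in Python, not part of the original lesson; the function names (`det`, `leading_principal_minors`, `classify`) and the sample matrices are illustrative assumptions. The determinant uses cofactor expansion, which is adequate only for the small matrices met in these examples.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        # Delete row 0 and column j to obtain the minor of entry (0, j).
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def leading_principal_minors(hessian):
    """Return [|D1|, |D2|, ..., |Dn|] for a square matrix given as a list of rows."""
    n = len(hessian)
    return [det([row[:k] for row in hessian[:k]]) for k in range(1, n + 1)]


def classify(hessian):
    """Apply the principal-minor sign test to the matrix of second derivatives."""
    minors = leading_principal_minors(hessian)
    if all(d > 0 for d in minors):
        return "positive definite"            # sufficient for a minimum
    if all((-1) ** k * d > 0 for k, d in enumerate(minors, start=1)):
        return "negative definite"            # sufficient for a maximum
    return "neither (test fails)"


# Hypothetical matrices of second derivatives, chosen only for illustration.
print(leading_principal_minors([[2, 1, 0], [1, 2, 1], [0, 1, 2]]))
print(classify([[2, 1, 0], [1, 2, 1], [0, 1, 2]]))
print(classify([[-4, -1], [-1, -4]]))
```

The third outcome covers both indefinite and semidefinite cases, where this test alone does not establish an extremum.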

IV. Applications of Optimization Techniques in Economics
1. Profit Maximisation by a Multiproduct Firm. When a firm produces more than one product, the firm's profit depends on the production and revenue of all products. Let us assume that a firm produces two types of goods, X1 and X2. Let the prices of goods X1 and X2 be $p_1$ and $p_2$ respectively. The cost function of the firm is given by
$$C = F(x_1, x_2).$$
Then the profit function $\pi$ can be written as:
$$\pi = p_1x_1 + p_2x_2 - C$$
The above profit function has to be maximised with respect to both $x_1$ and $x_2$. This in turn depends on the form of the market, which could be perfectly competitive or a monopoly.
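a) Perfectly competitive markets for X1 and X2.
The first order conditions for profit maximization are:
$$\frac{\partial \pi}{\partial x_1} = p_1 - \frac{\partial C}{\partial x_1} = 0 \quad \text{or} \quad p_1 = MC_1$$
$$\frac{\partial \pi}{\partial x_2} = p_2 - \frac{\partial C}{\partial x_2} = 0 \quad \text{or} \quad p_2 = MC_2$$
The second order conditions are:
$$\frac{\partial^2 \pi}{\partial x_1^2} < 0 \;\Rightarrow\; \frac{\partial^2 C}{\partial x_1^2} > 0, \qquad \frac{\partial^2 \pi}{\partial x_2^2} < 0 \;\Rightarrow\; \frac{\partial^2 C}{\partial x_2^2} > 0$$
and
$$\frac{\partial^2 \pi}{\partial x_1^2} \cdot \frac{\partial^2 \pi}{\partial x_2^2} > \left(\frac{\partial^2 \pi}{\partial x_1 \partial x_2}\right)^2$$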

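b) When markets for X1 and X2 are not perfectly competitive. Let the demand function for X1 be $x_1 = f_1(p_1, p_2)$ and for X2 be $x_2 = f_2(p_1, p_2)$.
The profit function can be written as
$$\pi = R(x_1, x_2) - C(x_1, x_2)$$
where $R(x_1, x_2)$ is the revenue function of the firm.
The first order conditions for maximum profit are
$$\frac{\partial \pi}{\partial x_1} = \frac{\partial R}{\partial x_1} - \frac{\partial C}{\partial x_1} = 0 \;\Rightarrow\; MR_1 = MC_1$$
$$\frac{\partial \pi}{\partial x_2} = \frac{\partial R}{\partial x_2} - \frac{\partial C}{\partial x_2} = 0 \;\Rightarrow\; MR_2 = MC_2$$
Second order conditions are:
$$\frac{\partial^2 \pi}{\partial x_1^2} = \frac{\partial^2 R}{\partial x_1^2} - \frac{\partial^2 C}{\partial x_1^2} < 0 \;\Rightarrow\; \frac{\partial MR_1}{\partial x_1} < \frac{\partial MC_1}{\partial x_1}$$
$$\frac{\partial^2 \pi}{\partial x_2^2} = \frac{\partial^2 R}{\partial x_2^2} - \frac{\partial^2 C}{\partial x_2^2} < 0 \;\Rightarrow\; \frac{\partial MR_2}{\partial x_2} < \frac{\partial MC_2}{\partial x_2}$$
and
$$\frac{\partial^2 \pi}{\partial x_1^2} \cdot \frac{\partial^2 \pi}{\partial x_2^2} > \left(\frac{\partial^2 \pi}{\partial x_1 \partial x_2}\right)^2$$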
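Example 2: Let a firm under perfect competition produce X1 and X2 at prices of Rs. 10 and Rs. 15 respectively. Let the cost function of the firm be $C = 2x_1^2 + x_1x_2 + 2x_2^2$, where $x_1$ and $x_2$ denote the levels of output of X1 and X2 respectively.
The profit function of the firm becomes:
$$\pi = 10x_1 + 15x_2 - 2x_1^2 - x_1x_2 - 2x_2^2$$
For maximum profit,
$$\frac{\partial \pi}{\partial x_1} = 10 - 4x_1 - x_2 = 0$$
and
$$\frac{\partial \pi}{\partial x_2} = 15 - x_1 - 4x_2 = 0$$
Solving these equations, we get
$$x_1 = \frac{5}{3} \quad \text{and} \quad x_2 = \frac{10}{3}$$
The amount of maximum profit is
$$\pi_{max} = 10 \times \frac{5}{3} + 15 \times \frac{10}{3} - 2 \times \frac{25}{9} - \frac{5}{3} \times \frac{10}{3} - 2 \times \frac{100}{9} = \text{Rs. } 33.3$$
Checking for second order conditions:
$$\frac{\partial^2 \pi}{\partial x_1^2} = -4 < 0, \qquad \frac{\partial^2 \pi}{\partial x_2^2} = -4 < 0, \qquad \frac{\partial^2 \pi}{\partial x_1 \partial x_2} = -1$$
$$\therefore \frac{\partial^2 \pi}{\partial x_1^2} \cdot \frac{\partial^2 \pi}{\partial x_2^2} > \left(\frac{\partial^2 \pi}{\partial x_1 \partial x_2}\right)^2$$
Hence profit is maximised at $\left(\frac{5}{3}, \frac{10}{3}\right)$.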
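Example 3: A monopolist produces two commodities $x_1$ and $x_2$ jointly. The relevant cost function is $C = x_1^2 + 2x_1x_2 + 3x_2^2$. The demand functions are $P_1 = 36 - 3x_1$ and $P_2 = 50 - 5x_2$.
Profit function for the monopolist, $\pi = x_1p_1 + x_2p_2 - C$
$$\pi = x_1(36 - 3x_1) + x_2(50 - 5x_2) - x_1^2 - 2x_1x_2 - 3x_2^2$$
$$= 36x_1 - 3x_1^2 + 50x_2 - 5x_2^2 - x_1^2 - 2x_1x_2 - 3x_2^2$$
$$= 36x_1 - 4x_1^2 + 50x_2 - 8x_2^2 - 2x_1x_2$$
For a maximum of $\pi$:
$$\frac{\partial \pi}{\partial x_1} = 36 - 8x_1 - 2x_2 = 0$$
$$\frac{\partial \pi}{\partial x_2} = 50 - 16x_2 - 2x_1 = 0$$
Solving the above two equations, we get $x_1 = \frac{119}{31}$ and $x_2 = \frac{82}{31}$
$$P_1 = 36 - 3 \times \frac{119}{31} = \frac{759}{31}$$
and
$$P_2 = 50 - 5 \times \frac{82}{31} = \frac{1140}{31}$$
$$\pi_{max} = 36 \times \frac{119}{31} - 4 \times \frac{(119)^2}{(31)^2} + 50 \times \frac{82}{31} - 8 \times \left(\frac{82}{31}\right)^2 - 2 \times \frac{119}{31} \times \frac{82}{31} = \frac{129952}{(31)^2} = \text{Rs. } 135.23$$
Checking for second order conditions:
$$\frac{\partial^2 \pi}{\partial x_1^2} = -8 < 0; \qquad \frac{\partial^2 \pi}{\partial x_2^2} = -16 < 0; \qquad \frac{\partial^2 \pi}{\partial x_1 \partial x_2} = -2$$
$$\therefore \frac{\partial^2 \pi}{\partial x_1^2} \cdot \frac{\partial^2 \pi}{\partial x_2^2} > \left(\frac{\partial^2 \pi}{\partial x_1 \partial x_2}\right)^2$$
Hence profit is maximum at $\left(\frac{119}{31}, \frac{82}{31}\right)$.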
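Example 4: A manufacturer can produce a commodity at two locations. The selling price per unit is given by $P = 200 - 0.8x$, where $x = x_1 + x_2$. The cost functions at the two locations are $C_1 = 0.3x_1^2 + 60x_1 + 5000$ and $C_2 = 0.5x_2^2 + 30x_2 + 8000$ respectively.
Total revenue, $TR = [200 - 0.8(x_1 + x_2)](x_1 + x_2) = 200(x_1 + x_2) - 0.8(x_1 + x_2)^2$
Total cost, $TC = C_1 + C_2 = 0.3x_1^2 + 60x_1 + 0.5x_2^2 + 30x_2 + 13000$
$$\pi = 200(x_1 + x_2) - 0.8(x_1 + x_2)^2 - 0.3x_1^2 - 60x_1 - 0.5x_2^2 - 30x_2 - 13000$$
For maximum profit,
$$\frac{\partial \pi}{\partial x_1} = 200 - 1.6(x_1 + x_2) - 0.6x_1 - 60 = 0 \;\Rightarrow\; 2.2x_1 + 1.6x_2 = 140$$
$$\frac{\partial \pi}{\partial x_2} = 200 - 1.6(x_1 + x_2) - 1.0x_2 - 30 = 0 \;\Rightarrow\; 1.6x_1 + 2.6x_2 = 170$$
Solving these equations, $x_1 = 29.1$ and $x_2 = 47.5$.
$$\frac{\partial^2 \pi}{\partial x_1^2} = -2.2, \qquad \frac{\partial^2 \pi}{\partial x_2^2} = -2.6, \qquad \frac{\partial^2 \pi}{\partial x_1 \partial x_2} = -1.6$$
$$\therefore \frac{\partial^2 \pi}{\partial x_1^2} \cdot \frac{\partial^2 \pi}{\partial x_2^2} > \left(\frac{\partial^2 \pi}{\partial x_1 \partial x_2}\right)^2$$
Thus the profit maximization conditions are satisfied.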

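2. Price Discrimination
Consider a monopolist who can sell its product in two markets and charge a different price in each. Let the inverse demand functions be $P_1 = f_1(x_1)$ and $P_2 = f_2(x_2)$. Let the total cost function be $C(x)$ where $x = x_1 + x_2$. The profit function is $\pi = p_1x_1 + p_2x_2 - C(x) = R_1 + R_2 - C$, where $R_i$ is total revenue from the $i$th market. The first order conditions for maximum profit are
$$\frac{\partial \pi}{\partial x_1} = \frac{\partial R_1}{\partial x_1} - \frac{\partial C}{\partial x} \cdot \frac{\partial x}{\partial x_1} = R_1' - C' = 0$$
$$\frac{\partial \pi}{\partial x_2} = \frac{\partial R_2}{\partial x_2} - \frac{\partial C}{\partial x} \cdot \frac{\partial x}{\partial x_2} = R_2' - C' = 0$$
Solving the above, the necessary conditions for maximum profit are:
$$R_1' = R_2' = C'$$
$$MR_1 = MR_2 = MC$$
Second order conditions become:
$$\frac{\partial^2 \pi}{\partial x_1^2} < 0 \;\Rightarrow\; \frac{\partial^2 R_1}{\partial x_1^2} < \frac{\partial^2 C}{\partial x_1^2}$$
$$\frac{\partial^2 \pi}{\partial x_2^2} < 0 \;\Rightarrow\; \frac{\partial^2 R_2}{\partial x_2^2} < \frac{\partial^2 C}{\partial x_2^2}$$
and
$$\frac{\partial^2 \pi}{\partial x_1^2} \cdot \frac{\partial^2 \pi}{\partial x_2^2} > \left(\frac{\partial^2 \pi}{\partial x_1 \partial x_2}\right)^2$$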
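Example 5: A discriminating monopolist is able to separate its customers into two markets with respective demand functions $x_1 = 16 - 0.2p_1$ and $x_2 = 9 - 0.05p_2$. The total cost function is $C = 20 + 20x$ where $x = x_1 + x_2$.
Inverting the demand functions gives $p_1 = 80 - 5x_1$ and $p_2 = 180 - 20x_2$. Profit of the monopolist:
$$\pi = (80 - 5x_1)x_1 + (180 - 20x_2)x_2 - 20 - 20x_1 - 20x_2$$
$$= 60x_1 - 5x_1^2 + 160x_2 - 20x_2^2 - 20$$
$$\frac{\partial \pi}{\partial x_1} = 60 - 10x_1 = 0 \;\Rightarrow\; x_1 = 6$$
$$\frac{\partial \pi}{\partial x_2} = 160 - 40x_2 = 0 \;\Rightarrow\; x_2 = 4$$
Second order conditions are:
$$\frac{\partial^2 \pi}{\partial x_1^2} = -10 < 0, \qquad \frac{\partial^2 \pi}{\partial x_2^2} = -40 < 0 \qquad \text{and} \qquad \frac{\partial^2 \pi}{\partial x_1 \partial x_2} = 0$$
$$\therefore \frac{\partial^2 \pi}{\partial x_1^2} \cdot \frac{\partial^2 \pi}{\partial x_2^2} > \left(\frac{\partial^2 \pi}{\partial x_1 \partial x_2}\right)^2$$
Hence the second order condition is satisfied for a maximum of profit.
Price in market I, $P_1 = 80 - 5 \times 6$ = Rs. 50
Price in market II, $P_2 = 180 - 20 \times 4$ = Rs. 100
Maximum profit, $\pi = 50 \times 6 + 100 \times 4 - 20 - 20(6 + 4)$ = Rs. 480
Elasticity of demand in market I,
$$\eta_1 = -\frac{\partial x_1}{\partial P_1} \cdot \frac{P_1}{x_1} = 0.2 \times \frac{50}{6} = 1.67$$
Elasticity of demand in market II,
$$\eta_2 = -\frac{\partial x_2}{\partial P_2} \cdot \frac{P_2}{x_2} = 0.05 \times \frac{100}{4} = 1.25$$
The monopolist can charge a higher price where elasticity of demand is lower, and vice versa. The price charged in a market and the elasticity of demand in that market are negatively related (you will be asked to derive this in the exercises).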
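The calculations of Example 5 can be retraced in a few lines. The sketch below is an illustration rather than part of the lesson: the function `monopoly_market` and its parameter names are assumptions, and it relies on both inverse demand curves being linear, $p = \text{intercept} - \text{slope} \cdot x$, with constant marginal cost 20.

```python
from fractions import Fraction as F

MC = F(20)  # marginal cost implied by C = 20 + 20x


def monopoly_market(intercept, slope):
    """Set MR = MC for the linear inverse demand p = intercept - slope*x.

    Revenue is intercept*x - slope*x**2, so MR = intercept - 2*slope*x.
    Returns (quantity, price, absolute value of price elasticity).
    """
    x = (intercept - MC) / (2 * slope)
    p = intercept - slope * x
    elasticity = (1 / slope) * p / x  # |dx/dp| * p / x, since dx/dp = -1/slope
    return x, p, elasticity


x1, p1, e1 = monopoly_market(F(80), F(5))    # market I:  p1 = 80 - 5*x1
x2, p2, e2 = monopoly_market(F(180), F(20))  # market II: p2 = 180 - 20*x2
total_profit = p1 * x1 + p2 * x2 - (20 + 20 * (x1 + x2))

print(x1, p1, float(e1))
print(x2, p2, float(e2))
print(total_profit)
```

The market with the lower elasticity (market II) ends up with the higher price, which is the pattern described above.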
3. Duopoly
Under duopoly market conditions, there are two sellers of a homogeneous product with inverse market demand $p = f(x)$, where $p$ is price and $x$ denotes quantity demanded. This $x$ is the sum of $x_1$ and $x_2$, where $x_1$ is the output produced by the first firm and $x_2$ that of the second firm.
Each duopolist maximises its profit function with respect to its own output. The profit functions of the duopolists are as follows:
Duopolist I: $\pi_1 = R_1 - C_1 = px_1 - C_1(x_1) = f(x)\,x_1 - C_1(x_1)$
Duopolist II: $\pi_2 = R_2 - C_2 = px_2 - C_2(x_2) = f(x)\,x_2 - C_2(x_2)$
The profit maximising conditions become:
$$\frac{\partial \pi_1}{\partial x_1} = 0 \;\Rightarrow\; f(x) + x_1 \frac{\partial f(x)}{\partial x} \cdot \frac{\partial x}{\partial x_1} - C_1'(x_1) = 0$$
$$\Rightarrow f(x) + x_1 f'(x) = C_1'(x_1) \qquad (1)$$
and
$$\frac{\partial \pi_2}{\partial x_2} = 0 \;\Rightarrow\; f(x) + x_2 f'(x) = C_2'(x_2) \qquad (2)$$
Here $\partial x / \partial x_1 = \partial x / \partial x_2 = 1$, since each firm takes the other's output as given. Equation (1) gives the level of output duopolist I would produce for a given level of output of the second. This is known as the reaction function of duopolist I. Likewise, Equation (2) represents the reaction function of duopolist II. Solving (1) and (2) simultaneously, the values of $x_1$ and $x_2$ can be calculated.
Example 6
The market demand for a product is given by $p = 100 - 4x$. The cost functions of the two duopolists are
$$C_1 = x_1^2 + 17x_1 + 40 \quad \text{and} \quad C_2 = 0.5x_2^2 + 15x_2 + 90$$
The profit of the first duopolist is given by
$$\pi_1 = (100 - 4x)x_1 - x_1^2 - 17x_1 - 40 = 83x_1 - 5x_1^2 - 4x_1x_2 - 40$$
For maximum profit,
$$\frac{\partial \pi_1}{\partial x_1} = 83 - 10x_1 - 4x_2 = 0$$
So the reaction function of firm 1 is $x_1 = \frac{83 - 4x_2}{10}$
Similarly, the profit function of the second duopolist is given by
$$\pi_2 = (100 - 4x)x_2 - 0.5x_2^2 - 15x_2 - 90 = 85x_2 - 4x_1x_2 - 4.5x_2^2 - 90$$
For maximum profit,
$$\frac{\partial \pi_2}{\partial x_2} = 85 - 4x_1 - 9x_2 = 0$$
The reaction function of firm 2 is $x_2 = \frac{85 - 4x_1}{9}$. Solving the reaction functions of the two duopolists gives the equilibrium outputs, i.e. $x_1 = 5.5$ and $x_2 = 7$.
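One way to see how the two reaction functions of Example 6 pin down the equilibrium is to let each firm respond repeatedly to the other's last output. This is a sketch under assumptions: the best-response iteration and the names `reaction_1`, `reaction_2` and `cournot_equilibrium` are introduced here for illustration, whereas the lesson itself solves the two equations simultaneously.

```python
def reaction_1(x2):
    """Reaction function of firm 1 from Example 6."""
    return (83 - 4 * x2) / 10


def reaction_2(x1):
    """Reaction function of firm 2 from Example 6."""
    return (85 - 4 * x1) / 9


def cournot_equilibrium(tol=1e-12, max_iter=1000):
    """Iterate both best responses until outputs stop changing."""
    x1, x2 = 0.0, 0.0
    for _ in range(max_iter):
        new_x1, new_x2 = reaction_1(x2), reaction_2(x1)
        if abs(new_x1 - x1) < tol and abs(new_x2 - x2) < tol:
            return new_x1, new_x2
        x1, x2 = new_x1, new_x2
    raise RuntimeError("best-response iteration did not converge")


x1_star, x2_star = cournot_equilibrium()
print(round(x1_star, 6), round(x2_star, 6))
```

The iteration settles here because each reaction function responds less than one-for-one to the rival's output; with steeper reaction functions it need not converge, and solving the linear system directly would be the safer route.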
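4. Profit Maximising Levels of Inputs
Suppose a firm uses not only labour but also capital in production. Then the production function is $x = f(K, L)$, where output $x$ depends on capital ($K$) and labour ($L$). Suppose a monopolist purchases labour and capital at constant prices of Rs. $w$ and Rs. $r$ per unit respectively. If demand for the monopolist's output is given by $P = F(x)$, then the profit function becomes:
$$\pi = P \cdot x - (wL + rK)$$
where $wL + rK$ is the cost function of the monopolist.
The necessary condition for profit maximisation then becomes:
$$\frac{\partial \pi}{\partial L} = 0 \;\Rightarrow\; P\frac{\partial x}{\partial L} + x\frac{\partial P}{\partial x} \cdot \frac{\partial x}{\partial L} - w = 0$$
$$\Rightarrow w = \left(P + x\frac{\partial P}{\partial x}\right)\frac{\partial x}{\partial L}$$
$$\Rightarrow w = P\left(1 + \frac{x}{P}\frac{\partial P}{\partial x}\right)\frac{\partial x}{\partial L}$$
$$\Rightarrow w = P\left(1 - \frac{1}{\eta}\right)MP_L$$
where $\eta$ is the elasticity of demand (taken as a positive number) and $MP_L$ is the marginal product of labour
$$\Rightarrow w = MR \cdot MP_L$$
where $P\left(1 - \frac{1}{\eta}\right) = MR$.
Similarly,
$$\frac{\partial \pi}{\partial K} = 0 \;\Rightarrow\; r = MR \cdot MP_K$$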
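Example
A firm's production function is $X = 12 - \frac{L + K}{LK}$.
Let the prices of labour (L), capital (K) and output (X) be Re. 1, Rs. 4 and Rs. 9 respectively.
The profit function of the firm is then given by
$$\pi = P \cdot X - (L + 4K) = 9\left(12 - \frac{L + K}{LK}\right) - L - 4K$$
For maximum profit,
$$\frac{\partial \pi}{\partial L} = 0 \;\Rightarrow\; -9\left[\frac{LK - (L + K)K}{L^2K^2}\right] - 1 = 0$$
$$\Rightarrow L^2K^2 = 9K^2 \;\Rightarrow\; L^2 = 9 \;\Rightarrow\; L = 3 \text{ (neglecting the negative value of } L)$$
$$\frac{\partial \pi}{\partial K} = 0 \;\Rightarrow\; -9\left[\frac{LK - (L + K)L}{L^2K^2}\right] - 4 = 0$$
$$\Rightarrow 9L^2 = 4L^2K^2 \;\Rightarrow\; K^2 = \frac{9}{4} \;\Rightarrow\; K = \frac{3}{2}$$
Output X, when $L = 3$ and $K = \frac{3}{2}$:
$$X = 12 - \frac{3 + 3/2}{9/2} = 12 - \frac{9}{2} \times \frac{2}{9} = 11 \text{ units}$$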
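The solution of this example can be checked numerically by confirming that both partial derivatives of profit vanish at $L = 3$, $K = 3/2$. The sketch below is illustrative only; the function names and the central-difference step `h` are assumptions, not part of the lesson.

```python
def output(L, K):
    """Production function of the example: X = 12 - (L + K) / (L * K)."""
    return 12 - (L + K) / (L * K)


def profit(L, K, p=9.0, w=1.0, r=4.0):
    """Revenue at output price p minus the cost of labour and capital."""
    return p * output(L, K) - (w * L + r * K)


def partial(f, L, K, wrt, h=1e-6):
    """Central-difference estimate of the partial derivative of f at (L, K)."""
    if wrt == "L":
        return (f(L + h, K) - f(L - h, K)) / (2 * h)
    return (f(L, K + h) - f(L, K - h)) / (2 * h)


L_star, K_star = 3.0, 1.5
print(output(L_star, K_star))
print(partial(profit, L_star, K_star, "L"))  # should be close to zero
print(partial(profit, L_star, K_star, "K"))  # should be close to zero
```

A numerical derivative only confirms that the first order conditions hold approximately; it does not replace the check of the second order conditions.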
Exercises
1. A monopolist produces two commodities that are substitutes, with demand functions $X_1 = 8 - P_1 + P_2$ and $X_2 = 9 + P_1 - 5P_2$, where 1,000 $X_1$ units of the first commodity are demanded if its price is Rs. $P_1$ per unit and 1,000 $X_2$ units of the second commodity are demanded if its price is Rs. $P_2$ per unit. It costs Rs. 4 to produce each unit of the first commodity and Rs. 2 to produce each unit of the second. Find the output levels and prices of the two commodities that maximise profit.
2. A monopolist sells two products x and y for which the demands are $x = 50 - 0.5p_x$ and $y = 76 - p_y$.
The combined cost function is $C = 3x^2 + 2xy + 2y^2 + 55$. Find:
(i) The profit maximizing levels of output and price for each product.
(ii) The maximum profit.
3. A scooter manufacturer produces the same model of a scooter at two

different production plants. The cost of production of x1 scooters at plant I is
given by
C1= +1,000X1+2,500, and the cost of production of X2 scooters at plant II is

given by, C2=1.5 +2000X2+1800, Where x1 and x2 are the annual outputs of
plant I and II respectively
(i) If each scooter is sold at a uniform price of Rs.20,000,find the levels

of production of each plant so that profits are maximized.
(ii) If the annual demand of scooters follows the demand law x=30,000-
p, where p is the price of a scooter, find the levels of production of
each plant for maximum profits and the price of a scooter.
4. (a) A discriminating monopolist can separate his consumers into two distinct markets with the following demand functions:
Market I: Q1=16-0.2P1
Market II: Q2=180-2P2
Assume that the monopolist’s total cost function takes the form TC+20Q-20=0, where Q (=Q1+Q2) is the total output. Obtain the total profit function and determine the prices he would charge in the two markets to maximize profits. What is the total profit? Do you agree that the price charged in the market with a higher elasticity of demand would be higher? Show by calculations.
(b) Calculate the ratio of prices charged by a discriminating monopolist in the two markets with price elasticities of demand equal to 3.0 and 1.5.
5. Let there be two sellers of a homogeneous product with market demand given by P=B-ax (x=x1+x2), where a and B are positive constants. The cost function of each firm is given by C=axi, i=1,2, where a is a positive constant. Assuming that the conjectural variations are zero,

(i) Find reaction curve of each seller.

(ii) Find equilibrium output of each seller and the industry.
(iii) Find equilibrium output if the industry becomes a monopoly.
6. The production function of a firm is given by q=12 and the prices (in Rs) of q, L and K are 9, 2 and 4 respectively.
(i) Find the profit maximizing values of q, L and K.

(ii) Find the amount of maximum profit
(iii) Verify second order condition.
References
K. Sydsaeter and P. Hammond, Mathematics for Economic Analysis, Pearson

Educational Asia, Delhi, 2002

Statistics and Their Distributions
Statistical Methods in Economics-II

Lesson: Statistics and Their Distributions
Dyal Singh College(M), University of Delhi

TABLE OF CONTENTS
Section No. and Heading
1 Sample Statistic
1.1 Sampling With and Without Replacement
1.2 Sample Statistic and Sampling Distribution
2 Sampling Techniques
2.1 Simple Random Sampling
2.2 Stratified Random Sampling
2.3 Two-Stage or Multi-Stage Random Sampling
2.4 Systematic Random Sampling
2.5 Purposive Sampling
2.6 Cluster Sampling
3 Sampling and Non-sampling Errors
4 Deriving a Sampling Distribution
5 Analytical Methods for Deriving a Sampling Distribution
5.1 Using Probability Rules
5.2 Simulation Experiments
6 Distribution of the Sample Mean
7 Distribution of Sample Means when Population is Normally Distributed
8 Central Limit Theorem
9 Distribution of Sample Means when the Population is Non-Normal
10 Distribution of the Sum and Difference of Sample Means
References:
1. Jay L. Devore, Probability and Statistics for Engineering and the Sciences,
8th edition, Cengage Learning
2. Irwin Miller and Marylees Miller, Mathematical Statistics, Seventh Edition, Pearson.
3. A.L. Nagar and R.K. Das, Basic Statistics, Second Edition, Oxford University Press
Content developer: Chandra Goswami, Dyal Singh College (M)
STATISTICS AND THEIR DISTRIBUTIONS
In this chapter you will learn what a sample statistic is and what is meant by its sampling
distribution. You will learn how to derive the probability distribution of a sample
statistic and the three alternative methods that can be used for this purpose. The
first method is based on selecting samples from the population. You will learn
about different methods for selecting a representative sample and the difference
between sampling and non-sampling errors. You will study in depth about the
probability distribution of sample means. You will learn about the significance of
the Central Limit Theorem in this context. The chapter ends with the study of
distribution of combinations of two sample means. The chapter is followed by
practice questions so that you can test your understanding of the chapter
contents.
Chapter Outline
1. Sample Statistic
2. Sampling Techniques
3. Sampling and Non-sampling Errors
4. Deriving a Sampling Distribution
5. Analytical Methods for Deriving a Sampling Distribution
6. Distribution of the Sample Mean
7. Distribution of Sample Means when Population is Normally Distributed
8. Central Limit Theorem
9. Distribution of Sample Means when the Population is Non-Normal
10. Distribution of the Sum and Difference of Sample Means
1 SAMPLE STATISTIC
Measures which describe some characteristic of the population are known as parameters. Examples of population parameters are population mean μ, population standard deviation σ, population ratio p, population median $\tilde{\mu}$, etc. These are constants for a population and remain unknown in the absence of complete population census data.
In many situations, a population census may be impractical and very costly in terms of time and money. Then we select a representative sample from the population and use sample data to make inferences about the unknown population parameters.
1.1 Sampling With and Without Replacement

A sample is said to be with replacement if the process of sampling is such that after each unit is drawn from the population its value is observed and then the unit is returned to the population. This means that the population remains unchanged before every subsequent draw and each population unit has equal probability of being selected on every draw.
Suppose the population to be sampled is of size N. If a sample of size n is to be selected from the population, then each of the sample elements can be selected in N different ways. So the first sample unit can be drawn in N different ways, the second sample unit can also be selected in N different ways and so on for the n elements of the sample. In sampling with replacement, since each element of the population has the same probability of being selected on each draw, the total number of possible samples is $N^n$. Now each of the $N^n$ possible samples is equally likely. Hence, the probability of selection of any one sample is $\frac{1}{N^n}$. Theoretically, an infinitely large number of samples can be drawn from the population if sampling is with replacement, as the population is not exhausted.
A sample is said to be without replacement if the process of sampling is such that the population units selected for the sample are not returned to the population. As each unit is selected it is withdrawn from the population so that the size and composition of the population changes after each draw. An item once selected cannot be repeated in subsequent draws. The probability of any population unit being selected in the sample is thus not independent of the results of previous draws. This is particularly the case of sampling without replacement from a finite population where the population size is not very large compared to the sample size.
From a population of size N, the first unit can be drawn in N ways, the second unit in N-1 ways and so on, when sampling is without replacement. Since the order in which the sample units are selected is not relevant, the total number of possible samples of size n from a population of size N is $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$.
When the population from which the sample is drawn is very large in relation to sample size, i.e. when n < 0.05N, then for all practical purposes we can consider the population to be infinitely large. If the population is infinitely large then the number of samples that can be drawn from the population is also infinitely large, irrespective of whether sampling is with or without replacement. It is only when the population is finite, sample size n > 0.05N, and sampling is without replacement that the total number of samples will be $\binom{N}{n}$ and the probability of selection of any one of the equally likely samples is $\frac{1}{\binom{N}{n}}$.
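The two counting rules can be verified by listing the samples explicitly. The following sketch uses Python's standard library; the three-unit population and the sample size are illustrative assumptions chosen to keep the lists short.

```python
from itertools import combinations, product
from math import comb

population = [2, 6, 10]  # illustrative population, N = 3
n = 2                    # illustrative sample size
N = len(population)

# With replacement: every ordered n-tuple of population units is a sample.
with_replacement = list(product(population, repeat=n))

# Without replacement, order ignored: every n-subset is a sample.
without_replacement = list(combinations(population, n))

print(len(with_replacement), N ** n)         # both equal N**n
print(len(without_replacement), comb(N, n))  # both equal N! / (n! (N-n)!)
print(with_replacement)
print(without_replacement)
```

Enumeration of this kind is only practical for very small N and n, since the number of samples grows rapidly with both.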
1.2 Sample Statistic and Sampling Distribution

A sample statistic or estimator is a function of sample values. It is a procedure
by which sample values are combined. The sample statistic is used to estimate
the population parameter. The numerical value of the sample statistic or
estimator of a given sample gives us the estimate of the value of the population
parameter. This numerical value of the sample statistic is called the estimate.
Several different functions of sample values could be used to obtain the estimate of the parameter value. Examples of estimators for population mean μ are the sample mean $\bar{X}$, the trimmed mean $\bar{X}_{tr}$, the median $\tilde{X}$, or some weighted average of sample values.
Consider selecting two different samples of the same size n $(x_1, x_2, \ldots, x_n)$ from the same population. It is very unlikely that all the sample values of the first sample will be repeated in the second sample. Since a sample is only a small subset of the population and a large number of samples of the same size can be drawn from the same population, the sample values are likely to differ from one sample to another.
Before we obtain sample data, there is uncertainty about the value of each $X_i$, since each sample element can be any one of the population units. Because of this uncertainty, each observation is a random variable $X_i$ before the data becomes available.
Since sample observations are random variables, the value of any function of the sample observations (e.g. sample mean $\bar{X}$, sample variance $S^2$, etc.) is also a random variable which varies from sample to sample. There is uncertainty about the value of $\bar{X}$, the value of $S^2$, and so on prior to obtaining the sample observations. The value of the sample statistic will depend on which sample was selected and the parameter estimate would differ accordingly.
Hence a statistic or estimator is a random variable and has a probability distribution. Suppose the process of drawing samples of size n from a given population could be repeated indefinitely and we calculated the value of the statistic for each of these samples. We would then generate a distribution of all possible values of the statistic, i.e. its probability distribution. This theoretical probability distribution of a statistic is referred to as its sampling distribution. The sampling distribution describes how the statistic varies in value across all possible samples that might be selected from a population, with the respective probabilities. The standard deviation of the sampling distribution is called the standard error of the statistic.
The sampling distribution depends on the population distribution (e.g. normal, uniform, etc.), the sample size n and the method of sampling.
2 SAMPLING TECHNIQUES
Samples selected from a population must be representative of the population. If a
sample is unrepresentative of the population and the sample statistic is used to
estimate the corresponding parameter value, then this will result in an inaccurate
estimate. If the selected sample contains a disproportionately large number of
units from one end of the population distribution then the sample statistic will
provide an underestimate or overestimate of the parameter value. For example, if
the sample observations are atypically larger than most of the population values,
then the sample mean will be an overestimate of the population mean.
The technique used in the collection of sample data should be such that it minimizes the possibility of such errors. There exist a number of alternative sampling methods for obtaining representative samples. The choice of sampling technique adopted for a particular study will depend on the purpose of the study and the kind of population being sampled.
2.1 Simple Random Sampling

Simple random sampling method is often used to select samples from the
population. The popularity of this sampling method is due to the fact that
sampling distribution of any statistic can be more easily obtained for simple
random samples than for samples obtained by using any other sampling method.
Sample units are selected by lottery method or by using random number tables.
Simple random sampling ensures that each unit of the population gets an equal
chance of being selected in the sample. Since several different samples can be
selected from any population, this method ensures that each sample of the same
size has the same probability of being selected. This method is useful for
homogeneous populations where there are no extreme values. In case of
homogeneous populations atypical observations are unlikely in the selected
sample and the estimate is unlikely to be biased.
The random variables $X_1, X_2, \ldots, X_n$ are said to form a simple random sample of size n if the following two conditions are satisfied:
1. The $X_i$'s are independent random variables
2. Every $X_i$ has the same probability distribution.
In a simple random sample (SRS), each unit of the sample is then said to be independently and identically distributed (iid).
If sampling is with replacement from a finite population or from an infinitely large population (with or without replacement) then these conditions are satisfied exactly. If sampling is without replacement from a finite population then these conditions will be approximately satisfied provided the sample is small relative to the population size. In practice, we can proceed as if the $X_i$'s form a random sample if at most 5% of the population is sampled.
2.2 Stratified Random Sampling

Stratified random sampling is suitable when the population is heterogeneous but can be subdivided into homogeneous subgroups or strata. Random sampling is used to select units from each subgroup or stratum in such a way that the pattern of distribution of units from different subgroups in the sample is the same as that found in the population. The proportions in the sample from each subgroup conform to the proportions in the population. However, more information is required about the population in this method than in SRS.
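The proportional allocation described above can be sketched as follows. This is an illustration under assumptions: the function `stratified_sample`, the stratum names and the stratum sizes are hypothetical, and rounding each stratum's share to the nearest whole unit is one simple choice among several.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Draw about n units, allocating to each stratum in proportion to its size.

    strata maps a stratum name to the list of its population units.
    Rounding means the stratum sample sizes may not add up to exactly n.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)   # proportional share of the sample
        sample[name] = rng.sample(units, k)  # simple random sample within stratum
    return sample


# Hypothetical population of 100 units split into three strata.
strata = {
    "small farms": list(range(0, 60)),
    "medium farms": list(range(60, 90)),
    "large farms": list(range(90, 100)),
}
chosen = stratified_sample(strata, n=10, seed=1)
print({name: len(units) for name, units in chosen.items()})
```

With strata of 60, 30 and 10 units and n = 10, the sample contains 6, 3 and 1 units, mirroring the population proportions.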
2.3 Two-Stage or Multi-Stage Random Sampling

This method is useful when the population is infinitely large. For example, this
method is very useful in crop-cutting experiments where the objective is to
estimate the average yield of crops. All the districts where the crop is grown are
listed and some districts are selected at random. Then some villages from the
selected districts are chosen at random. From the selected villages some fields or
plots are selected at random. The average yield of the crop in these fields is then
examined. In this example the experiment is conducted in three stages (district
level, village level, and field level). At each stage random sampling method is
used to go to the next stage. So the study involves three stage random sampling.
2.4 Systematic Random Sampling

This is another example of how random sampling is used in a specific manner. The population is first listed or ordered in a random fashion. If it is decided to select every kth unit from the population then the first sample unit is selected at random from the first k units of the population. Then every kth unit after that is chosen in a systematic manner. Thus if the first unit is the jth population unit, the second unit will be the (j+k)th unit, followed by every kth unit after that. Hence the first unit is selected at random from a randomly arranged population, followed by systematically selecting every kth unit after that. The method requires that there is no pattern in the initial ordering of the population, to prevent bias in the selection of sample units.
This method is often used in estimating the timber available in a forest. A tree is selected at random and then a direction is selected at random. Every ith tree in the selected direction, starting from the first tree, is then examined.
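The selection rule (a random start among the first k units, then every kth unit) can be written in a few lines. This is a minimal sketch; the function name `systematic_sample` and the numbered population of 100 units are assumptions made for illustration.

```python
import random


def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every kth unit."""
    if not 1 <= k <= len(units):
        raise ValueError("k must lie between 1 and the population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # index of the jth unit, chosen from the first k
    return units[start::k]    # units j, j+k, j+2k, ...


# Hypothetical population listing: units numbered 1 to 100, every 10th taken.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, seed=7)
print(sample)
```

The listing is assumed to be in random order already; if the ordering has a periodic pattern that matches k, the sample can be badly unrepresentative.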
2.5 Purposive Sampling

The population is first restricted according to the purpose of the study. The
sample units are then drawn using simple random sampling from this specified
subset of the population. For example, if the purpose of the study is to study the
effects of inflation on the nutrition levels of the poor, then we restrict the
population to only those families which can be classified as poor. Then a random
sample of poor families is selected from this subset of the population.

2.6 Cluster Sampling

The population is divided into clusters or groups. A random sample of clusters is
then chosen. All the observations in the selected clusters are included in the
sample. If the purpose of study is to examine crop yield, the villages may be each
considered to be a cluster of farms. A random sample of these clusters is
selected. All the farms in the selected clusters make up the sample units. The
crop yield of all the farms belonging to the selected clusters is then studied. This
method may however lead to biased results if an unusually large (or small)
proportion of farms in the selected sample use correct fertilizers and irrigation to
increase crop yields.
We see from the listed sampling techniques that there is no unique method for
obtaining a representative sample from a given population. Other sampling
methods exist which combine features of more than one of the above methods.
One such example is Stratified Cluster Sampling. The method adopted when
selecting a sample will depend on the nature of the population, purpose of study,
along with time and expenditure constraints.
3 SAMPLING AND NON-SAMPLING ERRORS

When there is a population census, all population units are considered in the calculation of the parameter value. In a sample survey, since only a small subset of the population is studied, the value of the sample statistic used to infer about the parameter may differ from the value obtained from census data. The results will also vary from sample to sample even though the samples are all of the same size and are highly representative of the population. The difference between the true parameter value and the value of the sample statistic is called the sampling error. Variation of the value of the statistic from different samples is known as sampling fluctuation.
It is a matter of pure chance which sample is selected. Hence sampling errors are due to chance factors. Sampling errors are observed only in a sample survey. They are completely absent in the census method. Factors which contribute to sampling errors are:
1. Heterogeneity or variability of the population.
2. Bias in the estimation method if an incorrect formula is used for the statistic.
3. Sometimes, in a properly selected sample, some of the sample units cannot be observed and are substituted by other units whose characteristics differ from those of the originally selected sample units.
4. Failure to observe the data from all the selected sample units, particularly when there are instances of non-response.
Non-sampling errors can occur at any stage of the planning, collection, processing and analysis of data. These errors can be present in both census and sample studies. Some of the factors that give rise to non-sampling errors are:
1. Faulty definition of the population or sample units
2. Personal bias of the investigator
3. Inadequately trained investigators
4. Improper coverage so that units that should have been covered are
excluded. Or, if units are included that should have been excluded.
5. Errors in compilation of data when wrong entries are made or due to
wrong calculations.
Non-sampling errors can be reduced with proper planning, training and care at
every stage of the process from collection of data to final calculations.
4 DERIVING A SAMPLING DISTRIBUTION

Suppose random samples of size 2 are selected with replacement from the following population: 2, 6, 10
The population mean and variance are µ = 6 and σ² = 10.667
The total number of possible samples is 3² = 9. Following is the list of samples with their respective means and variances, where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ and $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
(x1,x2):  (2,2)  (2,6)  (2,10)  (6,2)  (6,6)  (6,10)  (10,2)  (10,6)  (10,10)
x̄:        2      4      6       4      6      8       6       8       10
s²:       0      8      32      8      0      8       32      8       0
Sampling distribution of $\bar{x}$:
x̄:     2    4    6    8    10
P(x̄):  1/9  2/9  3/9  2/9  1/9
$$\mu_{\bar{X}} = 2(1/9) + 4(2/9) + 6(3/9) + 8(2/9) + 10(1/9) = (2 + 8 + 18 + 16 + 10)/9 = 6$$
$$Var(\bar{x}) = \left[4\left(\tfrac{1}{9}\right) + 16\left(\tfrac{2}{9}\right) + 36\left(\tfrac{3}{9}\right) + 64\left(\tfrac{2}{9}\right) + 100\left(\tfrac{1}{9}\right)\right] - 6^2 = \frac{372}{9} - 36 = 5.33 = \frac{10.667}{2}$$
Thus $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{x}}^2 = \sigma^2/n$
We can similarly derive the distribution of sample variances and obtain the mean and variance of the sampling distribution.
Sampling distribution of $s^2$:
s²:     0    8    32
P(s²):  3/9  4/9  2/9
$$E(s^2) = 0\left(\tfrac{3}{9}\right) + 8\left(\tfrac{4}{9}\right) + 32\left(\tfrac{2}{9}\right) = \frac{96}{9} = 10.667, \text{ so that } E(s^2) = \sigma^2$$
$$V(s^2) = \left[0\left(\tfrac{3}{9}\right) + 64\left(\tfrac{4}{9}\right) + 1024\left(\tfrac{2}{9}\right)\right] - \left(\frac{96}{9}\right)^2 = \frac{2304}{9} - \frac{9216}{81} = 142.22$$
The above example illustrates how the sampling distribution of a statistic is
obtained by first listing all the equally likely samples. We can, however, obtain
the sampling distribution of a statistic by analytical methods, without actually
drawing all possible samples of the same size from the population. In our example
the population was very small and the sample size large relative to the
population, so that the number of possible samples was limited. In practice this
would not be the case, and deriving the sampling distribution by listing every
possible sample would be almost impossible.
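The listing above can be checked mechanically. The following is a minimal sketch, assuming sampling with replacement from the population 2, 6, 10 with n = 2; the function names are illustrative only and exact fractions are used so that the results match the hand calculation.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

population = [2, 6, 10]
n = 2  # sample size, sampling with replacement


def sample_mean(xs):
    return Fraction(sum(xs), len(xs))


def sample_variance(xs):
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


# All N**n ordered samples are equally likely under sampling with replacement.
samples = list(product(population, repeat=n))
p_each = Fraction(1, len(samples))


def sampling_distribution(statistic):
    """Map each value of the statistic to its probability."""
    dist = defaultdict(Fraction)
    for s in samples:
        dist[statistic(s)] += p_each
    return dict(sorted(dist.items()))


def expectation(dist):
    return sum(v * p for v, p in dist.items())


def variance(dist):
    mu = expectation(dist)
    return sum((v - mu) ** 2 * p for v, p in dist.items())


mean_dist = sampling_distribution(sample_mean)
var_dist = sampling_distribution(sample_variance)

pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

print("P(sample mean):", {int(k): str(v) for k, v in mean_dist.items()})
print("E(mean) =", expectation(mean_dist), " Var(mean) =", float(variance(mean_dist)))
print("E(s^2)  =", float(expectation(var_dist)), " V(s^2) =", float(variance(var_dist)))
```

Changing `population` or `n` reproduces the same comparison for other small populations, although the number of samples grows as N to the power n.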
5 ANALYTICAL METHODS FOR DERIVING SAMPLING DISTRIBUTIONS

The sampling distribution can be obtained either with the help of calculations
based on probability rules or by carrying out a simulation experiment.
5.1 Using Probability Rules

Probability rules can be used to obtain the distribution of a statistic provided that
it is a “fairly simple” function of the Xi's and either there are relatively few
different X values in the population or else the population distribution has a “nice”
form.
If the form of the population distribution is known, then the probability of selecting
a sample unit is the probability of its occurrence in the population. Using
this information, the probability of selecting a particular sample is the joint
probability of all its sample units. If the sample units are independent, this
joint probability is the product of the probabilities of the individual sample units.
The probability associated with a particular sample is also the probability
attached to its mean and its variance: the probability of a sample mean is the
probability of selecting the sample for which the mean is computed, and the
probability of a sample variance is the probability of selecting the sample for
which the variance is computed.
In the above example, the probability of selecting any one of the 3 population
units (2, 6 or 10) is 1/3, as each unit is equally likely. Provided the sample unit is
replaced in the population before the second unit is drawn, the population
remains unchanged, and the probability of selection of the second sample unit
remains unchanged at 1/3. Then the probability of selection of any sample of size
2 from the population is the joint probability (1/3)(1/3) = 1/9. This can be
verified from the fact that the total number of possible samples is N^n = 3² = 9, so
that the probability of selection of any one of the equally likely samples is 1/9.
The mean and the variance of each of the 9 samples therefore occur with
probability 1/9. Now we have three samples, (2,10), (6,6) and (10,2), which have the
same mean value 6. The probability of obtaining a mean value of 6 is the
probability of selecting any one of the three samples: (2,10) or (6,6) or
(10,2). The sum of these probabilities is 3/9, which is the probability of $\bar{x}$ = 6.
Thus the sampling distribution of means and variances can be derived from the
probability of selecting a sample that results in specific values of sample mean
and variance.
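The same reasoning can be sketched in code: the probability of each sample is the product of the probabilities of its units, and samples with the same mean are added together. This is an illustrative sketch only; the second probability distribution (`unequal_pmf`) is a hypothetical example that is not part of the text above.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import prod


def mean_distribution(pmf, n):
    """Sampling distribution of the mean for n independent draws from pmf."""
    dist = defaultdict(Fraction)
    for sample in product(pmf, repeat=n):
        # Independence: joint probability is the product of unit probabilities.
        p_sample = prod(pmf[x] for x in sample)
        dist[Fraction(sum(sample), n)] += p_sample
    return dict(sorted(dist.items()))


# Population of the text: 2, 6, 10, each with probability 1/3.
equal_pmf = {2: Fraction(1, 3), 6: Fraction(1, 3), 10: Fraction(1, 3)}
equal_dist = mean_distribution(equal_pmf, 2)

# Hypothetical population with unequal probabilities (assumption for illustration).
unequal_pmf = {2: Fraction(1, 2), 6: Fraction(1, 4), 10: Fraction(1, 4)}
unequal_dist = mean_distribution(unequal_pmf, 2)

print("P(mean = 6), equal probabilities:  ", equal_dist[6])
print("P(mean = 6), unequal probabilities:", unequal_dist[6])
```

With unequal probabilities the samples are no longer equally likely, so counting samples is not enough and the product rule has to be used.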
5.2 Simulation Experiments

Simulation experiments are performed to obtain the sampling distribution of a
statistic usually when derivation via probability rules is too difficult or complicated
to be carried out.
The following characteristics of an experiment must be specified at the start of

the experiment:
1. The statistic of interest ($\bar{x}$, s, $\bar{x}_{tr}$ etc)
2. The population distribution and its parameters [N(μ,σ²); U(α,β) etc]
3. The sample size (n)

4. The number of replications or trials of the experiment (k)
Then use a computer to obtain k different random samples, each of size n, from
the population distribution.
For each such sample, calculate the value of the statistic. From k replications we
get k samples and k calculated values of the statistic. Now construct a histogram
for the k values. The histogram gives the approximate sampling distribution of
the statistic.
The larger the number of replications (k), the better will be the approximation of
the sampling distribution. In practice, k=500 or 1000 is usually enough. Actual
sampling distribution is obtained when k→∞.
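A simulation experiment of this kind can be sketched as follows. The choices made here are assumptions for illustration: the statistic is the sample mean, the population is N(50, 10²), n = 10 and k = 1000, and a simple text histogram stands in for a plotted one.

```python
import random
import statistics


def simulate_sampling_distribution(statistic, draw, n, k, seed=0):
    """Return k values of `statistic`, each computed from a fresh sample of size n."""
    rng = random.Random(seed)
    return [statistic([draw(rng) for _ in range(n)]) for _ in range(k)]


def text_histogram(values, bins=10, width=40):
    """Crude histogram: one row per bin, bar length proportional to the count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    rows = []
    for i, c in enumerate(counts):
        rows.append(f"{lo + i * step:8.2f} | " + "#" * round(width * c / peak))
    return "\n".join(rows)


mu, sigma, n, k = 50.0, 10.0, 10, 1000
means = simulate_sampling_distribution(
    statistics.mean, lambda rng: rng.gauss(mu, sigma), n, k
)

print(text_histogram(means))
print("mean of simulated means:", round(statistics.mean(means), 3))
print("sd of simulated means:  ", round(statistics.stdev(means), 3))
```

Passing a different `statistic` (for example `statistics.median` or `statistics.stdev`) or a different `draw` function changes items 1 and 2 of the list above without altering the rest of the experiment.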
6 DISTRIBUTION OF THE SAMPLE MEAN
The sample mean $\bar{X}$ is useful as it can be used to draw conclusions about the
population mean μ. Some of the most frequently used inferential procedures are
based on the properties of the sampling distribution of $\bar{X}$.
Let X1, X2, …, Xn be a random sample from a population with mean μ and standard
deviation σ. Since the sample units are drawn at random from the same parent
population, the Xi's are independently and identically distributed.
Any unit of the population could be selected as a sample unit, so every sample
unit can take any of the population values with the corresponding probabilities.
Every sample unit therefore has the same probability distribution as the
population, and hence E(Xi) = µ and V(Xi) = σ².
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a linear combination of n independent random variables Xi, each
having equal weight of 1/n. Then,
$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu = \mu_{\bar{X}}$
And,
$V(\bar{X}) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n} = \sigma_{\bar{X}}^2$
Thus, the mean of the sampling distribution is independent of sample size but the
standard deviation of $\bar{X}$ (or the standard error of the mean), $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, is inversely
related to the sample size.
Fig 1 Population distribution and distribution of sample means
As the sample size increases we obtain more information from the sample and we
can expect the value of sample mean to be closer to the population mean value.
As the sample size increases the sampling distribution becomes narrower and the
sample means become clustered closer to μ. In the limit as sample size increases
indefinitely the sampling distribution collapses to a single point. Each and every
sample mean will be equal to the population mean, ie, $\sigma_{\bar{X}} = \sigma/\sqrt{n} \to 0$ as n → ∞.
As long as n > 1, $\sigma_{\bar{X}} < \sigma$. The reason is that, for each sample, the sample
mean ($\bar{X}$) averages out the variability of the observations within the sample.
The sample mean is a central value for each sample. Although the value of the
sample mean is affected by all sample values, by its very nature the mean value
must lie somewhere in the middle of the range of sample values. This is true for
each and every possible sample drawn from the population. Thus the variability in
values of sample means must be less than the variability in the population values

where all the population units have to be considered in calculation of population

variance.
Exercise 1 For random samples from an infinite population, what happens to

the standard error of the mean if sample size is
(a) increased from 25 to 100;
(b) decreased from 400 to 100?

Solution Since $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, increasing n reduces the standard error and
decreasing n increases the standard error.
(a) $\sigma/\sqrt{25} = \sigma/5$ and $\sigma/\sqrt{100} = \sigma/10$. Therefore $\sigma_{\bar{X}}$ is halved
when sample size is four times the previous size
(b) $\sigma/\sqrt{400} = \sigma/20$ and $\sigma/\sqrt{100} = \sigma/10$. Therefore $\sigma_{\bar{X}}$ is doubled
when sample size is one-fourth the previous size
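The two ratios in Exercise 1 can be confirmed with a short calculation. This is a sketch only; σ is set to 1 because it cancels in the ratio, and `standard_error` is an illustrative helper name.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean for an infinite population."""
    return sigma / sqrt(n)


sigma = 1.0  # any positive value gives the same ratios

# (a) n increased from 25 to 100
ratio_a = standard_error(sigma, 100) / standard_error(sigma, 25)
# (b) n decreased from 400 to 100
ratio_b = standard_error(sigma, 100) / standard_error(sigma, 400)

print("(a) new SE / old SE =", ratio_a)
print("(b) new SE / old SE =", ratio_b)
```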
7 DISTRIBUTION OF SAMPLE MEANS WHEN POPULATION IS

NORMALLY DISTRIBUTED
Let the population be infinitely large, or let it be finite with sampling done with
replacement. Suppose the population is normally distributed with mean μ and
variance σ². Then each of the observations Xi of the random sample of size n will
also be normally distributed with mean μ and variance σ². The sample mean $\bar{X}$ is
a linear combination of n independent, normal variables, each having an equal
weight of 1/n, so that $\bar{X}$ also has a normal distribution with mean µ and variance
σ²/n.
Since $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, the standard deviation of the sampling distribution of means
is inversely related to sample size n. As n increases, $\sigma_{\bar{X}}$ decreases. For larger
samples, the distribution of sample means will be clustered closer to µ. For

smaller sample sizes the sampling distribution is more spread out. See figure 2.

Fig 2 Distributions of sample means where n1< n2 < n3

If sampling is without replacement from a finite population, then we must apply
the finite population correction factor. The sample means would be normally
distributed with mean µ and variance $\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$. However, if the sample
size is very small relative to population size (n < 0.05N), then
$\frac{N-n}{N-1} \to 1$ and $\sigma_{\bar{X}}^2 \to \frac{\sigma^2}{n}$.
If population is normally distributed then sampling distribution of means

will also be normally distributed, irrespective of sample size, ie, for any n
Exercise 2 Suppose annual percentage salary increases of CEO’s of all midsize

corporations are normally distributed with μ=12.2% and σ=3.6%.
A random sample of 9 observations is obtained from this
population and the sample mean computed. What is the probability
that $\bar{X}$ will be less than 10%?
Solution Since the population is normally distributed, the sampling distribution of
$\bar{X}$ will also be normal with $\mu_{\bar{X}}$ = µ = 12.2 and $\sigma_{\bar{X}}$ = 3.6/√9 = 1.2
$P(\bar{X} < 10) = P\left(Z < \frac{10 - 12.2}{1.2}\right) = \Phi(-1.83) = 0.0336$
Exercise 3 Marks obtained by 3000 students in an admission test are

normally distributed with μ=68 and σ=3. If 80 random samples of
25 students each are obtained, what would be $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$ if
sampling is done (a) with replacement? (b) without replacement?
In how many samples would you find $\bar{X}$ > 69?
Solution Whether sampling is with or without replacement, $\mu_{\bar{X}}$ = μ = 68
If sampling is (a) with replacement, then $\sigma_{\bar{X}} = \frac{3}{\sqrt{25}} = 0.6$, and
(b) without replacement, then $\sigma_{\bar{X}} = \frac{3}{\sqrt{25}} \sqrt{\frac{3000 - 25}{3000 - 1}} \approx 0.6$
[We obtain similar results for (a) and (b) since n = 0.0083N]
$P(\bar{X} > 69) = 1 - P(\bar{X} < 69) = 1 - P\left(Z < \frac{69 - 68}{0.6}\right) = 1 - \Phi(1.67)$
= 1 - 0.9525 = 0.0475
Number of samples with $\bar{X}$ > 69 = (80)(0.0475) = 3.8 ≈ 4
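Exercises 2 and 3 can be reproduced with Python's `statistics.NormalDist`. This is a sketch; `se_mean` is an illustrative helper, and because the code does not round z to two decimals the probabilities differ slightly from the table-based answers.

```python
from math import sqrt
from statistics import NormalDist


def se_mean(sigma, n, N=None):
    """Standard error of the mean; applies the finite population
    correction when a population size N is supplied."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se


# Exercise 2: P(sample mean < 10) with mu = 12.2, sigma = 3.6, n = 9
p_ex2 = NormalDist(12.2, se_mean(3.6, 9)).cdf(10)

# Exercise 3: mu = 68, sigma = 3, n = 25, N = 3000, 80 samples
se_with = se_mean(3, 25)              # with replacement
se_without = se_mean(3, 25, N=3000)   # without replacement
p_ex3 = 1 - NormalDist(68, se_with).cdf(69)
expected_samples = 80 * p_ex3

print(f"Exercise 2: P = {p_ex2:.4f}")
print(f"Exercise 3: SE with = {se_with:.4f}, SE without = {se_without:.4f}")
print(f"Exercise 3: P = {p_ex3:.4f}, expected samples = {expected_samples:.1f}")
```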
8 CENTRAL LIMIT THEOREM

The Central Limit Theorem (CLT) is one of the most important theorems in
Statistics. It often forms the basis for estimation of population mean and tests of
means. In the form given here it concerns the sample mean (equivalently, the
sample total), not sample statistics in general.
The central limit theorem states that when an infinite number of successive
random samples of the same size, n, are taken from a given population with
mean μ and variance σ², the distribution of sample means $\bar{X}$ will be
approximately normal with mean μ and standard deviation $\sigma/\sqrt{n}$,
provided n is sufficiently large, irrespective of the shape of the population
distribution.
The larger the value of n, the better is the approximation. Even when the population
distribution is highly non-normal, averaging of sample values while computing
$\bar{X}$ produces a distribution more bell-shaped than the population itself. If n is
large, a suitable normal curve will approximate the actual distribution of $\bar{X}$. That is
why the sampling distribution of $\bar{X}$ is said to be asymptotically normal. This is
illustrated in figure 3.

Fig 3 The Central Limit Theorem; n1 < n2
The red curve in figure 3 is the positively skewed population distribution. The
green and blue curves depict two distributions of sample means for different
sample sizes where n1 < n2. The distribution of $\bar{X}$ with sample size n1 is less
skewed than the distribution of the rv X. As sample size is increased suitably to
n2, the distribution of $\bar{X}$ is approximately normal.
How large the sample size must be depends on how far the shape of the
population being sampled departs from a normal distribution. In many cases
the sampling distribution quickly approaches a normal distribution, as when the
population has a uniform distribution, where a sample size of 12 is sufficient. In
some other cases a sample size of 60 or more may be required. There is no hard
and fast rule about the sample size required for the sampling distribution of
means to be approximately normal. In practice, quite satisfactory approximations
can be obtained for n > 30, provided N > 2n where N is population size.
The central limit theorem is applicable whether population variable is discrete or

continuous.
CLT plays an important role in estimation and tests of hypotheses about the mean
as the probability distribution of the population being sampled is often not known.
The central limit theorem enables us to use the normal distribution as an
approximation of the distribution of sample means.
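The effect described by the theorem can be seen in a small simulation. The sketch below assumes a positively skewed exponential population with mean 1 (an illustrative choice, not one made in the text) and measures the skewness of the simulated sample means for increasing n; a value near zero indicates a roughly symmetric, bell-like shape.

```python
import random
import statistics


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def simulated_means(n, k, rng):
    """k sample means, each from n draws of an exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(k)
    ]


rng = random.Random(42)
k = 4000  # number of replications
skew_by_n = {n: skewness(simulated_means(n, k, rng)) for n in (1, 5, 30)}

for n, g in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {g:.2f}")
```

The skewness shrinks as n grows, which is the sense in which the distribution of the sample mean becomes more bell-shaped than the population itself.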
9 DISTRIBUTION OF SAMPLE MEANS WHEN POPULATION

DISTRIBUTION IS NON-NORMAL
The Central Limit Theorem extends the results for the normally distributed
population to the general case of non-normal populations. If the population

distribution is not known or if the population distribution is non-normal and

sample size is sufficiently large, the distribution of sample means is
approximately normal.
If population standard deviation σ is known then the sampling distribution of means
will be approximately normal with mean μ and variance σ²/n for n > 30. If σ is
unknown and is estimated by sample standard deviation S, then a larger sample
size is required. The distribution of $\bar{X}$ will then be approximately normal with
mean μ and variance S²/n where n ≥ 40.
Therefore, if the population being sampled is not known or it is known to be not
normally distributed, the CLT provides a short-cut. When the sample size is large,
we can treat the distribution of $\bar{X}$ as approximately normal.
If population distribution is not normal then sampling distribution
of means will be approximately normal if n > 30 when σ² is known,
and if n > 40 when σ² is not known and is estimated by sample S²
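As a numerical sketch of this approximation, the helper below treats the sample mean as normal with mean μ and standard deviation σ/√n. The figures are those of Exercises 4 and 5 that follow (n = 100, μ = 75, σ² = 256; and a uniform population on [24, 48] with n = 200); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def clt_mean_distribution(mu, sigma, n):
    """Approximate sampling distribution of the mean for large n (CLT)."""
    return NormalDist(mu, sigma / sqrt(n))


# Figures of Exercise 4: P(67 < sample mean < 83)
xbar4 = clt_mean_distribution(mu=75, sigma=sqrt(256), n=100)
p_ex4 = xbar4.cdf(83) - xbar4.cdf(67)

# Figures of Exercise 5: uniform population on [a, b]
a, b, n5 = 24, 48, 200
mu5 = (a + b) / 2
sigma5 = sqrt((b - a) ** 2 / 12)
xbar5 = clt_mean_distribution(mu5, sigma5, n5)
p_ex5 = xbar5.cdf(35)

print(f"Exercise 4: SE = {xbar4.stdev:.2f}, P = {p_ex4:.6f}")
print(f"Exercise 5: sigma = {sigma5:.3f}, SE = {xbar5.stdev:.2f}, P = {p_ex5:.4f}")
```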
Exercise 4 A random sample of size n=100 is taken from an infinite population

with the mean μ=75 and σ2=256. With what probability can we
assert that the value of $\bar{X}$ will fall between 67 and 83?
Solution The form of the population is unknown but σ is known and n>30.
According to the central limit theorem the sampling distribution of
$\bar{X}$ will be approximately normal with E($\bar{X}$) = 75 and $\sigma_{\bar{X}} = 16/\sqrt{100} = 1.6$
Then $P(67 < \bar{X} < 83) \approx P\left(\frac{67 - 75}{1.6} < Z < \frac{83 - 75}{1.6}\right) = \Phi(5) - \Phi(-5) \approx 1$
Exercise 5 A random sample of size n=200 is taken from a uniform population

with α=24 and β=48. What is the probability the mean of the
sample will be less than 35?
Solution Mean of the continuous uniform distribution is $\mu = \frac{\alpha + \beta}{2} = 36$ and
$\sigma = \sqrt{\frac{(\beta - \alpha)^2}{12}} = 6.928$. Based on the CLT, $\bar{X}$ will be approximately
normal with E($\bar{X}$) = 36 and $\sigma_{\bar{X}} = 6.928/\sqrt{200} = 0.49$.
$P(\bar{X} < 35) \approx P\left(Z < \frac{35 - 36}{0.49}\right) = \Phi(-2.04) = 0.0207$

10 DISTRIBUTION OF THE SUM AND DIFFERENCE OF SAMPLE MEANS

Let $\bar{X}_1$ be the mean of a random sample of size n1 drawn from a population with
mean μ1 and variance $\sigma_1^2$ where the population is infinite or sampling is with
replacement from a finite population.
Let $\bar{X}_2$ be the mean of a random sample of size n2 drawn from a population with
mean μ2 and variance $\sigma_2^2$ where the population is infinite or sampling is with
replacement from a finite population.
If the samples are independent then we know from rules of mathematical

expectation that
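$E(\bar{X}_1 + \bar{X}_2) = E(\bar{X}_1) + E(\bar{X}_2) = \mu_1 + \mu_2$ and $E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$
$V(\bar{X}_1 + \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ and $V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$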
If the populations are normally distributed then ($\bar{X}_1 + \bar{X}_2$) will also be normally
distributed with mean μ1 + μ2 and standard deviation $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ irrespective of the
sample sizes.
If populations are not known to be normal and sample sizes are sufficiently large
(n1 > 30, n2 > 30) then by CLT the distribution of ($\bar{X}_1 + \bar{X}_2$) will be approximately
normal with mean μ1 + μ2 and standard deviation $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
If population variances $\sigma_1^2$ and $\sigma_2^2$ are not known and are estimated by sample
variances $S_1^2$ and $S_2^2$ respectively, then by CLT, if n1 > 40, n2 > 40, the
distribution of ($\bar{X}_1 + \bar{X}_2$) will be approximately normal with mean μ1 + μ2 and
standard deviation $\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$.
If the populations are finite of size N1 and N2, and the samples from the two
populations are drawn without replacement, then the finite population correction
factor must be applied to the variance. If the population variances are known and

the populations are normal then the sampling distribution of ($\bar{X}_1 + \bar{X}_2$) will be
normal with mean μ1 + μ2 and standard deviation
$\sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)}$
If the populations are not normal or population variances are unknown and
estimated from sample data, then the sample sizes must be large enough.
If both random samples are independent and from the same population so that
μ1 = μ2 = μ and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then E($\bar{X}_1 + \bar{X}_2$) = 2μ and E($\bar{X}_1 - \bar{X}_2$) = 0. The standard
deviation of the distribution of ($\bar{X}_1 + \bar{X}_2$) is $\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$. If population variance is
unknown then it is estimated by a weighted average of sample variances. Thus,
$\hat{\sigma}^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$, where the information available in both samples is
used.
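These results translate directly into a short calculation. The sketch below assumes independent samples and either normal populations or large samples; the function names are illustrative, and the figures used are those of Exercise 6 that follows (means 4.1 and 4.5 hours, standard deviations 1.8 and 2.0 hours, 100 batteries of each brand). The sample variances passed to `pooled_variance` are hypothetical values.

```python
from math import sqrt
from statistics import NormalDist


def diff_of_means_dist(mu1, sigma1, n1, mu2, sigma2, n2):
    """Sampling distribution of (mean 1 - mean 2) for independent samples."""
    return NormalDist(mu1 - mu2, sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2))


def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of two sample variances (common population variance)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Figures of Exercise 6
d = diff_of_means_dist(4.1, 1.8, 100, 4.5, 2.0, 100)
p_first_exceeds = 1 - d.cdf(0)  # P(mean 1 - mean 2 > 0)

print(f"mean of difference = {d.mean:.2f}, sd = {d.stdev:.3f}")
print(f"P(mean 1 > mean 2) = {p_first_exceeds:.4f}")

# Hypothetical sample variances 9 and 16 from samples of size 10 and 20
print("pooled variance:", round(pooled_variance(9.0, 10, 16.0, 20), 3))
```

Because z is not rounded here, the probability differs in the third decimal place from a table-based answer.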
Exercise 6 The average lifetimes of Duracell Alkaline AA batteries and

Eveready Energizer Alkaline AA batteries are given as 4.1 hours
and 4.5 hours, respectively. The standard deviations of lifetimes
are 1.8 hours for Duracell batteries and 2.0 hours for Eveready
batteries. Random samples of 100 new batteries of each brand are
selected. What is the probability that the sample mean lifetime of
Duracell exceeds the sample mean lifetime of Eveready batteries?
Solution Let mean lifetime of samples of Duracell batteries be $\bar{X}$ hours and
mean lifetime of samples of Eveready batteries be $\bar{Y}$ hours.
$\mu_{\bar{X}} = \mu_X = 4.1$ hours and $\mu_{\bar{Y}} = \mu_Y = 4.5$ hours ⇒ $\mu_{\bar{X}} - \mu_{\bar{Y}} = -0.4$ hours
$\sigma_{\bar{X}}^2 = \frac{1.8^2}{100} = 0.0324$ and $\sigma_{\bar{Y}}^2 = \frac{2^2}{100} = 0.04$ ⇒ $\sigma_{\bar{X} - \bar{Y}} = \sqrt{0.0724} = 0.269$
Since sample sizes are large, by CLT, the distribution of ($\bar{X} - \bar{Y}$) is
approximately normal with mean $\mu_{\bar{X} - \bar{Y}} = \mu_{\bar{X}} - \mu_{\bar{Y}} = -0.4$ hours and
$\sigma_{\bar{X} - \bar{Y}} = 0.269$ hours. Now, P($\bar{X}$ > $\bar{Y}$) = P[($\bar{X} - \bar{Y}$) > 0] = 1 - P[($\bar{X} - \bar{Y}$) < 0].
Therefore, required probability is:
$1 - P[(\bar{X} - \bar{Y}) < 0] = 1 - P\left(Z < \frac{0 - (-0.4)}{0.269}\right) = 1 - \Phi(1.487) = 1 - 0.9319 = 0.0681$

References:
1. Jay L. Devore, Probability and Statistics for Engineering and the Sciences,
8th edition, Cengage Learning
2. Irwin Miller and Marylees Miller, Mathematical Statistics, Seventh Edition,
Pearson.
3. A.L.Nagar and R.K.Das, Basic Statistics, Second Edition, Oxford University
Press
PRACTICE QUESTIONS
1. Given the population: 27 32 17 21 32

(a) Calculate μ and σ
(b) List the random samples of size n=2
(c) Derive the sampling distributions of mean and variance
(d) Compare the mean and variance of the sampling distributions with
the parameter values.
2. For the population given in question 1,

(i) Repeat all the parts (b) to (d) for n=3
(ii) How do the results vary from the ones obtained for n=2? Why?
3. State and explain the significance of the Central Limit Theorem
4. What properties of the SRS help in deriving the sampling distribution of means?
OR
Why are random samples commonly used to obtain estimates of unknown
parameters?
5. If X1, X2, X3 is a simple random sample of size three from a large

population with mean 5 and 4, evaluate the expected value and std error
of the statistic T = (2X1 + X2 – 3X3)
6. Distinguish between sampling with replacement and sampling without

replacement and explain the effect of method chosen on estimation of
population mean in each case
7. Let random samples of size 25 be drawn from a population that is

normally distributed with mean 2800 and standard deviation 5000. What is

the mean and standard deviation of the distribution of sample means?

How would your answers be affected if sample size was 100?
8. An equipment manufacturer requires steel bars which must have a mean

length of at least 50 inches. The company can purchase bars from supplier
A whose bars have an average length of 47 inches with a standard
deviation of 12 inches, or from supplier B whose bars average 49 inches
with a standard deviation of 3.6 inches. If the manufacturer plans to buy
81 steel bars, which supplier, A or B, should be chosen?
9. The standard deviation of the amount of time it takes a college student to

complete a project is 40 minutes. A random sample of 64 students is
taken. What is the probability the sample mean will be more than 8
minutes less than the population mean?
10. A random sample of size 81 is taken from an infinite population with the
mean μ = 128 and the standard deviation σ = 6.3. With what probability
can we assert that the value we obtain for the sample mean $\bar{X}$ will not fall
between 126.6 and 129.4?
11. The mean production level at a firm is assumed to be 47.3 units per day
with a standard deviation of 12.7. The manager takes a sample of output
for 25 days. If the sample mean exceeds 49 units then the workers are
promised a Diwali bonus. How likely are the employees to get the bonus?
What assumption did you make, if any?
12. Independent random samples of size 400 are taken from each of two
populations having equal means and std deviations σ1 = 20 and σ2 = 30.
What can we assert with a probability of 0.99 about the value of the
difference in sample means?
13. What does standard error of a statistic measure? If a random sample of

size 25 is taken from a population distributed normally with mean 500 and
standard deviation 35, estimate the standard error of the sample mean.
Would you prefer a sample of size 49? Explain.

14. Given that test scores in an entrance examination are normally distributed
with mean of 30 and a std. dev. of 6
(i) What is the probability that a single score drawn at random will be
greater than 34?
(ii) What is the probability that a sample of 9 scores will have a mean
greater than 34?
(iii) Explain the difference in the results obtained in parts (i) and (ii)

Some Special Distributions
Statistical Methods In Economics-II

Lesson: Some Special Distributions

College/Department: Department Of Economics,
Dyal Singh College(M), University of Delhi

Table of Contents
1. Uses of Sampling Distributions in Statistical Analysis
1.1 Estimation
1.2 Hypothesis Testing
2. Other Sampling Distributions
3. Review of the Normal Distribution
3.1 The Standard Normal Distribution
3.2 Normal Approximation of the Binomial Distribution
4. The Student’s t Distribution
5. The Chi-Squared Distribution
6. The F Distribution

Reference:
Jay L. Devore, Probability and Statistics for Engineering and the Sciences, 8th
edition, Cengage Learning

SOME SPECIAL DISTRIBUTIONS
The chapter begins with a brief introduction to the uses of sampling distributions
in problems of statistical inference. Some special distributions you will need for
statistical inferences will then be discussed. You will also be taught how to use
the tables given for each of the distributions. The chapter is followed by practice
questions so that you can test your understanding of the distributions discussed
in the chapter.
Chapter Outline
1. Uses of Sampling Distributions in Statistical Analysis

2. Other Sampling Distributions
3. Review of the Normal Distribution
4. The Student’s t Distribution
5. The Chi-Squared Distribution
6. The F Distribution
1 USES OF SAMPLING DISTRIBUTIONS IN STATISTICAL INFERENCE

Problems of statistical inference are decision problems. These are divided into 2
categories: (1) Estimation, and (2) Tests of Hypotheses.
The main difference between the 2 kinds of problem is that:
(1) In problems of estimation we must decide the value of an unknown

parameter from possible alternatives.
(2) In hypothesis testing we must decide whether to reject or not to reject a

specific value of a parameter.
When estimating the parameter value θ we may use the value of the statistic as a
point estimate. Or, in case of interval estimation, we can give a range of values of
the statistic within which θ is expected to lie with a specified probability.

In hypothesis testing we assume a value of the parameter and use sample

statistic to decide whether to reject that value or not to reject that value.
1.1 ESTIMATION
Let the unknown parameter be the population mean µ. If the population is
normally distributed then there are several different possible estimators, eg
the sample mean $\bar{X}$, the sample median $\tilde{X}$, the trimmed mean $\bar{X}_{tr}$,
semi-interquartile range etc. Knowledge of the sampling distribution of
alternative estimators helps us choose the best estimator.
In making this choice we are guided by some specified desirable properties of an

estimator. One such property is that of unbiasedness, which requires that the
mean of the sampling distribution equals the true parameter value. This property
does not depend on any particular value of the parameter. $\bar{X}$ is an
unbiased estimator of µ since E($\bar{X}$) = µ.
If the population is normally distributed, then $\tilde{X}$ and $\bar{X}_{tr}$ will also be unbiased
estimators. With more than one unbiased estimator we need an additional criterion
to select the best estimator from among the unbiased estimators. We compare
the variances of the sampling distributions of the different unbiased estimators and
choose the one with the minimum variance. It can be shown that $\bar{X}$ is the
minimum variance unbiased estimator (MVUE) of µ.
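The comparison of two unbiased estimators can be illustrated by simulation. The sketch below assumes a standard normal population, n = 25 and 3000 replications (all illustrative choices) and compares the sampling variance of the sample mean with that of the sample median.

```python
import random
import statistics

rng = random.Random(1)
mu, sigma, n, k = 0.0, 1.0, 25, 3000

means, medians = [], []
for _ in range(k):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Both estimators centre on mu (unbiasedness) ...
avg_mean = statistics.fmean(means)
avg_median = statistics.fmean(medians)

# ... but their sampling variances differ.
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

print(f"average of sample means   = {avg_mean:.4f}, variance = {var_mean:.4f}")
print(f"average of sample medians = {avg_median:.4f}, variance = {var_median:.4f}")
print(f"theoretical sigma^2 / n   = {sigma ** 2 / n:.4f}")
```

Under these assumptions the median shows the larger variance, which is why the mean is preferred when the population is normal.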
The precision of the point estimate is measured by the standard deviation of the
sampling distribution of the estimator, referred to also as the standard error of
the estimator.
The point estimate does not indicate how close it might be to the true parameter
value. An alternative to point estimation is interval estimation. A confidence
interval is always calculated by first selecting a confidence level, which is a
measure of the degree of reliability of the interval. Most frequently used
confidence levels are 95%, 99% and 90%.
A confidence level of 95% implies that 95% of all samples would give an interval
that includes the parameter value and only 5% of all samples would yield a wrong
interval. The higher the confidence level, the more strongly we believe that the value
of the parameter being estimated lies within the interval. Precision of the interval
estimate is indicated by the width of the interval. The smaller the width, the greater

the precision. A narrow width combined with a high confidence level means that
the estimate of the parameter value is reasonably precise.
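The interpretation of the confidence level as a long-run proportion of samples can be checked by simulation. The sketch below assumes a normal population with known σ, so that the interval is the sample mean plus or minus z·σ/√n; the values μ = 100, σ = 15, n = 30 and k = 2000 are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, level, k, seed=0):
    """Fraction of k simulated intervals that contain mu, and the interval width."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(k):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / k, 2 * half_width


cov95, width95 = coverage(100, 15, 30, 0.95, 2000)
cov99, width99 = coverage(100, 15, 30, 0.99, 2000)

print(f"95% level: coverage = {cov95:.3f}, width = {width95:.2f}")
print(f"99% level: coverage = {cov99:.3f}, width = {width99:.2f}")
```

The wider 99% interval illustrates the trade-off between confidence level and precision for a fixed sample size.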
Therefore, for point estimation and interval estimation, knowledge of the

sampling distribution of the statistic is required. We need to know the mean and
variance of the sampling distribution.
1.2 Hypothesis Testing

The other application of sampling distributions in statistical inference is in
problems of hypothesis testing. A statistical hypothesis is a claim or assertion
about the value of a single parameter (eg µ, σ etc), or about values of several
parameters (eg µ1 - µ2, σ1²/σ2² etc), or about the form of an entire probability
distribution (eg Normal, Uniform etc).
One example of a hypothesis is the claim that µ = µ0, where µ0 is a specified value

of µ, called the null value. The alternative hypothesis may be µ ≠ µ0, or µ > µ0, or
µ < µ0. The null hypothesis that µ = µ0 is rejected in favour of the alternative
hypothesis if sample evidence contradicts the null hypothesis and provides strong
support for the alternative hypothesis.
A test of hypothesis is a method for using sample data to decide whether the null
hypothesis should be rejected or not. We select a test statistic on which the
decision to reject or not reject H0 is to be based. Then we set up the rejection
region or critical region, which is the set of all test statistic values for which the
null hypothesis will be rejected.
Knowledge of the sampling distribution of the statistic is required to set up the

rejection region. It is also required for calculation of the probability of two kinds
of errors that may occur in tests of hypotheses. Type I error occurs when a null
hypothesis is true but is rejected. Type II error occurs when a null hypothesis is
false but we fail to reject it. Decision about the appropriate rejection region will
depend on which type of error is the more serious error and should be minimized.
Calculation of the probability of either kind of error requires knowledge of the
sampling distribution.
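The role of the sampling distribution in computing the two error probabilities can be sketched as follows. All figures are hypothetical: H0: µ = 50 is tested against µ > 50 with σ = 10 known and n = 25, the rejection region is "sample mean greater than the cutoff", and the Type II error is evaluated at the alternative value µ = 55.

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(mu0, mu_alt, sigma, n, cutoff):
    """Type I and Type II error probabilities when H0: mu = mu0 is rejected
    whenever the sample mean exceeds `cutoff`."""
    se = sigma / sqrt(n)
    alpha = 1 - NormalDist(mu0, se).cdf(cutoff)  # P(reject H0 | H0 true)
    beta = NormalDist(mu_alt, se).cdf(cutoff)    # P(not reject H0 | mu = mu_alt)
    return alpha, beta


alpha, beta = error_probabilities(mu0=50, mu_alt=55, sigma=10, n=25, cutoff=53.29)
alpha_strict, beta_strict = error_probabilities(50, 55, 10, 25, cutoff=54.65)

print(f"cutoff 53.29: alpha = {alpha:.3f}, beta = {beta:.3f}")
print(f"cutoff 54.65: alpha = {alpha_strict:.3f}, beta = {beta_strict:.3f}")
```

Moving the cutoff further from the null value lowers the probability of a Type I error but raises the probability of a Type II error, which is why the choice of rejection region depends on which error is the more serious.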

2 OTHER SAMPLING DISTRIBUTIONS

In the previous chapter we have studied the sampling distribution of means and
of the sum and difference of means. The sampling distribution was seen to be
normal if the parent population from which samples are selected is normally
distributed. When the population is not normal or not known, then with the help
of central limit theorem, we could proceed with the assumption that the sampling
distribution is approximately normal if sample size is sufficiently large.
The normal approximation does not apply if the population is non-normal and the
sample size is small. If the population is normal but σ is unknown and is estimated
from a small sample, the standardized sample mean follows the t-distribution.
Some of the other statistics of interest may not have normal distributions. The
sample proportion will have an approximately normal distribution for large sample
sizes but a binomial distribution for small samples. Variance and standard deviation will be
described by the chi-square distribution. The F-distribution is useful in making
inferences about variances of two populations. You will learn about the different
distributions in this chapter. You will also learn how to use the tables given for
each of the distributions.
3 REVIEW OF THE NORMAL DISTRIBUTION

The most commonly used distribution in statistical analysis is the normal
distribution. A continuous rv X has a normal distribution if its pdf is
$f(x) = f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$, where $-\infty < \mu < \infty$, $0 < \sigma$, $-\infty < x < \infty$
The parameters of the normal distribution are its mean μ and variance σ². The
normal distribution is thus a family of distributions, each for a different
combination of the parameter values μ and σ. The normal curve is symmetric
about the mean and bell-shaped. Since the distribution depends on values of μ
and σ, we would require an infinite number of separate tables of probability
densities for each possible combination of the parameter values. If the rv X has a
normal distribution we often denote this as X~N(μ, σ2).
3.1 The Standard Normal Distribution

Standardising the normal rv X we obtain the standard normal variable $Z = \frac{X - \mu}{\sigma}$.
The pdf of the standard normal distribution is
$f(z) = f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, where μ = 0, σ = 1, $-\infty < z < \infty$

We denote the distribution as Z~N(0, 1). This is a very important distribution as

any non-standard normal distribution can be transformed to the standard normal
distribution. From the standard normal distribution, given μ and σ, we can obtain
the percentiles of the non-standard distribution. Since we can use f(z) to make
inferences about any non-standard normal distribution, it is very convenient as
we need the tabulated values for a single distribution instead of those for an
infinite number of non-standard distributions.
Table A.3 in Devore gives the standard normal curve areas for z values up to the
second decimal place. The table gives cumulative probabilities for z values in the
range of -3.4 to 3.4, denoted by Φ(z). Thus, $\Phi(z) = \int_{-\infty}^{z} f(t)\,dt$, shown as the shaded
area in figure 1. If z < 0, Φ(z) < 0.5000; if z > 0, Φ(z) > 0.5000.
Fig 1 The standard normal distribution: Z ~ N(0,1)
If z1 < z2, P(Z < z1) = Φ(z1); P(z2 < Z) = 1 - Φ(z2); P(z1 < Z < z2) = Φ(z2) - Φ(z1).
[Note that P(z1 < Z < z2) = P(z1 ≤ Z ≤ z2) since Z is a continuous variable.] So, if
x1 < x2, $P(x_1 < X < x_2) = P\left(\frac{x_1 - \mu}{\sigma} < Z < \frac{x_2 - \mu}{\sigma}\right)$ = P(z1 < Z < z2) = Φ(z2) - Φ(z1).
If x1 < μ, then z1 < 0. If both x1 and x2 are less than μ, then z1 < 0 and z2 < 0, but
z1 < z2.
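In place of a printed table, the cumulative probabilities can be computed with `statistics.NormalDist`. The sketch below uses the figures of Exercise 1 that follows (μ = 410, σ = 87) and shows that standardising first, or working with X directly, gives the same probability; the interval endpoints 300 and 400 are chosen only for illustration.

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal, N(0, 1)
X = NormalDist(410, 87)     # figures of Exercise 1

# P(X < 300): directly, via the z-value, and via z rounded as in a table lookup
z = (300 - 410) / 87
p_direct = X.cdf(300)
p_via_z = Z.cdf(z)
p_table = Z.cdf(round(z, 2))

# P(300 < X < 400) as a difference of two cumulative probabilities
z1, z2 = (300 - 410) / 87, (400 - 410) / 87
p_between = Z.cdf(z2) - Z.cdf(z1)

print(f"z = {z:.4f}")
print(f"P(X < 300) = {p_direct:.4f} (exact z), {p_table:.4f} (z rounded to 2 dp)")
print(f"P(300 < X < 400) = {p_between:.4f}")
```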
Exercise 1 The average monthly production cost for a printing facility is $410,
with a standard deviation of $87. The manager promises the owner
that he will keep costs below $300 the next month. If costs are
normally distributed, what is the probability the manager will keep
his promise?
Solution Let X = monthly production costs ($), where X ~ N(410, 87²).
$P(X < 300) = P\!\left(Z < \dfrac{300 - 410}{87}\right) = \Phi(-1.26) = 0.1038$

We can calculate the x-value from a known probability by using the z-distribution.
In other words, we can obtain the percentiles of the non-standard normal
distribution, with known parameter values (μ, σ²), from the percentiles of the
z-distribution. Let P(X < x) = α, shown as the shaded area in figure 2(a).
Fig 2(a) Percentiles for areas under the normal curve
Fig 2(b) Percentiles for areas under the standard normal curve
From the body of the table of Φ(z) we can locate α (see Devore, Table A.3). The
row and column of the tabulated z-distribution at α give us the corresponding
z value for which P(Z < z) = α, the shaded area in figure 2(b).
Substituting this z-value in the formula $Z = \dfrac{X - \mu}{\sigma}$, we obtain the 100αth
percentile of X as x = μ + zσ. If α < 0.5, then z < 0 so that x < μ. Similarly, if
α > 0.5, then z > 0 and x > μ.
When z > 0, x = μ + zσ lies z standard deviations above the mean. Similarly,
when z < 0, x = μ + zσ = μ − |z|σ lies |z| standard deviations below the mean.
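The percentile rule x = μ + zσ can be sketched in Python as follows. This is only an illustration, assuming statistics.NormalDist from the standard library; the helper name normal_percentile is hypothetical. The parameter values μ = 83.2 and σ = 53.7 are taken from the storage-unit exercises in this section. Because the software uses an unrounded z, its results can differ slightly from answers based on a two-decimal table value.

```python
from statistics import NormalDist

def normal_percentile(alpha, mu, sigma):
    """Return x such that P(X < x) = alpha when X ~ N(mu, sigma^2)."""
    z = NormalDist().inv_cdf(alpha)   # 100*alpha-th percentile of Z
    return mu + z * sigma             # x = mu + z*sigma

mu, sigma = 83.2, 53.7                # storage-unit areas (sq ft)
x_90 = normal_percentile(0.90, mu, sigma)   # larger than 90% of all units
x_25 = normal_percentile(0.25, mu, sigma)   # smaller than 75% of all units

print(f"90th percentile: {x_90:.2f} sq ft")
print(f"25th percentile: {x_25:.2f} sq ft")
```

For α below 0.5 the returned z is negative, so the percentile falls below the mean, as described above.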
Exercise 2 Storage units at a countrywide warehouse chain have an average
area of 83.2 sq ft with a standard deviation of 53.7 sq ft. If storage
unit areas are normally distributed, what must be the area of a
storage unit for it to be larger than 90% of all units?
Solution Let X = area of a storage unit (sq ft), where X ~ N(83.2, 53.7²).
P(X < x) = P(Z < z) = Φ(z) = 0.90 ⇒ z = 1.28
∴ x = 83.2 + (1.28)(53.7) = 151.936 sq ft
Exercise 3 In Exercise 2 above, what must be the area of a storage unit for it
to be smaller than 75 percent of all units?
Solution P(X > x) = P(Z > z) = 0.75 ⇒ Φ(z) = 0.25 ⇒ z = −0.675
∴ x = 83.2 − (0.675)(53.7) = 46.95 sq ft
3.2 Normal Approximation of the Binomial Distribution

The binomial distribution is the appropriate distribution for sample ratios when
sample size is small. The binomial rv is a discrete variable. However, for large
samples, the binomial distribution can be approximated by the normal
distribution, provided the binomial distribution is not too skewed. When
approximating the binomial distribution with a normal distribution, the continuity
correction factor must be applied.
Suppose the random variable X has a binomial distribution with parameters
n = size of the sample and p = probability of success. The mean of the
distribution is μ = np and the variance is σ² = np(1 − p) = npq, where q = 1 − p.
The general rule of thumb is that if np > 10 and nq > 10, then the binomial
distribution can be approximated by the normal distribution with mean μ = np and
variance σ² = npq.
We can then obtain,

$P(X \le x) = B(x; n, p) \approx$ [area under the normal curve to the left of x + 0.5]
$= \Phi\!\left(\dfrac{x + 0.5 - np}{\sqrt{npq}}\right)$,
where 0.5 in the numerator is the continuity correction factor.
Suppose x1 < x2, and write $x_1^- = x_1 - 1$ and $x_2^- = x_2 - 1$. Then,
$P(x_1 \le X \le x_2) = B(x_2; n, p) - B(x_1^-; n, p) \approx \Phi\!\left(\dfrac{x_2 + 0.5 - np}{\sqrt{npq}}\right) - \Phi\!\left(\dfrac{x_1^- + 0.5 - np}{\sqrt{npq}}\right)$
$P(x_1 < X \le x_2) = B(x_2; n, p) - B(x_1; n, p) \approx \Phi\!\left(\dfrac{x_2 + 0.5 - np}{\sqrt{npq}}\right) - \Phi\!\left(\dfrac{x_1 + 0.5 - np}{\sqrt{npq}}\right)$
$P(x_1 < X < x_2) = B(x_2^-; n, p) - B(x_1; n, p) \approx \Phi\!\left(\dfrac{x_2^- + 0.5 - np}{\sqrt{npq}}\right) - \Phi\!\left(\dfrac{x_1 + 0.5 - np}{\sqrt{npq}}\right)$

The calculation of the probability for the range of values x1 to x2 thus depends on
whether the end points are included in the interval or excluded from it.
Exercise 4 Records show that 45 percent of all automobiles produced by Ford
Motor Company contain parts imported from Japan. What is the
probability that out of the next 200 cars, exactly 115 contain Japanese
parts?
Solution Let X = number of cars with Japanese parts. X is a discrete
variable with a binomial distribution, with n = 200, p = 0.45 ⇒ q = 0.55.
Since np = 90 > 10 and nq = 110 > 10, we can use the
normal approximation, with μ = np = 90 and σ² = npq = 49.5.
P(X = 115) = P(114 < X ≤ 115) = B(115; n, p) − B(114; n, p)
$\approx \Phi\!\left(\dfrac{115.5 - 90}{\sqrt{49.5}}\right) - \Phi\!\left(\dfrac{114.5 - 90}{\sqrt{49.5}}\right)$
= Φ(3.62) − Φ(3.48) ≈ 1 − 0.9997 = 0.0003 (taking Φ(3.62) ≈ 1, since 3.62 lies beyond the tabulated range)
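The continuity correction in Exercise 4 can be compared with the exact binomial probability. The sketch below is only an illustration, assuming math.comb and statistics.NormalDist from the Python standard library; the function names binom_pmf and normal_approx_pmf are hypothetical. Since software does not round Φ(3.62) up to 1, its value may come out somewhat smaller than the table-based answer.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_approx_pmf(k, n, p):
    """Normal approximation to P(X = k) using the 0.5 continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = NormalDist()
    upper = (k + 0.5 - mu) / sigma    # z-value for k + 0.5
    lower = (k - 0.5 - mu) / sigma    # z-value for k - 0.5
    return z.cdf(upper) - z.cdf(lower)

n, p, k = 200, 0.45, 115              # values from Exercise 4
exact = binom_pmf(k, n, p)
approx = normal_approx_pmf(k, n, p)
print(f"exact = {exact:.6f}, normal approximation = {approx:.6f}")
```

Both values are very small because 115 lies more than three standard deviations above the mean of 90.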
4 THE STUDENT’S T DISTRIBUTION

If random samples are drawn from a normally distributed population and σ² is
known, or if samples are drawn from a non-normal population with known σ²
provided n > 30, or if samples of size n > 40 are from a non-normal population
with unknown σ², then the distribution of sample means will be characterized by a
normal distribution. The standardized variable
$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$
will have a standard normal distribution with mean 0 and variance 1.
If the population variance is unknown and has to be estimated by the sample
variance, then the variability of the standardized variable increases. The
estimated standard error of the sampling distribution of means depends on the
calculated value of the sample variance, which in turn depends on which of the
possible samples happened to be selected, a matter of pure chance. As this value
varies from sample to sample, the standardized distribution of sample means
will be more spread out than the standard normal distribution.
When σ is known, the variability in the distribution of Z depends only on the
variability of the numerator: X̄ takes on different values depending on which
particular sample is selected. With an unknown σ, the variability of the
standardized variable arises from both the numerator and the denominator. Thus
$\mathrm{Var}\!\left(T = \dfrac{\bar{X} - \mu_{\bar{X}}}{S/\sqrt{n}}\right) > \mathrm{Var}\!\left(Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma/\sqrt{n}}\right)$
and we have a family of probability distributions called t distributions.
If the sample size is large (n > 40), the variability is reduced, since the standard
error of X̄ is inversely related to the sample size. Then the distribution of the
standardized variable can, with the help of the Central Limit Theorem (CLT), be
approximated by the standard normal distribution, Z.
For small sample sizes, however, when X̄ is the mean of a random sample of size
n from a normally distributed population with mean μ, the random variable
$T = \dfrac{\bar{X} - \mu_{\bar{X}}}{S/\sqrt{n}}$, where $\mu_{\bar{X}} = \mu$,
has a probability distribution called a t distribution with (n − 1)
degrees of freedom.
The parameter of this distribution is the number of degrees of freedom (df),
which we denote by ν, where ν = n − 1 for the sampling distribution of means.
There is a different distribution for each df, where ν can take any positive integer
value 1, 2, 3, … The t-distribution is thus a family of distributions, each with its
own variance, which depends on the degrees of freedom.
The degrees of freedom is defined as the number of observations in the sample
minus the number of restrictions on the observations. When the sample is from a
normal population with unknown mean and variance, we lose 1 df in calculating
T. The sample mean is the restriction in this case: X̄ is used in place of the
unknown μ when computing the sample variance S² that appears in T. Thus only
n − 1 observations can be freely chosen. Once n − 1 sample units are observed, the
nth observation is automatically determined for a given X̄.
We will let $t_\nu$ denote the t distribution with ν degrees of freedom. Figure 3
depicts a t distribution and a z distribution for comparison.

Fig 3 The standard normal distribution (z) and the t distribution with ν df
Properties of the t-Distribution

1. Each t curve is bell-shaped and centered at 0.
2. Each t curve is more spread out than the z-curve.
3. As df (ν) increases, the distribution becomes narrower.
4. As ν → ∞, the sequence of t curves approaches the z-curve.
Figure 4 illustrates a sequence of t curves where ν1 < ν2 < ν3. It shows how
the distribution becomes narrower with an increase in df (and sample size).
Fig 4 Sequence of t curves where ν1 < ν2 < ν3
Like the Z-distribution, the t-distribution has a mean of zero, is symmetric about
the mean and ranges from −∞ to +∞. Whereas the variance of the Z-distribution is
σ² = 1, the variance of the t-distribution is
$\sigma^2 = \dfrac{n - 1}{n - 3} = \dfrac{\nu}{\nu - 2} > 1$ (defined for ν > 2).
Therefore it is flatter and more spread out than the Z-distribution. As n increases,
the variance of the t-distribution decreases. As n → ∞, ν → ∞ and σ² → 1, and the
t-distribution approaches the Z-distribution. Hence the z-distribution is the limit
of the t-distribution.
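The extra spread of the t-distribution can be seen in a small simulation. The following Python sketch is an illustration under stated assumptions: it repeatedly draws samples of size n = 10 from a standard normal population, computes T for each sample, and compares the variance of the simulated T values with (n − 1)/(n − 3). The function names, the seed and the number of repetitions are arbitrary choices, not part of the lesson.

```python
import random
from math import sqrt

def simulate_t_statistics(n, reps, mu=0.0, sigma=1.0, seed=12345):
    """Draw `reps` samples of size n from N(mu, sigma^2); return their T values."""
    rng = random.Random(seed)      # fixed seed (arbitrary) for repeatability
    t_values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)   # sample variance
        t_values.append((xbar - mu) / sqrt(s2 / n))           # T statistic
    return t_values

def variance(values):
    """Plain variance of a list of numbers (divisor = number of values)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

n = 10
t_values = simulate_t_statistics(n, reps=20000)
simulated_var = variance(t_values)
theoretical_var = (n - 1) / (n - 3)       # nu / (nu - 2) with nu = n - 1
print(f"simulated {simulated_var:.3f} vs theoretical {theoretical_var:.3f}")
```

Repeating the run with a larger n should bring both numbers closer to 1, the variance of Z.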

In statistical inference, the t-distribution is used when the following
conditions are satisfied:
1. The sample is small (n < 40).
2. The population σ is unknown.
3. The population is normally distributed or approximately normal.
If the sample is small, population σ is unknown and population is not normal then
the t distribution cannot be used for making inferences about population mean.
The only way out is to increase the sample size by taking sufficiently large
samples to generate an approximately normal sampling distribution. This would
enable the use of the z statistic for making inferences.
Since there is a separate t curve for each df, finding the probability under a
specific curve in general requires a computer programme. Table A.5 in Devore gives
the t values corresponding to selected upper-tail areas under the t-curve, for df
from 1 to 30 and then ν = 32, 34, 36, 40, 50, 60, 120, ∞.
Each row of the table gives the values of the t statistic for the given df
and the specified upper-tail probabilities. Thus each row is for a different member
of the family of t distributions. Since the distribution is symmetric and centered
at zero, the t values for the same areas in the lower tail are the negatives of the
tabulated values.
The symbol $t_{\alpha,\nu}$ will denote the t-value for which the area under the t-curve with ν
degrees of freedom to the right of $t_{\alpha,\nu}$ is α, as illustrated in figure 5.

Fig 5 $P(t > t_{\alpha,\nu}) = \alpha$ and $P(t < t_{\alpha,\nu}) = 1 - \alpha$
Exercise 5 A random sample of 16 factory workers is selected from a
population with hourly wages known to be normally distributed.
If the population mean is Rs.174.93 per hour and the sample
standard deviation is Rs.96, what is the probability that the sample
mean is not less than Rs.217 per hour?
Solution We know that μ = 174.93 and s = 96. Since the population is normal,
the unknown population standard deviation is estimated by the sample
standard deviation, and the sample size is small (n = 16), the
standardized sample mean has a t distribution with 15 df.
$t_{15} = \dfrac{217 - 174.93}{96/\sqrt{16}} = 1.753$
P(X̄ > 217) = P(t15 > 1.753) = 0.05.
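When a t table is not at hand, the tail area in Exercise 5 can be approximated by integrating the t density numerically. The Python sketch below is one possible way to do this, using only the standard library; the function names t_pdf and t_upper_tail and the choice of Simpson's rule with 2000 steps are assumptions made for illustration, not a prescribed method.

```python
from math import gamma, pi, sqrt

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(t, nu, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t] by Simpson's rule."""
    if t < 0 or steps % 2 != 0:
        raise ValueError("need t >= 0 and an even number of steps")
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2          # Simpson weights 4, 2, 4, ...
        total += weight * t_pdf(i * h, nu)
    return 0.5 - total * h / 3              # symmetry: P(T > 0) = 0.5

# Values from Exercise 5
mu, s, n, xbar = 174.93, 96.0, 16, 217.0
t_stat = (xbar - mu) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)
print(f"t = {t_stat:.3f}, P(T > t) = {p_value:.4f}")
```

For a negative t value, symmetry gives P(T < −t) = P(T > t), so the same function covers lower-tail areas.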
Exercise 6 A coffee dispenser is calibrated to serve 200 ml per cup on
average. However, due to defects in the machine, the actual amount
dispensed is variable and has a normal distribution. A random
sample of 9 cups is examined and the sample standard
deviation is found to be 12 ml. What is the mean fill x̄ of the 9 cups if the
probability of obtaining a sample of the same size with mean fill less
than x̄ is only 0.10?
Solution Let X = amount of coffee filled in a cup (ml). X is normally
distributed with μ = 200 ml. We are given n = 9 and s = 12 ml. Since the
population is normal, n is small and σ is unknown, the standardized
sample mean has a t distribution with 8 df.
Now P(X̄ < x̄) = 0.10 and P(t8 < −1.397) = 0.10.
[Since P(t8 > 1.397) = 0.10 and the t-curve is symmetric about
mean = 0, P(t8 > 1.397) = P(t8 < −1.397) = 0.10.]
Hence,
$\dfrac{\bar{x} - 200}{12/\sqrt{9}} = -1.397 \;\Rightarrow\; \bar{x} = 200 - (1.397)(4) = 194.412$ ml
Table A.8 in Devore gives t-curve tail areas for combinations of t-values and df.
Since t is a continuous random variable, there is an infinite number of
possible t values. The table is restricted to positive values of t from 0.0 to 4.0,
given to the first decimal place. The property of symmetry
allows us to obtain the probabilities for t = 0.0 to t = −4.0. This is explained in
the following figure.
Fig 6 Symmetry of the t distribution: $P(t > t_{\alpha,\nu}) = P(t < -t_{\alpha,\nu}) = \alpha$
If area α is in the lower tail of the distribution, we obtain $-t_{\alpha,\nu}$ = the (100α)th
percentile. Similarly, $t_{\alpha,\nu}$ = the 100(1 − α)th percentile of the distribution with df = ν.
Exercise 7 The mean of a normal distribution is 42. What is the probability of
obtaining a random sample of size 25 from this population with a
mean value x̄ > 47 and standard deviation s = 7?
Solution $t = \dfrac{47 - 42}{7/\sqrt{25}} = 3.571 \approx 3.6$
P(x̄ > 47) = P(t > 3.6), where ν = 24
From Table A.8, P(t24 > 3.6) = 0.001
Exercise 8 The mean of a normal distribution is μ = 28.5. What is the
probability of obtaining a random sample of size 12 with a mean
value x̄ < 27.8 and variance s² = 3.24?
Solution $t = \dfrac{27.8 - 28.5}{\sqrt{3.24}/\sqrt{12}} = \dfrac{-0.7}{0.5196} = -1.347 \approx -1.3$
P(x̄ < 27.8) = P(t11 < −1.3)
From Table A.8, P(t11 < −1.3) = 0.110
5 CHI-SQUARED (χ²) DISTRIBUTION
The chi-squared distribution is also a continuous distribution, like the t
distribution and the normal distribution. Unlike the normal distribution and the t
distribution, the chi-squared (χ²) distribution is not a symmetric distribution.
Also, unlike the other two distributions, the χ² random variable can only take
non-negative values. Like the non-standard normal distribution and the t
distribution, the χ² distribution is an entire family of distributions.
The parameter that distinguishes one member of the family from another
is the number of degrees of freedom (df) of that particular distribution. Just as
different combinations of values of μ and σ distinguish one normal distribution
from another, the degrees of freedom ν distinguish one t distribution from
another, and one χ² distribution from another.
The possible values of ν are 1, 2, 3, … and depend on the number of
restrictions on the observations. The χ² distribution is positively skewed, i.e.,
skewed to the right. The smaller the df, the greater the extent of skewness. As the
value of ν increases, the distribution becomes progressively less skewed.
Figure 7 illustrates a sequence of χ² curves where ν1 < ν2 < ν3.
Fig 7 Sequence of χ² curves where ν1 < ν2 < ν3
The mean of the χ² distribution is ν and its variance is 2ν. As the df increases,
the distribution becomes more spread out and less skewed.
For large ν (ν > 40), the χ² distribution approaches a normal distribution. So,
as ν → ∞, χ² → N(ν, 2ν).

If X1, X2, X3, …, Xν are independent random variables, each having a standard
normal distribution, then the sum of the squares of these variables,
$\chi^2 = \sum_{i=1}^{\nu} X_i^2$,
has a chi-squared distribution with ν degrees of freedom.
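The sum-of-squares definition lends itself to a quick simulation check of the stated mean ν and variance 2ν. The Python sketch below is an illustration only: the function name simulate_chi_squared, the seed, the number of repetitions and the choice ν = 5 are assumptions made for the example.

```python
import random

def simulate_chi_squared(nu, reps, seed=2024):
    """Return `reps` draws of the sum of nu squared standard normal variables."""
    rng = random.Random(seed)     # fixed seed (arbitrary) for repeatability
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) for _ in range(reps)]

nu = 5
draws = simulate_chi_squared(nu, reps=20000)

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print(f"mean {sample_mean:.2f} (theory {nu}), variance {sample_var:.2f} (theory {2 * nu})")
```

Every simulated value is non-negative, in line with the remark that a χ² variable cannot be negative.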
The chi-squared distribution arises in many important applications. It is
particularly useful when testing the form of the parent distribution with sample
information. It is also the basis for inferences about the population variance and
standard deviation, provided the population is normally distributed.
As we have seen, there is a separate χ² curve for each value of ν = 1, 2, 3, …
Using suitable computer software, it is possible to obtain the probability, or area
under the χ² density curve, for different intervals on the measurement axis for
specified df ν. Table A.7 in Devore gives values of the χ² statistic for
ν = 1, 2, 3, …, 40, for select areas in the upper tail of the distribution. Each row
of the table gives the values of the χ² statistic for the given df and specified
upper-tail probabilities, so that each row is for a different member of the family
of χ² distributions.
The notation $\chi^2_{\alpha,\nu}$ indicates the value of χ² for which df = ν and the area in the
upper tail of the distribution is α. As the value of α increases, the $\chi^2_{\alpha,\nu}$ value
decreases for specified df. This is illustrated in the three panels of figure 8.
Fig 8(a)-(c) $P(\chi^2 > \chi^2_{\alpha,\nu}) = \alpha$ and $P(\chi^2 < \chi^2_{\alpha,\nu}) = 1 - \alpha$
Exercise 9 For what value of $\chi^2_{\alpha,\nu}$ will
(a) α = 0.99 when ν = 15?
(b) α = 0.01 when ν = 15?
Solution From Table A.7,
(a) $\chi^2_{0.99,15}$ = 5.229,
(b) $\chi^2_{0.01,15}$ = 30.577
Exercise 10 What is the probability that $\chi^2_{15}$ < 5.229 or $\chi^2_{15}$ > 30.577?
Solution P($\chi^2_{15}$ < 5.229) = 1 − α = 1 − 0.99 = 0.01
P($\chi^2_{15}$ > 30.577) = α = 0.01
∴ P($\chi^2_{15}$ < 5.229 or $\chi^2_{15}$ > 30.577) = 0.01 + 0.01 = 0.02
Exercise 11 What is the probability P(9.591 < $\chi^2_{20}$ < 34.170)?
Solution Since χ² is a continuous random variable,
P(9.591 < $\chi^2_{20}$ < 34.170) = P(9.591 ≤ $\chi^2_{20}$ < 34.170)
= P(9.591 < $\chi^2_{20}$ ≤ 34.170) = P(9.591 ≤ $\chi^2_{20}$ ≤ 34.170)
From the table, 9.591 = $\chi^2_{0.975,20}$ and 34.170 = $\chi^2_{0.025,20}$
Therefore, P(9.591 < $\chi^2_{20}$ < 34.170) = 0.975 − 0.025 = 0.95
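The table look-up in Exercise 11 can be cross-checked with a short calculation. When ν is even, the upper-tail area of the χ² distribution has a closed form: P(χ² > x) = e^(−x/2) Σ (x/2)^j / j!, summed over j = 0, …, ν/2 − 1. The Python sketch below uses this identity; it is an illustration that covers even df only, and the function name chi2_upper_tail_even_df is hypothetical.

```python
from math import exp

def chi2_upper_tail_even_df(x, nu):
    """P(chi-squared with nu df > x), valid only for even, positive nu."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this closed form only covers even, positive df")
    half = x / 2.0
    term = 1.0                     # j = 0 term of (x/2)^j / j!
    total = 1.0
    for j in range(1, nu // 2):
        term *= half / j           # builds (x/2)^j / j! step by step
        total += term
    return exp(-half) * total

# Values from Exercise 11 (nu = 20)
upper_at_low = chi2_upper_tail_even_df(9.591, 20)     # area to the right of 9.591
upper_at_high = chi2_upper_tail_even_df(34.170, 20)   # area to the right of 34.170
prob_between = upper_at_low - upper_at_high
print(f"{upper_at_low:.3f} - {upper_at_high:.3f} = {prob_between:.3f}")
```

For odd df, such as ν = 15 in Exercises 9 and 10, this shortcut does not apply and the table or statistical software is needed.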
If a random sample X1, X2, …, Xn is from a normal population, then the sampling
distribution of the random variable
$\dfrac{(n-1)S^2}{\sigma^2} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$
has a χ² distribution with ν = n − 1 df. This is why the χ² distribution is used
for inferences about the population variance and population standard deviation,
provided the population is normally distributed.
6 THE F DISTRIBUTION
The F distribution is useful in inferences about the variances of two normal
populations. The F distribution is obtained as a ratio. If X1 and X2 are two
independent random variables having chi-squared distributions with ν1 and
ν2 degrees of freedom respectively, then the ratio
$F = \dfrac{X_1/\nu_1}{X_2/\nu_2}$
is a random variable having an F distribution with ν1 and ν2 degrees of freedom.
Since the numerator is the ratio of a χ² variable to its df ν1, and both are
positive, the numerator cannot be negative. By a similar argument, the
denominator, being a ratio of two positive quantities, cannot be negative. Hence
the random variable F can never be negative. The degrees of freedom of the F
distribution are ν1 and ν2, where ν1 is called the numerator df and ν2 is called
the denominator df.

Since the two random variables X1 and X2 are independent, the two χ²
distributions are also independent.
Just as the χ² variable has a positively skewed distribution for ν < 40, the F
distribution is also skewed to the right. Figure 9 shows the F density curve with
ν1 and ν2 degrees of freedom.
Fig 9 The F distribution
The notation $F_{\alpha,\nu_1,\nu_2}$ denotes the value on the measurement axis such that the area
under the $F_{\nu_1,\nu_2}$ density curve (with ν1 and ν2 df) to its right, that is, in the
upper tail of the distribution, is α. This is indicated by the shaded area in
figure 9.
Table A.9 in Devore gives values of $F_{\alpha,\nu_1,\nu_2}$ for select values of
ν1 (numerator df) and ν2 (denominator df) and four α-values: 0.100, 0.050,
0.010 and 0.001. For example, $F_{0.1,4,3}$ = 5.34. This means the area to the right of
5.34 is 0.1 or 10% when the numerator df (ν1) is 4 and the denominator df (ν2) is 3.
Since the F-table gives $f_{\nu_1,\nu_2}$ values with α in the upper tail of the distribution
and (1 − α) in the lower tail, the greater the $f_{\nu_1,\nu_2}$ value for given df (ν1 and ν2),
the smaller will be α and the larger will be (1 − α). Similarly, a smaller (1 − α) and
a higher α correspond to a value of $f_{\nu_1,\nu_2}$ closer to the origin, for specified
ν1 and ν2.

There are many instances when we require the $F_{\alpha,\nu_1,\nu_2}$ value on the measurement
axis when α is large, say α = 0.90, so that the area in the lower tail of the
distribution is 0.10. This is not available in the given table. Since the F density
curve is not symmetric, it might seem that F values would have to be tabulated for
both the upper and lower tails of the distribution. However, this is not necessary,
as we can use the following property of the F distribution:
$F_{1-\alpha,\nu_1,\nu_2} = \dfrac{1}{F_{\alpha,\nu_2,\nu_1}}$.
Note that on the right-hand side of the equation, in $F_{\alpha,\nu_2,\nu_1}$, ν2 is the numerator df
and ν1 is the denominator df.
For example, if α = 0.1 then (1 − α) = 0.9. Given ν1 = 4 and ν2 = 3,
$F_{0.9,4,3} = \dfrac{1}{F_{0.1,3,4}} = \dfrac{1}{4.19} = 0.24$ = 10th percentile of the distribution. This is
illustrated in figure 10.
Fig 10 Lower tail area of the F distribution.
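The reciprocal relation can be illustrated by simulation. The Python sketch below builds F variables from the ratio definition, using sums of squared standard normal draws for the χ² variables, and estimates two areas: the area of F with (4, 3) df to the left of 1/4.19, and the area of F with (3, 4) df to the right of 4.19. Both should be close to 0.10. The function names, the seed and the number of repetitions are arbitrary assumptions for illustration.

```python
import random

def chi_squared_draw(rng, nu):
    """One chi-squared draw: sum of nu squared standard normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

def f_draw(rng, nu1, nu2):
    """One F draw: ratio of two independent chi-squared variables over their df."""
    return (chi_squared_draw(rng, nu1) / nu1) / (chi_squared_draw(rng, nu2) / nu2)

rng = random.Random(7)            # fixed seed (arbitrary) for repeatability
reps = 20000
f_table_value = 4.19              # F_{0.1,3,4} quoted in the example above

# Lower-tail area of F with (4, 3) df below 1 / F_{0.1,3,4}
p_lower = sum(f_draw(rng, 4, 3) < 1 / f_table_value for _ in range(reps)) / reps

# Upper-tail area of F with (3, 4) df above F_{0.1,3,4}, from independent draws
p_upper = sum(f_draw(rng, 3, 4) > f_table_value for _ in range(reps)) / reps

print(f"P(F(4,3) < 1/4.19) = {p_lower:.3f}, P(F(3,4) > 4.19) = {p_upper:.3f}")
```

The two estimates agree because the reciprocal of an F variable with (ν1, ν2) df is itself an F variable with (ν2, ν1) df.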
Exercise 12 What is P($F_{12,15}$ < 2.48)?
Solution Since P($F_{12,15}$ > 2.48) = 0.05 for ν1 = 12 and ν2 = 15,
P($F_{12,15}$ < 2.48) = 1 − 0.05 = 0.95
Exercise 13 What is P(2.02 < $F_{12,15}$ < 5.81)?
Solution P($F_{12,15}$ > 2.02) = 0.100, and P($F_{12,15}$ > 5.81) = 0.001. Therefore,
P(2.02 < $F_{12,15}$ < 5.81) = 0.100 − 0.001 = 0.099

Exercise 14 For what value $f_{5,4}$ will P($F_{5,4}$ > $f_{5,4}$) = 0.95?
Solution $f_{0.95,5,4} = \dfrac{1}{f_{0.05,4,5}} = \dfrac{1}{5.19} = 0.19$
Reference:
Jay L. Devore, Probability and Statistics for Engineering and the Sciences, 8th
edition, Cengage Learning.
Practice Questions
1. The rv X is normally distributed with μ = 69 and σ = 2.8. Find

(a) P(X > 65); (b) P(62 <X < 72); (c) P(|X -69| > 6);
(d) x if P(X > x) = 0.01 (e) x if P(X < x) = 0.95
2. Let a random sample be selected from a normal population having mean
128. If standard deviation of the sample is 16, what is the probability that
the sample mean will lie between 124 and 132 when the sample size is
(a) n = 9; (b) n = 100
3. Let X be a binomial rv with parameters n =100 and p= 0.2. Approximate

the following probabilities
(a) P(X < 25); (b) P(X > 30); (c) P(15 < X < 22).
4. Twenty percent of batteries produced by a new company are thought to be

defective. In a batch of 1000 batteries, what is the probability that
(a) between 215 and 230 batteries (both inclusive) will be defective?
(b) more than 235 batteries will be defective?
(c) less than 190 batteries will be defective?
(d) less than 185 or more than 220 batteries will be defective?
5. Let the rv X have a t distribution with ν df. Find P(X < t) and ν if
(a) t = 1.729; (b) t = 2.602; (c) t = 4.785.
6. Find the probability P($t_\nu$ > t) if
(a) t = 1.5 for ν = 10 and ν = 20
(b) t = 2.1 for ν = 10 and ν = 20
(c) Explain the reason for the difference in pairs of results in (a) & (b).
7. (i) What do you understand by the subscripts α, ν in the notation $\chi^2_{\alpha,\nu}$?
(ii) For what value of ν is the probability
(a) P($\chi^2_{\alpha,\nu}$ < 4.575) = 0.05? (b) P($\chi^2_{\alpha,\nu}$ < 10.085) = 0.10?
(c) P($\chi^2_{\alpha,\nu}$ > 30.813) = 0.10? (d) P($\chi^2_{\alpha,\nu}$ > 23.209) = 0.01?
8. Find the probability P(6.262 < $\chi^2_{15}$ < 27.488).
9. Find the probability
(a) P(2.19 < $F_{10,12}$ < 7.29); (b) P(3.37 < $F_{7,5}$ < 10.46)
10. For what value of f will P($F_{7,9}$ < f) = 0.99?

Point Estimation

Lesson: Point Estimation
Lesson Developer: Kamlesh Aggarwal and Nidhi Aggarwal
College/Department: Department Of Economics, Spm
College and Mata Sundari College, University Of Delhi

TABLE OF CONTENTS
1. General Concepts of Point Estimation
2. Desirable Properties of Point Estimators
2.1 Unbiased Estimators
2.2 Efficient Estimators
2.3 Consistent Estimators
3. Precision of the Estimate
4. Methods of Point Estimation
4.1 The Method of Moments
4.2 The Method of Maximum Likelihood

Reference: Jay L. Devore, Probability and Statistics for Engineering and the
Sciences, 8th Edition.

POINT ESTIMATION
Learning Objectives
The need to obtain estimates of relevant population parameters in business and
economics can be illustrated by numerous examples: a marketing organization may be
interested in estimates of average income in a metropolitan area; a production department
may desire an estimate of the percentage of defective articles produced by a new
production process; or a bank may want an estimate of average interest rates on mortgages
in a certain section of the country. In all of these cases, it is very costly or simply impossible
to study the complete universe to get the required information. Further, in such cases exact
accuracy is not required, and estimates derived from sample data would probably provide
appropriate information to meet the demands of the practical situation. After completing
this chapter you will be familiar with statistical estimation procedures
that provide the means of obtaining estimates of population parameters with a
desired degree of precision. You will be able to choose the most appropriate estimate of a
parameter (or of several parameters) for a given situation from a possible set of
alternatives, as we will discuss various desirable properties of estimators and develop the
concept of the sampling distribution of a statistic and its standard error.
Two different types of estimates of population parameters are of interest: 'point
estimates' and 'interval estimates'. If we say that the average height of female
students in XYZ College is 5.28 feet, we are giving a point estimate. If, on the other hand,
we say that the height is 5.28 ± 0.02 feet, that is, the height lies between 5.26 and 5.30
feet, we are giving an interval estimate. In this chapter we will concentrate on point
estimates.
1. General Concepts of Point Estimation

When we use the value of a statistic to estimate a population parameter, we call this
point estimation and we refer to the value of the statistic as a point estimate of the
parameter. Correspondingly, we refer to the statistics themselves as point estimators. For
example, the sample mean, X̄, may be used as a point estimator of the population mean, μ,
in which case x̄ is a point estimate of this parameter. Similarly, the sample variance, S²,
may be used as a point estimator of the population variance, σ², in which case s² is a point
estimate of this parameter. These estimates are called point estimates because in each case
a single number, or a single point on the real axis, is used to estimate the parameter.

Estimators themselves are random variables. Usually we
describe a sample of size n by the values x1, x2, …, xn of the random variables X1, X2, …,
Xn. If sampling is with replacement, X1, X2, …, Xn would be independent, identically
distributed random variables having probability distribution f(x). Their joint distribution
would then be
P(X1 = x1, X2 = x2, …, Xn = xn) = f(x1) f(x2) … f(xn)
Now we can use the sample values x1, x2, …, xn to compute some statistic (mean, variance,
etc.) and use this as an estimate of a population parameter. Algebraically, a statistic for a
sample of size n can be defined as a function of the random variables X1, X2, …, Xn, i.e.,
g(X1, X2, …, Xn). The function g(X1, X2, …, Xn), that is, any statistic, is another random
variable, whose values can be represented by g(x1, x2, …, xn). The same holds true if
we have more than one sample. Suppose we take two samples of heights of m male
students and n female students at a particular university. We represent the sample values by
x1, x2, …, xm and y1, y2, …, yn respectively. The difference between the two sample mean
heights is X̄ − Ȳ, and it is the sensible statistic for estimating μ1 − μ2, the difference between
the two population mean heights. Now the statistic X̄ − Ȳ is a linear combination of the
random variables X1, X2, …, Xm and Y1, Y2, …, Yn and so is itself a random variable.
Since estimators are random variables, one of the key problems of point estimation
is to study their sampling distributions in order to compare different estimators.
For instance, when we estimate the variance of a population on the basis of a random
sample, we can hardly expect that the value of s² we get will actually equal σ², but it would
be reassuring, at least, to know whether we can expect it to be close. Similarly, suppose we
draw a random sample of size n from a normal population with mean value μ. The sample
arithmetic mean is a natural statistic for estimating μ. However, the median of the population,
the average of the two extreme observations and the k% trimmed mean are also
equal to μ, since normal distributions are symmetric. So we can consider any of the
following estimators for μ:
(a) Estimator = X̄ = arithmetic mean
(b) Estimator = X̃ = median
(c) Estimator = X̄e = [min(Xi) + max(Xi)]/2 = the average of the two extreme observations
in the sample
(d) Estimator = X̄tr(10) = the 10% trimmed mean (discard the smallest and largest 10%
of the sample and then average)

Each of the estimators (a)-(d) is a reasonable point estimator of μ. Since each uses a
different measure of the center of the sample to estimate μ, each estimator will give a
different estimate for μ.
Example 1: Consider the accompanying 20 observations on the weights of six-year-old
children.
24.46 25.61 26.25 26.42 26.66 27.15 27.31 27.54 27.74 27.94
27.98 28.04 28.28 28.49 28.50 28.87 29.11 29.13 29.50 30.88
We assume that the distribution of weights is normal with mean μ. So we can consider
X̄, X̃, X̄e and X̄tr(10) as the point estimators for μ. The estimates are 27.793, 27.960, 27.670
and 27.838 respectively. So each estimator gives a different estimate for μ.
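The four estimates in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's statistics module; the helper names midrange and trimmed_mean are hypothetical, and the 10% trimming level is the one that reproduces the trimmed-mean estimate quoted in the example.

```python
from statistics import mean, median

# The 20 weights listed in Example 1
weights = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
           27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]

def midrange(xs):
    """Average of the two extreme observations in the sample."""
    return (min(xs) + max(xs)) / 2

def trimmed_mean(xs, percent):
    """Mean after discarding the smallest and largest `percent`% of the sample."""
    ordered = sorted(xs)
    k = int(len(ordered) * percent / 100)    # observations dropped at each end
    return mean(ordered[k:len(ordered) - k])

estimates = {
    "mean": mean(weights),
    "median": median(weights),
    "midrange": midrange(weights),
    "10% trimmed mean": trimmed_mean(weights, 10),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

All four functions are applied to the same sample, which makes it plain that the differences between the estimates come from the estimators and not from the data.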
Which of these estimates is closest to the true value? We cannot answer this without
knowing the true value of μ (in which case estimation is unnecessary). Questions that can be
answered are: "Which estimator, when used on other samples of Xi's, will tend to produce
estimates closest to the true value? Which will expose us to the smallest risk? Which will give
us the most information at the lowest cost?" and so forth. To decide which estimator is most
appropriate in a given situation, various statistical properties of estimators can be used.
2. Desirable Properties of Point Estimators

The particular properties of estimators that we shall discuss are unbiasedness,
efficiency and consistency.
2.1 Unbiased Estimators

In the real world there are no perfect estimators that always give the right answer.
Thus, it would seem reasonable that an estimator should do so at least on average,
i.e. its expected value should equal the parameter that it is supposed to estimate. If this is
the case, the estimator is said to be unbiased; otherwise, it is said to be biased. In other
words, if we repeatedly draw random samples from the same population and calculate the same
statistic for each sample, the value of the statistic will differ from sample to sample
due to sampling fluctuations, but the expected or mean value of this statistic should be equal
to the true parameter value.

Definition: Suppose we denote a point estimator by θ̂. Then θ̂ is an unbiased estimator of
the true parameter value θ if the expected value of θ̂ is equal to θ, i.e. E(θ̂) = θ,
for every possible value of θ. If this does not hold, then θ̂ is a biased estimator of θ. The
difference between the expected value of θ̂ and θ, E(θ̂) − θ, is called the bias of θ̂. It should
be noted that expected value means only the arithmetic mean, and not any other measure of
central value, such as the median or mode, of the distribution of θ̂.
In figure 1 below we picture the distributions of biased and unbiased estimators.
Figure 1. The pdf's of a biased estimator θ̂2 and an unbiased estimator θ̂1 for a parameter θ.
In figure 1, the sampling distribution of θ̂1 is centered at the true parameter value θ,
i.e. E(θ̂1) = θ, while the sampling distribution of θ̂2 is centered at E(θ̂2) ≠ θ. So θ̂1 is an
unbiased estimator of θ and θ̂2 is a biased estimator of θ, and the bias of θ̂2 = E(θ̂2) − θ.
One may feel that it is necessary to know the true parameter value to see whether
an estimator is biased or unbiased. This is not usually the case, because unbiasedness is a
general property of the estimator's sampling distribution (where it is centered), which
typically does not depend on any particular parameter value. The following examples
illustrate this:
Example 2: If X, the number of sample successes, is a binomial random variable with
parameters n and p, then the sample proportion, p̂ = X/n, is an unbiased estimator of p
irrespective of the true value of p.
Proof: $E(\hat{p}) = E\!\left(\dfrac{X}{n}\right) = \dfrac{1}{n}E(X) = \dfrac{1}{n}(np) = p$
Hence the distribution of the estimator p̂ will be centered at the true value p.

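Example 3: Let X1, X2, …, Xn be a random sample from a normal population with mean μ
and variance σ². Then the estimator $\bar{X} = \dfrac{1}{n}\sum X_i$ is an unbiased estimator of µ, while
$\hat{\sigma}^2 = \dfrac{1}{n}\sum (X_i - \bar{X})^2$ is a biased estimator of σ².
Proof: Since X1, X2, …, Xn are random variables having the same distribution as the
population, which has mean µ, we have
E(Xi) = µ for i = 1, 2, …, n
Then, since the sample mean is defined as $\bar{X} = \dfrac{X_1 + X_2 + \dots + X_n}{n}$,
we have, as required,
$E(\bar{X}) = \dfrac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] = \dfrac{1}{n}(n\mu) = \mu$
Hence X̄ is an unbiased estimator of µ irrespective of the true value of µ.

However,
$E(\hat{\sigma}^2) = E\!\left[\dfrac{1}{n}\sum (X_i - \bar{X})^2\right]$
$= \dfrac{1}{n} E\!\left[\sum (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]$
Then, since $E[(X_i - \mu)^2] = \sigma^2$ (given) and $E[(\bar{X} - \mu)^2] = \mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$,
it follows that $E(\hat{\sigma}^2) = \dfrac{1}{n}\left[n\sigma^2 - n\,\dfrac{\sigma^2}{n}\right] = \dfrac{n-1}{n}\,\sigma^2$,
which is very nearly σ² only for large values of n (say, n ≥ 30). The desired unbiased
estimator is defined by
$S^2 = \dfrac{n}{n-1}\,\hat{\sigma}^2 = \dfrac{1}{n-1}\sum (X_i - \bar{X})^2$, so that $E(S^2) = \sigma^2$.
Again S² is an unbiased estimator of σ² irrespective of the true parameter value.
It can be noted that we have divided the sum of squared deviations by (n − 1) instead of n.
The reason for this is that by definition we should have taken deviations from µ rather than
X̄. But we do not know the value of µ, so we have to take deviations from X̄. Since the xi's will
always be closer to x̄ than to µ, the sum of squared deviations about x̄ underestimates the
true sum of squared deviations about µ.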

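Proof:
Denote $\sum (X_i - c)^2$ by L. Now L will be minimised when its first derivative with respect to c is
zero and its second derivative with respect to c is positive. Differentiating with respect to c we
get
$\frac{dL}{dc} = 2\sum (X_i - c)(-1) = 0$
$\Rightarrow \sum X_i = nc \Rightarrow c = \bar{X}$
$\frac{d^2L}{dc^2} = 2n > 0$
Hence $\sum (X_i - c)^2$ is minimum for $c = \bar{X}$. So if $\bar{X} \neq \mu$ then $\sum (X_i - \mu)^2 > \sum (X_i - \bar{X})^2$.
In order to make a correction for this underestimation we divide by (n-1) rather than
n.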
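The minimisation argument can be illustrated on a small data set. This is a sketch only: the sample values and the "true" mean mu = 10 below are hypothetical numbers chosen for the illustration.

```python
def sum_sq_dev(xs, c):
    """Sum of squared deviations of the observations xs about the point c."""
    return sum((x - c) ** 2 for x in xs)

xs = [4.0, 7.0, 9.0, 12.0]   # hypothetical sample
xbar = sum(xs) / len(xs)     # sample mean
mu = 10.0                    # hypothetical population mean

about_xbar = sum_sq_dev(xs, xbar)
about_mu = sum_sq_dev(xs, mu)
# Identity behind the proof: sum (x - mu)^2 = sum (x - xbar)^2 + n (xbar - mu)^2
gap = len(xs) * (xbar - mu) ** 2
print(about_xbar, about_mu, gap)
```

The deviations about the sample mean give the smaller total, and the shortfall equals $n(\bar{x} - \mu)^2$, which is why dividing by n understates the variance.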
Now we discuss two basic difficulties associated with the concept of
unbiasedness. The first is that unbiasedness may not
be retained under functional transformations, i.e. if $\hat{\theta}$ is an unbiased estimator of $\theta$, it does
not necessarily follow that $g(\hat{\theta})$ is an unbiased estimator of $g(\theta)$. For example, although $S^2$ is
an unbiased estimator of $\sigma^2$, $S$ is not an unbiased estimator of $\sigma$: taking the square root
destroys the property of unbiasedness. The second difficulty is that
unbiased estimators are not necessarily unique. The following example
illustrates this:
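Example 4: Suppose y is approximately proportional to x, that is, $y \approx \beta x$ for some value
$\beta$. So for any fixed x, Y is a random variable having mean value $\beta x$. That is, we assume that
the mean value of Y is related to x by a line passing through (0,0) but that the observed
value of Y will typically deviate from this line. Given observations $(x_1, Y_1), \ldots, (x_n, Y_n)$, we can consider any of the following
three estimators of $\beta$:
(1) $\hat{\beta}_1 = \frac{1}{n}\sum \frac{Y_i}{x_i}$
(2) $\hat{\beta}_2 = \frac{\sum Y_i}{\sum x_i}$
(3) $\hat{\beta}_3 = \frac{\sum x_i Y_i}{\sum x_i^2}$
We can show that all three are unbiased.
(1) $E(\hat{\beta}_1) = \frac{1}{n}\sum \frac{E(Y_i)}{x_i} = \frac{1}{n}\sum \frac{\beta x_i}{x_i} = \frac{1}{n}\sum \beta = \frac{n\beta}{n} = \beta$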

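(2) $E(\hat{\beta}_2) = \frac{E(\sum Y_i)}{\sum x_i} = \frac{1}{\sum x_i}\left(\sum \beta x_i\right) = \frac{\beta \sum x_i}{\sum x_i} = \beta$
(3) $E(\hat{\beta}_3) = \frac{E(\sum x_i Y_i)}{\sum x_i^2} = \frac{\sum x_i E(Y_i)}{\sum x_i^2} = \frac{\sum x_i (\beta x_i)}{\sum x_i^2} = \frac{1}{\sum x_i^2}\left(\beta \sum x_i^2\right) = \beta$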
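A small simulation can make the non-uniqueness concrete. The sketch below assumes the three estimators are the mean of ratios, the ratio of sums, and the least-squares slope through the origin; the x values, the value beta = 2 and the standard normal errors are all hypothetical choices made only for the illustration.

```python
import random

def estimators(xs, ys):
    """Three estimators of beta in the model E(Y) = beta * x."""
    n = len(xs)
    b1 = sum(y / x for x, y in zip(xs, ys)) / n                        # mean of ratios
    b2 = sum(ys) / sum(xs)                                             # ratio of sums
    b3 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)   # slope through origin
    return b1, b2, b3

def average_estimates(beta, xs, reps=20000, seed=0):
    """Average each estimator over many simulated samples (assumed N(0, 1) errors)."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
        for i, b in enumerate(estimators(xs, ys)):
            totals[i] += b
    return [t / reps for t in totals]

print(average_estimates(2.0, [1.0, 2.0, 3.0, 4.0, 5.0]))
```

All three long-run averages should land close to the assumed beta, so unbiasedness alone cannot pick one of them.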
Similarly, if $X_1, X_2, \ldots, X_n$ is a random sample from a normal distribution with mean $\mu$, then
$\bar{X}$, $\tilde{X}$ (the sample median) and the trimmed mean with any trimming percentage are all unbiased estimators of $\mu$.
So the principle of unbiasedness (preferring an unbiased estimator to a biased one)
cannot by itself be invoked to select an estimator. What we now need is a criterion for choosing
among unbiased estimators.
$\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$.
2.2 Efficient Estimators

Suppose there is more than one unbiased estimator of $\theta$. Then the question arises which
one is the best. To answer this question we look at the spreads of the distributions about $\theta$
of the various unbiased estimators and select the one which has the least spread. Figure 2 below
shows the probability density functions (pdf's) of two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$.
Figure 2: Graphs of the pdf's of two different unbiased estimators
It can be seen that both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of $\theta$, as the pdf of each is centered at
$\theta$, but $\hat{\theta}_2$ has more spread than $\hat{\theta}_1$. So we select $\hat{\theta}_1$. An unbiased estimator is called the minimum
variance unbiased estimator (MVUE) of $\theta$ if it has the least variance among all unbiased
estimators of $\theta$.
Example 5: For a normal population, the sampling distributions of the mean and median
both have the same mean, namely, the population mean. So both are unbiased estimators.

However, the variance of the sampling distribution of the mean is equal to $\sigma^2/n$, which is smaller
than the variance of the sampling distribution of the median, which for large samples is approximately $\frac{\pi\sigma^2}{2n} = \frac{1.5708\,\sigma^2}{n}$.
Therefore, the mean provides a more efficient estimate than the median, and the
efficiency of the median relative to the mean is approximately
$\frac{\sigma^2/n}{\pi\sigma^2/(2n)} = \frac{2}{\pi} = 0.64$
or about 64%. It means that the mean requires only 64% as many observations as the median
to estimate $\mu$ with the same reliability.
However, if a choice is to be made among different estimators on the basis of the
efficiency criterion, it is quite possible that a biased estimator is sometimes preferable to the
MVUE, as in figure 3 below.
Figure 3: A biased estimator that is preferable to the MVUE.
Here we choose $\hat{\theta}_1$, a biased estimator, as it has a much smaller spread than $\hat{\theta}_2$,
which is the MVUE.
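However, if $\hat{\theta}$ is not an unbiased estimator of a given parameter $\theta$, we judge its
merits and make efficiency comparisons on the basis of the expected or mean squared
error (MSE), $E[(\hat{\theta} - \theta)^2]$, instead of the variance of $\hat{\theta}$. If $\hat{\theta}$ is unbiased, then $MSE(\hat{\theta}) = V(\hat{\theta})$,
but in general $MSE(\hat{\theta}) = V(\hat{\theta}) + [\text{bias}(\hat{\theta})]^2$. So if $\hat{\theta}_1$ is a biased estimator of $\theta$ and $\hat{\theta}_2$ is an unbiased
estimator of $\theta$, then we should compare $V(\hat{\theta}_2)$ with $MSE(\hat{\theta}_1)$ to make efficiency comparisons.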
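Question 1: Show that $\hat{\sigma}^2$ is a biased but more efficient estimator of the population variance
$\sigma^2$ of a normal population, as compared to $S^2$, where $\hat{\sigma}^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$ and $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$.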

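Solution: We have already proved that $\hat{\sigma}^2$ is a biased estimator of the population variance,
while $S^2$ is an unbiased estimator. To make efficiency comparisons we have to
compare $MSE(\hat{\sigma}^2)$ with $MSE(S^2)$. Since $S^2$ is an unbiased estimator,
$MSE(S^2) = Var(S^2)$ while $MSE(\hat{\sigma}^2) = Var(\hat{\sigma}^2) + [\text{bias}(\hat{\sigma}^2)]^2$.
It can be shown that, for a normal population,
$Var(S^2) = \frac{2\sigma^4}{n-1}$
Since by definition $\hat{\sigma}^2 = \frac{n-1}{n}S^2$, it follows that
$Var(\hat{\sigma}^2) = Var\left(\frac{n-1}{n}S^2\right) = \frac{(n-1)^2}{n^2}\cdot\frac{2\sigma^4}{n-1} = \frac{2(n-1)\sigma^4}{n^2}$
Now the bias of $\hat{\sigma}^2$ is $E(\hat{\sigma}^2) - \sigma^2 = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}$
Hence $MSE(\hat{\sigma}^2) = \frac{2(n-1)\sigma^4}{n^2} + \frac{\sigma^4}{n^2} = \frac{(2n-1)\sigma^4}{n^2}$
Comparing the MSE of the two estimators we get
$MSE(\hat{\sigma}^2) - MSE(S^2) = \frac{(2n-1)\sigma^4}{n^2} - \frac{2\sigma^4}{n-1} = \frac{(1-3n)\sigma^4}{n^2(n-1)} < 0$ (since for $n > 1$ the numerator is always negative and
the denominator is always positive).
So $MSE(S^2) > MSE(\hat{\sigma}^2)$. It means that although $\hat{\sigma}^2$ is a biased estimator of the population
variance, it is more efficient than the unbiased estimator $S^2$. Hence,
whether we choose $\hat{\sigma}^2$ or $S^2$ as an estimator of $\sigma^2$ will depend on whether unbiasedness or
efficiency is more important in a particular situation.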
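The comparison can be tabulated for a few sample sizes. The sketch below assumes a normal population, for which the variance of the divide-by-(n-1) estimator is $2\sigma^4/(n-1)$; the function names are illustrative.

```python
def mse_unbiased(n, sigma2):
    """MSE of the divide-by-(n-1) variance estimator: its variance, 2*sigma^4/(n-1)."""
    return 2.0 * sigma2 ** 2 / (n - 1)

def mse_biased(n, sigma2):
    """MSE of the divide-by-n variance estimator: variance plus squared bias."""
    variance = 2.0 * (n - 1) * sigma2 ** 2 / n ** 2
    bias = -sigma2 / n
    return variance + bias ** 2

for n in (2, 5, 10, 30, 100):
    print(n, mse_biased(n, 1.0), mse_unbiased(n, 1.0))
```

For every n greater than 1 the biased estimator has the smaller MSE, and the two values approach each other as n grows.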
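Among all estimators of $\theta$, the efficient estimator is the one
that has minimum mean squared error (MSE), $E[(\hat{\theta} - \theta)^2]$. If
$V(\hat{\theta}_1) < V(\hat{\theta}_2)$ where $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, then $\hat{\theta}_1$ is efficient.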
2.3 Consistent Estimators

Clearly, one would in practice prefer to have estimates that are both efficient and
unbiased, but this is not always possible. So the general practice is to consider all unbiased
and asymptotically unbiased estimators of $\theta$ and select the one that has minimum variance
among these. The reason is that we often want to be assured that, at least for large n,
the estimators will take on values which are very close to the respective parameters. This
concept of closeness is generalized in the following definition of consistency.

Definition: If $\hat{\theta}$ is an unbiased or asymptotically unbiased estimator of the parameter $\theta$ and
$Var(\hat{\theta}) \to 0$ as $n \to \infty$, then $\hat{\theta}$ is a consistent estimator of $\theta$. Informally, the definition says
that when n is sufficiently large, we can be practically certain that the error made with a
consistent estimator will be less than any small pre-assigned positive constant.
Figure 4: $Var(\hat{\theta}) \to 0$ as $n \to \infty$
Note that consistency is an asymptotic property, namely, a limiting property of an
estimator. There may be more than one unbiased estimator which is consistent, but there
can be only one minimum variance unbiased estimator.
Question 2: Show that $\hat{p} = X/n$ is a consistent estimator of the binomial parameter p.
Solution: Since $\hat{p}$ is an unbiased estimator of p, it remains only to be shown that $Var(\hat{p}) \to 0$
as $n \to \infty$.
$Var(\hat{p}) = Var\left(\frac{X}{n}\right) = \frac{npq}{n^2} = \frac{pq}{n}$, which tends to zero as $n \to \infty$, as desired.
Question 3: Show that $\bar{X}$ is a consistent estimator of the mean $\mu$ of a normal population
which has a finite variance.
Solution: Since we have already shown that $\bar{X}$ is an unbiased estimator of $\mu$, it remains
only to be shown that $Var(\bar{X}) \to 0$ as $n \to \infty$.
By definition $Var(\bar{X}) = E[(\bar{X} - \mu)^2] = \frac{\sigma^2}{n}$ (as already used in example 3),
which tends to zero as $n \to \infty$, as desired.
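How quickly these variances shrink can be seen by evaluating the two formulas for increasing n. A minimal sketch, with p = 0.3 and a population variance of 4 chosen purely for illustration:

```python
def var_sample_proportion(p, n):
    """Variance of the sample proportion X/n for X ~ Binomial(n, p): pq/n."""
    return p * (1.0 - p) / n

def var_sample_mean(sigma2, n):
    """Variance of the sample mean of n independent observations: sigma^2/n."""
    return sigma2 / n

for n in (10, 100, 1000, 10000):
    print(n, var_sample_proportion(0.3, n), var_sample_mean(4.0, n))
```

Each tenfold increase in n cuts both variances by a factor of ten, so they can be made as small as desired by taking n large enough.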
The statistic $\hat{\theta}$ is a consistent estimator of the parameter $\theta$ if
and only if for each positive constant c, $\lim_{n \to \infty} P(|\hat{\theta} - \theta| \geq c) = 0$
Having discussed the desirable properties of point estimators, it is also
important to know which factors decide whether an estimator
possesses these properties. The most important factor is the sampling distribution of the
estimator. However, the sampling distribution of the estimator depends on the distribution
of the population from which the sample is drawn. In particular,
1) If we draw a random sample from a normal population, then $\bar{X}$ is the best among the
four estimators ($\bar{X}$, the sample mean; $\tilde{X}$, the sample median; $\bar{X}_e$, the average of the two extreme observations; and $\bar{X}_{tr}$, the trimmed mean), since its variance is least among all unbiased
estimators.
2) If we draw a random sample from a Cauchy distribution,
Figure 5 : Cauchy Distribution
then $\bar{X}$ and $\bar{X}_e$ are bad estimators for $\mu$, while $\tilde{X}$ is reasonably good. $\bar{X}$ is bad as it is very
sensitive to extreme observations, and due to the heavy tails of the Cauchy distribution it is
very likely that a few such observations appear in any sample.
3) If we draw a random sample from a uniform distribution, then $\bar{X}_e$ is the best estimator.
$\bar{X}_e$ is very sensitive to extreme observations, but such observations are unlikely to
appear in any sample as the uniform distribution does not have any tails.
4) The trimmed mean is not best in any of these three situations. However, it is quite good
in all three. Hence $\bar{X}_{tr}$ with a small trimming percentage is called a robust estimator, i.e.
one that performs reasonably well for a wide variety of population distributions.

So both the distribution of the population and the sampling distribution of the estimator are important
in deciding which estimator is best for a given situation.
3. Precision of the Estimate

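Whenever we make an inference about a population parameter on the basis of a sample
statistic, we are also interested in how reliable it is. The best indicator is the
standard error of the relevant estimator, i.e. its standard deviation, which we denote by $\sigma_{\hat{\theta}}$. It is the size of
an average deviation between $\hat{\theta}$ and $\theta$. If we use estimated values of some unknown
parameters, then we call it the estimated standard error and denote it by $\hat{\sigma}_{\hat{\theta}}$ or by $s_{\hat{\theta}}$.
Example 6: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population. Then the
standard error of $\hat{\mu} = \bar{X}$ is given by $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. If we do not know the value of $\sigma$, then we can
substitute the estimate $\hat{\sigma} = s$ into $\sigma_{\bar{X}}$ to obtain the estimated standard error $\hat{\sigma}_{\bar{X}} = s/\sqrt{n}$.

Question 4: Find the standard error of the sample proportion $\hat{p} = X/n$, where X is a binomial
random variable with parameters n and p.
Solution: $\sigma_{\hat{p}} = \sqrt{V(X/n)} = \sqrt{V(X)/n^2} = \sqrt{npq/n^2} = \sqrt{pq/n}$. Since p and q are unknown, we
substitute $\hat{p} = x/n$ and $\hat{q} = 1 - x/n$ into $\sigma_{\hat{p}}$, yielding the estimated standard error $\hat{\sigma}_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$.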
We can also use the standard error of the estimator to convert a point estimate
into an interval estimate.
Suppose the sample size is large. Then the distribution of the point estimator $\hat{\theta}$ will be
approximately normal and we can be reasonably confident that the true value of $\theta$ lies
within approximately two standard errors of $\hat{\theta}$. Thus the point estimate $\hat{\theta}$ translates to the
interval estimate $\hat{\theta} \pm 2\sigma_{\hat{\theta}}$.
If $\hat{\theta}$ is unbiased but its distribution is not normal, then we can be reasonably confident
that the true value of $\theta$ lies within approximately four standard errors of $\hat{\theta}$. In
summary, the standard error tells us roughly within what distance of $\hat{\theta}$ we can expect the
true value of $\theta$ to lie.
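As an illustration of how a point estimate and its estimated standard error combine, the sketch below computes the sample proportion, its estimated standard error and the rough "two standard errors" interval. The counts (40 successes in 100 trials) are hypothetical and the function name is an assumption made for this example.

```python
from math import sqrt

def proportion_estimate(x, n):
    """Point estimate, estimated standard error and rough 2-SE interval for a proportion."""
    p_hat = x / n
    se_hat = sqrt(p_hat * (1.0 - p_hat) / n)   # estimated standard error sqrt(p_hat * q_hat / n)
    interval = (p_hat - 2.0 * se_hat, p_hat + 2.0 * se_hat)
    return p_hat, se_hat, interval

p_hat, se_hat, (low, high) = proportion_estimate(40, 100)   # hypothetical data
print(p_hat, round(se_hat, 4), round(low, 4), round(high, 4))
```

The interval is only the rough large-sample rule described above, not an exact confidence interval.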
4. Methods of Point Estimation

As we have seen in this chapter, there can be many different ways (estimators) of
estimating a parameter of a population. Further different estimators have various desirable
properties to varying degrees. Therefore, it would seem desirable to have some general
methods that yield estimators with reasonable desirable properties. Here we will discuss two
such methods, the method of moments, which is historically one of the oldest methods
and the method of maximum likelihood. Although maximum likelihood estimators are
generally preferable to moment estimators because of certain efficiency properties, they
often require significantly more computation than do moment estimators.
4.1 The Method of Moments

Let $X_1, X_2, \ldots, X_n$ be a random sample from a pmf or pdf $f(x)$. For k = 1, 2, 3, ..., the
kth population moment, or kth moment of the distribution $f(x)$, is $\mu'_k = E(X^k)$. The kth sample
moment is denoted by $m'_k$; symbolically, $m'_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$
Thus the first population moment is $E(X) = \mu$, and the first sample moment is $m'_1 = \frac{1}{n}\sum X_i = \bar{X}$.
The second population and sample moments are $E(X^2)$ and $\frac{1}{n}\sum X_i^2$ respectively.
The method of moments consists of equating the first few moments of a population
to the corresponding moments of a sample, thus getting as many equations as are needed
to solve for the unknown parameters of the population.
Thus the method of moments consists of solving the system of equations
$m'_k = \mu'_k$, k = 1, 2, ..., p
for the p parameters of the population.
Example 7: If we want to estimate the parameter p of the binomial distribution when n is
known, then the system of equations we have to solve is $m'_1 = \mu'_1$
Since $\mu'_1 = np$, we get $m'_1 = np$
Hence $\hat{p} = \frac{m'_1}{n}$
If both n and p are unknown, then the system of equations we have to solve is
$m'_1 = \mu'_1$ and $m'_2 = \mu'_2$
Since $\mu'_1 = np$ and $\mu'_2 = npq + n^2p^2$
we get
$m'_1 = np$ and $m'_2 = npq + n^2p^2$

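and solving these two equations for n and p, we find the estimates of the two parameters of
the binomial distribution.
Since $m'_2 = npq + n^2p^2$
$m'_2 = m'_1 q + (m'_1)^2$
$\hat{q} = \frac{m'_2 - (m'_1)^2}{m'_1} = (1 - \hat{p})$
$\hat{p} = 1 - \frac{m'_2 - (m'_1)^2}{m'_1}$
Similarly, since $m'_1 = np$
$\hat{n} = \frac{m'_1}{\hat{p}} = \frac{(m'_1)^2}{m'_1 + (m'_1)^2 - m'_2}$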
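The two-parameter solution can be written as a short function of the first two sample moments. This is a sketch under the formulas derived above; the function names are illustrative, and the moments used in the example are the exact population moments of a binomial with n = 10 and p = 0.2 (np = 2 and npq + n²p² = 5.6), chosen only as a check.

```python
def sample_moments(xs):
    """First and second sample moments of the observations xs."""
    n = len(xs)
    return sum(xs) / n, sum(x * x for x in xs) / n

def binomial_moment_estimates(m1, m2):
    """Method-of-moments estimates (n_hat, p_hat) for a binomial distribution."""
    q_hat = (m2 - m1 ** 2) / m1   # from m2 = m1 * q + m1^2
    p_hat = 1.0 - q_hat
    n_hat = m1 / p_hat            # from m1 = n * p
    return n_hat, p_hat

print(binomial_moment_estimates(2.0, 5.6))
```

With real data the estimate of n is generally not an integer, and it can even be negative when the sample variance exceeds the sample mean, which is one practical weakness of moment estimators.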
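Question 5: Given a random sample of size n from a uniform population with $\beta = 1$, use the
method of moments to obtain a formula for estimating the parameter $\alpha$.
Solution: The equation that we have to solve is $m'_1 = \mu'_1$, where $m'_1 = \bar{x}$ and
$\mu'_1 = \frac{\alpha + \beta}{2} = \frac{\alpha + 1}{2}$. Thus, $\bar{x} = \frac{\alpha + 1}{2}$ and we can write the estimate of $\alpha$ as $\hat{\alpha} = 2\bar{x} - 1$.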
4.2 The Method of Maximum Likelihood

The method of maximum likelihood looks at the values of a random sample and
then chooses as our estimate of the unknown population parameter, the value for which the
probability of obtaining the observed data is a maximum. The principle on which the method
of maximum likelihood is based can be understood with the following example.
Example 8: Suppose Mr X receives five letters on some particular day, but unfortunately
one of them gets misplaced before he has a chance to open it. If among the remaining four
letters three contain credit-card billings and the other one does not, what might be a good
estimate of k, the total number of credit-card billings among the five letters received?

Clearly k must be three or four. Assuming that each letter had the same chance of being
misplaced, we find that the probability of the observed data is
$\frac{\binom{3}{3}\binom{2}{1}}{\binom{5}{4}} = \frac{2}{5}$ for k=3
and
$\frac{\binom{4}{3}\binom{1}{1}}{\binom{5}{4}} = \frac{4}{5}$ for k=4
Therefore, if we choose as our estimate of k the value that maximizes the probability of
getting the observed data, we obtain k=4. We call this estimate a maximum likelihood
estimate, and the method by which it was obtained is called the method of maximum
likelihood.
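The two probabilities in this example are hypergeometric probabilities and can be computed directly. In the sketch below the function name and its default arguments (5 letters, 4 opened, 3 billings seen) are illustrative and simply restate the setting of Example 8.

```python
from math import comb

def prob_observed(k, total=5, opened=4, billings_seen=3):
    """P(3 of the 4 opened letters are billings | k of the 5 letters are billings)."""
    return comb(k, billings_seen) * comb(total - k, opened - billings_seen) / comb(total, opened)

likelihoods = {k: prob_observed(k) for k in range(0, 6)}
k_mle = max(likelihoods, key=likelihoods.get)
print(likelihoods[3], likelihoods[4], k_mle)   # prints 0.4 0.8 4
```

Values of k other than 3 and 4 get probability zero, so maximising over all k from 0 to 5 gives the same answer as comparing only the two feasible values.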
In the general case, if the observed sample values are $x_1, x_2, \ldots, x_n$, we can write in the
discrete case
$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(x_1, x_2, \ldots, x_n; \theta)$
which is just the value of the joint
probability distribution of the random variables $X_1, X_2, \ldots, X_n$ at the sample point $(x_1,
x_2, \ldots, x_n)$. Since the sample values have been observed and are therefore fixed numbers,
we regard $f(x_1, x_2, \ldots, x_n; \theta)$ as the value of a function of the parameter $\theta$, referred to as
the likelihood function $L(\theta)$. A similar definition applies when the random sample comes
from a continuous population, but in that case $f(x_1, x_2, \ldots, x_n; \theta)$ is the value of the joint
probability density at the sample point $(x_1, x_2, \ldots, x_n)$. The method of maximum
likelihood consists of maximizing the likelihood function with respect to $\theta$, and we refer to
the value of $\theta$ which maximizes the likelihood function as the maximum likelihood estimate
of $\theta$. To maximize $L(\theta) = f(x_1, x_2, \ldots, x_n; \theta)$ we take the derivative of $L(\theta)$ with respect
to $\theta$ and set it equal to zero.
The method is capable of generalization. In case there are several parameters, we
take the partial derivatives with respect to each parameter, set them equal to zero, and
solve the resulting equations simultaneously. Moreover, if we draw a large sample from a
population which has a well specified distribution function, then the maximum likelihood estimate
of any parameter will be approximately the MVUE, i.e. it will be approximately unbiased and
will have approximately the least variance.
Question 6: Given x "successes" in n trials, find the maximum likelihood estimator of the
parameter $\theta$ of the binomial distribution.
Solution: To find the value of $\theta$ which maximizes
$L(\theta) = b(x; n, \theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}$, it will be convenient to make use of the fact that the
value of $\theta$ which maximizes $L(\theta)$ will also maximize
$\ln L(\theta) = \ln\binom{n}{x} + x\ln\theta + (n-x)\ln(1-\theta)$
Thus we get $\frac{d[\ln L(\theta)]}{d\theta} = \frac{x}{\theta} - \frac{n-x}{1-\theta}$
and, equating this derivative to 0 and solving for $\theta$, we find that the likelihood function has
a maximum at $\theta = x/n$. Hence the maximum likelihood estimator of the parameter $\theta$ of the
binomial distribution is $\hat{\theta} = X/n$.
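The calculus result can be cross-checked by brute force: evaluate the log-likelihood on a fine grid of values between 0 and 1 and pick the largest. The sketch below uses hypothetical data (7 successes in 20 trials); the grid search is only a check, not the method used in the solution.

```python
from math import comb, log

def log_likelihood(theta, x, n):
    """Binomial log-likelihood ln L(theta) for x successes in n trials."""
    return log(comb(n, x)) + x * log(theta) + (n - x) * log(1.0 - theta)

def grid_mle(x, n, steps=1000):
    """Grid point in (0, 1) with the largest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: log_likelihood(t, x, n))

print(grid_mle(7, 20), 7 / 20)
```

The grid maximiser coincides with x/n whenever x/n lies on the grid, as it does here.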
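Question 7: Suppose that n observations, $x_1, x_2, \ldots, x_n$, are made from a normally
distributed population. Find
(a) the maximum likelihood estimate of the mean if the variance is known but the mean is unknown
(b) the maximum likelihood estimate of the variance if the mean is known but the variance is
unknown.
Solution:
(a) Since $f(x_k, \mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_k - \mu)^2/(2\sigma^2)}$
we have
(1) $L = f(x_1, \mu) \cdots f(x_n, \mu) = (2\pi\sigma^2)^{-n/2}\, e^{-\sum (x_k - \mu)^2/(2\sigma^2)}$
Therefore,
(2) $\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum (x_k - \mu)^2$
Taking the partial derivative with respect to $\mu$ yields
(3) $\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum (x_k - \mu)$
Setting $\frac{\partial \ln L}{\partial \mu} = 0$ gives
(4) $\sum (x_k - \mu) = 0$ i.e. $\sum x_k - n\mu = 0$
or
(5) $\mu = \frac{\sum x_k}{n}$
Therefore the maximum likelihood estimate is the sample mean.
(b) Since $f(x_k, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_k - \mu)^2/(2\sigma^2)}$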

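we have
(1) $L = f(x_1, \sigma^2) \cdots f(x_n, \sigma^2) = (2\pi\sigma^2)^{-n/2}\, e^{-\sum (x_k - \mu)^2/(2\sigma^2)}$
Therefore,
(2) $\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum (x_k - \mu)^2$
Taking the partial derivative with respect to $\sigma^2$ yields
(3) $\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (x_k - \mu)^2$
Setting $\frac{\partial \ln L}{\partial \sigma^2} = 0$ gives
(4) $\sigma^2 = \frac{1}{n}\sum (x_k - \mu)^2$
Therefore the maximum likelihood estimate of the variance is the mean squared deviation about the known mean $\mu$.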
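Both results for the normal case reduce to simple averages: the maximum likelihood estimate of the mean is the sample mean, and with a known mean the maximum likelihood estimate of the variance is the average squared deviation about that mean. A minimal sketch follows; the data and the known mean of 5 are hypothetical.

```python
def mle_mean(xs):
    """ML estimate of the mean of a normal population (variance known): the sample mean."""
    return sum(xs) / len(xs)

def mle_variance_known_mean(xs, mu):
    """ML estimate of the variance of a normal population when the mean mu is known."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)

xs = [2.0, 4.0, 4.0, 6.0]   # hypothetical observations
print(mle_mean(xs), mle_variance_known_mean(xs, 5.0))
```

Note that the variance estimate divides by n, not n-1, because the deviations are taken from the known mean rather than from the sample mean.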
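Question 8: Prove that the maximum likelihood estimate of the parameter $\theta$ of a
population having density function $f(x, \theta) = \frac{2}{\theta^2}(\theta - x)$, $0 < x < \theta$, for a sample of unit size is $2x$,
x being the sample value. Show also that the estimate is biased.
Solution: Sample of unit size, n = 1
Likelihood function $L(\theta) = f(x, \theta) = \frac{2}{\theta^2}(\theta - x)$
$\log L(\theta) = \log 2 - \log\theta^2 + \log(\theta - x)$
$= \log 2 - 2\log\theta + \log(\theta - x)$
Differentiating w.r.t. $\theta$ we get
$\frac{d \log L}{d\theta} = -\frac{2}{\theta} + \frac{1}{\theta - x}$
$\frac{d^2 \log L}{d\theta^2} = \frac{2}{\theta^2} - \frac{1}{(\theta - x)^2}$
For maxima or minima $\frac{d \log L}{d\theta} = 0$
$-\frac{2}{\theta} + \frac{1}{\theta - x} = 0 \Rightarrow 2(\theta - x) = \theta \Rightarrow \theta = 2x$
When $\theta = 2x$, $\frac{d^2 \log L}{d\theta^2} = \frac{2}{4x^2} - \frac{1}{x^2} = -\frac{1}{2x^2} < 0$
The maximum likelihood estimator of $\theta$ is given by $\hat{\theta} = 2x$
$E(\hat{\theta}) = E(2X) = 2\int_0^{\theta} x\,\frac{2}{\theta^2}(\theta - x)\,dx = \frac{4}{\theta^2}\left[\frac{\theta x^2}{2} - \frac{x^3}{3}\right]_0^{\theta} = \frac{2\theta}{3}$
Since $E(\hat{\theta}) \neq \theta$, $\hat{\theta} = 2x$ is not an unbiased estimate of $\theta$.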
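The bias can be confirmed numerically. The sketch below assumes the density is $f(x, \theta) = 2(\theta - x)/\theta^2$ on $0 < x < \theta$ and approximates $E(2X)$ with a midpoint Riemann sum; the value theta = 3 is an arbitrary illustration.

```python
def density(x, theta):
    """Assumed density f(x, theta) = 2 (theta - x) / theta^2 on 0 < x < theta."""
    return 2.0 * (theta - x) / theta ** 2

def expected_mle(theta, steps=100000):
    """Midpoint-rule approximation of E(2X), the expected value of the MLE."""
    h = theta / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += 2.0 * x * density(x, theta) * h
    return total

theta = 3.0
print(expected_mle(theta), 2.0 * theta / 3.0)
```

The numerical expectation agrees with $2\theta/3$, so on average the estimator falls short of $\theta$ by one third.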

Practice Questions:
Q.1 Assuming that the population is normal, give examples of estimators (or estimates)
which are
(a) unbiased and efficient
(b) unbiased and inefficient
(c) biased and inefficient.
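Q.2 Show that $\bar{X}$ is a minimum variance unbiased estimator of the mean $\mu$ of a normal
population.
Q.3 If $\hat{\theta}$ is an estimator of a parameter $\theta$, its bias is given by $b = E(\hat{\theta}) - \theta$. Show that
$E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + b^2$.
Q.4 If $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of the same parameter $\theta$, what condition must be
imposed on the constants $k_1$ and $k_2$ so that $k_1\hat{\theta}_1 + k_2\hat{\theta}_2$ is also an unbiased estimator of $\theta$?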
Q.5 Suppose that we use the largest value of a random sample of size n to estimate the
parameter of the population.
( )=
=0 Otherwise
Check whether this estimator is (a) unbiased and (b) consistent.
Q.6 Show that for a random sample from a normal population, the sample variance $S^2$ is a
consistent estimator of $\sigma^2$, where $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$.
Q.7 In estimating the mean of a normal population on the basis of a random sample of
size 2n+1, what is the efficiency of the median relative to the mean?
Q.8 If , ,...... are the values of a random sample of size n from a population having
the density
( ; )=
=0 otherwise
find an estimator for by the method of moments.
Q.9 Let $X_1, \ldots, X_n$ be a random sample from a gamma distribution with parameters $\alpha$ and $\beta$.
a. Derive the equations whose solutions yield the maximum likelihood estimators of $\alpha$
and $\beta$. Do you think they can be solved explicitly?
b. Show that the mle of $\mu = \alpha\beta$ is $\hat{\mu} = \bar{X}$.
Q.10 Among N independent random variables having identical binomial distributions with the
parameters $\theta$ and n=2, $n_0$ take on the value zero, $n_1$ take on the value one, and $n_2$ take on
the value two. Find an estimate of $\theta$ using

(a) the method of moments

(b) the method of maximum likelihood.

Point Estimates For Population Mean, Variance And Proportions: Single Sample And
Two Samples

Lesson: Point Estimates For Population Mean, Variance
And Proportions: Single Sample And Two Samples
Lesson Developer: Kamlesh Aggarwal and Nidhi Aggarwal
College/Department: Department Of Economics, Spm
College and Mata Sundari College, University Of Delhi
TABLE OF CONTENTS
1. Basic Concepts
2. Point Estimates for Population Mean
2.1 Methodology
2.2 Solved Examples
3. Point Estimates for Population Variance
3.1 Methodology
4. Point Estimates for Population Proportions
4.1 Methodology
Reference: Jay L. Devore, Probability and Statistics for Engineering
and the Sciences, 8th Edition.
POINT ESTIMATES FOR POPULATION MEAN, VARIANCE

AND PROPORTIONS: SINGLE SAMPLE AND TWO SAMPLES
Learning Objectives
After completing study of this chapter we will be able to make a reasonably precise
inference about the population parameters like mean, variance and proportion on the basis
of sample data. We will also be able to make an inference about the difference between the
means, variances and proportions of two different population distributions on the basis of
samples collected from each of these populations. We will also be able to have an idea
about the accuracy of the above estimates.
1. Basic Concepts
A point estimate is a single number determined from a sample and used to estimate
the population value. By implication, the term estimate refers to the actual sample result
which is used to represent the parameter being estimated. If the average age based on a
random sample of size n = 36 is 65 years, the sample mean $\bar{x}$ = 65 years is an estimate of
the parameter $\mu$ and the statistic $\bar{X}$ is its estimator.
Clearly, a point estimate is normally different from the actual value of the parameter,
for the simple reason that a point estimate is derived from a random sample and the value
of the point estimate varies from sample to sample. So while reporting the value of a point
estimate, we should also give some indication of its precision or error. The best indicator is the
standard error of the estimator used. The standard error of an estimator $\hat{\theta}$ is its standard
deviation, which can be denoted by $\sigma_{\hat{\theta}}$. It is the size of an average deviation between $\hat{\theta}$ and
$\theta$. If we use estimated values of some unknown parameters, then we call it the estimated
standard error and denote it by $\hat{\sigma}_{\hat{\theta}}$ or by $s_{\hat{\theta}}$.
Now we will show the computation of point estimates and their standard error for
population mean, variance and proportion for a single sample. We will also extend these
computation methods to situations involving the means, proportions and variances of two
different population distributions.
2. Point Estimates for Population Mean

We often need some idea about the average value of the relevant population. For
example, we might be interested in knowing the average daily sales of soft drinks in Delhi.
Similarly, sometimes we want to make an inference about the difference in average values
of two different populations. Not only do we want to estimate these parameter values, but we
would also like to have an idea about the precision of our estimates. We now discuss methods
for computing these estimates and their precision.
2.1 Methodology
The sample arithmetic mean, $\bar{X}$; the sample median, $\tilde{X}$; the sample k% trimmed mean, $\bar{X}_{tr(k)}$;
and the average of the two extreme observations in the sample, $\bar{X}_e$, can all be used as
estimators of the population mean $\mu$. When there is more than one estimator, the best
estimator is the one which tends to give an estimate closest to the actual value of $\mu$, which
depends on the sampling distribution of the estimator. The sampling distribution of
the estimator in turn depends on the distribution of the population from which the sample is
drawn. In particular,
1) If we draw a random sample from a normal population, then $\bar{X}$ is the best among the
four estimators ($\bar{X}$, $\tilde{X}$, $\bar{X}_e$ and $\bar{X}_{tr(k)}$), since its variance is least among all unbiased
estimators. An estimator $\hat{\theta}$ is called an unbiased estimator of a population parameter $\theta$ if
$E(\hat{\theta}) = \theta$.
2) If we draw a random sample from a Cauchy distribution,
Figure 1 : Cauchy Distribution
then $\bar{X}$ and $\bar{X}_e$ are bad estimators for $\mu$, while $\tilde{X}$ is reasonably good. $\bar{X}$ is bad as it is very
sensitive to extreme observations, and due to the heavy tails of the Cauchy distribution it is
very likely that a few such observations appear in any sample.
3) If we draw a random sample from a uniform distribution, then $\bar{X}_e$ is the best estimator.
$\bar{X}_e$ is very sensitive to extreme observations, but such observations are unlikely to
appear in any sample as the uniform distribution does not have any tails.
4) The trimmed mean is not best in any of these three situations. However, it is quite good
in all three. Hence $\bar{X}_{tr(k)}$ with a small trimming percentage is called a robust estimator,
i.e. one that performs reasonably well for a wide variety of population distributions.
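So both the distribution of the population and the sampling distribution of the estimator are
important in deciding which estimator is best for a given situation. We now show some
important results assuming that the population is normal.
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ and
variance $\sigma^2$; then $\hat{\mu} = \bar{X}$ is the best estimator of $\mu$. It can be shown that the expected value
of $\bar{X}$ is $\mu$, so $\bar{X}$ is an unbiased estimator of $\mu$.
Proof: Since the sample mean is defined as $\bar{X} = \frac{1}{n}\sum X_i$,
$E(\bar{X}) = \frac{1}{n}\sum E(X_i) = \frac{1}{n}(n\mu)$ [since $E(X_i) = \mu$ for i = 1, 2, ..., n]
$= \mu$ as desired.
Further it can be shown that if the value of $\sigma$ is known, the standard error of $\bar{X}$ is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Proof: Since the $X_i$ are independent,
$V(\bar{X}) = \frac{1}{n^2}V(X_1) + \cdots + \frac{1}{n^2}V(X_n) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$ [since $V(X_i) = \sigma^2$ for i = 1, 2, ..., n]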
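Hence $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
If we do not know the value of $\sigma$, then we substitute the estimate $\hat{\sigma} = s$ into $\sigma_{\bar{X}}$ and denote
the estimated standard error by $\hat{\sigma}_{\bar{X}} = s_{\bar{X}} = \frac{s}{\sqrt{n}}$.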
Now we extend the above methods to problems which deal with the means of two different
population distributions. For instance, if $\mu_1$ denotes the true average Rockwell hardness for
heat-treated steel specimens and $\mu_2$ denotes the true average hardness for cold-rolled specimens,
then an investigator might wish to use samples of hardness observations from each type of
steel as a basis for calculating an estimate of $\mu_1 - \mu_2$, the difference between the two true
average hardnesses. Assume that
(1) $X_1, X_2, \ldots, X_m$ is a random sample from a distribution with mean $\mu_1$ and variance
$\sigma_1^2$, and
(2) $Y_1, Y_2, \ldots, Y_n$ is a random sample from a distribution with mean $\mu_2$ and variance $\sigma_2^2$,
and
(3) The X and Y samples are independent of one another.
It can be shown that $\bar{X} - \bar{Y}$, the difference between the two sample means, can be used as a
natural estimator of $\mu_1 - \mu_2$, the difference between the corresponding means of the two
population distributions. The expected value of $\bar{X} - \bar{Y}$ is equal to $\mu_1 - \mu_2$, so $\bar{X} - \bar{Y}$ is an
unbiased estimator of $\mu_1 - \mu_2$.
Proof: $E(\bar{X} - \bar{Y}) = E\left[\frac{1}{m}\sum X_i - \frac{1}{n}\sum Y_i\right]$
$= \frac{1}{m}(m\mu_1) - \frac{1}{n}(n\mu_2)$ [since $E(X_i) = \mu_1$ for i = 1, 2, ..., m and $E(Y_i) = \mu_2$ for i = 1, 2, ..., n]
$= \mu_1 - \mu_2$ as desired.
Further it can be shown that the standard deviation of $\bar{X} - \bar{Y}$ is $\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$
Proof: Since the X and Y samples are independent, $\bar{X}$ and $\bar{Y}$ will be independent quantities,
implying that $Cov(\bar{X}, \bar{Y}) = 0$. Hence the variance of the difference between the two sample
means is the sum of $V(\bar{X})$ and $V(\bar{Y})$:
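$V(\bar{X} - \bar{Y}) = V(\bar{X}) + V(\bar{Y}) = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}$
The standard deviation of $\bar{X} - \bar{Y}$ is the square root of this expression. Hence $\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$.
The sample variances must be used when $\sigma_1^2$ and $\sigma_2^2$ are unknown.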
2.2 Solved Examples
Example 2.1:
We examine each of 150 newly typed pages and record the number of mistakes per
page (the pages are supposed to be free of mistakes). We observe the following data:
Number of mistakes per page:   0   1   2   3   4   5   6   7
Observed frequency:           18  37  42  30  13   7   2   1
Let X = the number of mistakes on a randomly chosen page. Also assume that X follows a
Poisson distribution with parameter $\lambda$.
a) Find an unbiased estimator of $\lambda$ and compute the estimate for the data.
b) What is the standard error of your estimator? Compute the estimated standard error.
Solution:
a. An unbiased estimator of $\lambda$ is given by the sample mean, $\bar{X}$, since
$E(\bar{X}) = E\left[\frac{1}{n}\sum X_i\right] = \frac{1}{n}\sum E(X_i)$
$= \frac{1}{n}(n\lambda)$ [since $E(X_i) = \lambda$ for i = 1, 2, ..., n]
$= \lambda$ as desired.
Estimate $= \bar{x} = \frac{\sum x f}{\sum f} = \frac{317}{150} = 2.11$
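b. Let the standard deviation of our estimator, $\hat{\lambda} = \bar{X}$, be denoted by $\sigma_{\hat{\lambda}}$.
Now $\sigma_{\hat{\lambda}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\lambda}{n}}$ (since $\sigma^2 = \lambda$ for X Poisson)
Substituting the estimated value of $\lambda$, i.e. $\hat{\lambda} = 2.11$, to compute the estimated standard error, we get
$\hat{\sigma}_{\hat{\lambda}} = \sqrt{\frac{2.11}{150}} = 0.1186$.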
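The arithmetic of this example can be reproduced from the frequency table. The sketch below is illustrative only; the variable names are assumptions and the data are the 150 pages of Example 2.1.

```python
from math import sqrt

# Frequency table from Example 2.1: number of mistakes per page -> number of pages
counts = {0: 18, 1: 37, 2: 42, 3: 30, 4: 13, 5: 7, 6: 2, 7: 1}

n = sum(counts.values())                       # number of pages
total = sum(k * f for k, f in counts.items())  # total number of mistakes
lam_hat = total / n                            # sample mean = estimate of the Poisson parameter
se_hat = sqrt(lam_hat / n)                     # estimated standard error sqrt(lam_hat / n)
print(n, total, round(lam_hat, 4), round(se_hat, 4))
```

Carrying the unrounded mean 317/150 through gives an estimated standard error of about 0.1187; the value 0.1186 in the text comes from first rounding the mean to 2.11.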
Example 2.2:
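If $X_1, \ldots, X_n$ constitute a random sample from a population with the mean $\mu$, what condition
must be imposed on the constants $a_1, \ldots, a_n$ so that $a_1X_1 + \cdots + a_nX_n$
is an unbiased estimator of $\mu$?
Solution:
$a_1X_1 + \cdots + a_nX_n$ is an unbiased estimator of $\mu$ if $E(a_1X_1 + \cdots + a_nX_n) = \mu$
Now $E[a_1X_1 + \cdots + a_nX_n]$
$= a_1E(X_1) + a_2E(X_2) + \cdots + a_nE(X_n)$
$= a_1\mu + \cdots + a_n\mu$ [since $E(X_i) = \mu$ for i = 1, 2, ..., n]
$= (a_1 + \cdots + a_n)\mu$
$= \mu$ only if $(a_1 + \cdots + a_n) = 1$
So $a_1 + \cdots + a_n$ should be equal to one for $a_1X_1 + \cdots + a_nX_n$ to be an unbiased
estimator of $\mu$.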
Example 2.3:
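Independent random samples of size $n_1$ and $n_2$ are taken from a normal population with the
mean $\mu$ and the variance $\sigma^2$. If $n_1$ = 25, $n_2$ = 50, $\bar{x}_1$ = 27.6 and $\bar{x}_2$ = 38.1, find an unbiased
estimate of $\mu$.
Solution:
An unbiased estimate of $\mu$ is given by $\hat{\mu} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{25(27.6) + 50(38.1)}{75} = 34.6$
It is unbiased since $E(\hat{\mu}) = \mu$, as shown below:
$E(\hat{\mu}) = \frac{n_1}{n_1 + n_2}E(\bar{X}_1) + \frac{n_2}{n_1 + n_2}E(\bar{X}_2)$
$= \frac{n_1}{n_1 + n_2}(\mu) + \frac{n_2}{n_1 + n_2}(\mu)$ (since $E(\bar{X}_i) = \mu$ for i = 1, 2)
$= \mu$ as desired.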
Example 2.4:
A sample of 20 measurements each on flexural strength (MPa) for concrete beams of a
certain type and cylinders respectively gave the following results.
Beams: 5.9 7.2 7.3 6.3 8.1 6.8 7.0 7.6 6.8 6.5
7.9 9.0 8.2 8.7 7.8 9.7 7.4 7.7 11.6 11.3
Cylinders: 6.1 5.8 7.8 7.1 7.2 9.2 6.6 8.3 7.0 8.3
7.8 8.1 7.4 8.5 8.9 9.8 9.7 14.1 12.6 11.2
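Before obtaining data we denote the beam strengths by $X_1, \ldots, X_m$ and the cylinder
strengths by $Y_1, \ldots, Y_n$. Suppose that the $X_i$'s are drawn from a population with mean $\mu_1$
and standard deviation $\sigma_1$. Similarly, the $Y_i$'s are drawn from another population with mean $\mu_2$ and
standard deviation $\sigma_2$. Also assume that the $X_i$'s are independent of the $Y_i$'s.
a) Prove that an unbiased estimator of $\mu_1 - \mu_2$ is given by $\bar{X} - \bar{Y}$. Compute the estimate for
the above data.

b) What is the variance and standard error of your estimator in part (a)? Compute the
estimated standard error.
Solution:
a. $\bar{X} - \bar{Y}$ is an unbiased estimator of $\mu_1 - \mu_2$ if $E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2$.
Now $E(\bar{X} - \bar{Y}) = E\left[\frac{1}{m}\sum X_i - \frac{1}{n}\sum Y_i\right]$
$= \frac{1}{m}(m\mu_1) - \frac{1}{n}(n\mu_2)$ [since $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$ for i = 1, 2, ..., n]
$= \mu_1 - \mu_2$ as desired.
To find an estimate for the given data, we first compute $\bar{x}$ and $\bar{y}$.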
Table: Calculations for mean, variance
X Y X² Y²
5.9 6.1 34.81 37.21

7.2 5.8 51.84 33.64
7.3 7.8 53.29 60.84
6.3 7.1 39.69 50.41
8.1 7.2 65.61 51.84
6.8 9.2 46.24 84.64
7.0 6.6 49.00 43.56
7.6 8.3 57.76 68.89
6.8 7.0 46.24 49.00
6.5 8.3 42.25 68.89
7.9 7.8 62.41 60.84
9.0 8.1 81.00 65.61
8.2 7.4 67.24 54.76
8.7 8.5 75.69 72.25
7.8 8.9 60.84 79.21
9.7 9.8 94.09 96.04
7.4 9.7 54.76 94.09
7.7 14.1 59.29 198.81
11.6 12.6 134.56 158.76
11.3 11.2 127.69 125.44
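ΣX = 158.8   ΣY = 171.5   ΣX² = 1304.3   ΣY² = 1554.73
$\bar{x} = \frac{\sum X}{m} = \frac{158.8}{20} = 7.94$
$\bar{y} = \frac{\sum Y}{n} = \frac{171.5}{20} = 8.575$
$\bar{x} - \bar{y} = 7.94 - 8.575 = -0.635$
b. $Var(\bar{X} - \bar{Y}) = Var(\bar{X}) + Var(\bar{Y})$
Now $Var(\bar{X}) = \frac{\sigma_1^2}{m}$ and $Var(\bar{Y}) = \frac{\sigma_2^2}{n}$
So $\sigma^2_{\bar{X} - \bar{Y}} = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}$ and hence $\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$
To compute the estimated standard error we first have to compute the sample standard deviation,
S, for both the X and Y variables. Now $S_1 = \sqrt{\frac{\sum X^2 - (\sum X)^2/m}{m - 1}}$
Substituting the values, we get
$S_1 = \sqrt{\frac{1304.3 - (158.8)^2/20}{19}} = 1.512$
Similarly $S_2 = \sqrt{\frac{\sum Y^2 - (\sum Y)^2/n}{n - 1}}$
$= \sqrt{\frac{1554.73 - (171.5)^2/20}{19}} = 2.104$.
Hence $\hat{\sigma}_{\bar{X} - \bar{Y}} = \sqrt{\frac{(1.512)^2}{20} + \frac{(2.104)^2}{20}} = 0.579$.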
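The whole calculation of Example 2.4 can be reproduced with Python's standard `statistics` module, whose `variance` function uses the divide-by-(n-1) definition used in the text. The variable names below are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Data from Example 2.4 (flexural strength in MPa)
beams = [5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8, 6.5,
         7.9, 9.0, 8.2, 8.7, 7.8, 9.7, 7.4, 7.7, 11.6, 11.3]
cylinders = [6.1, 5.8, 7.8, 7.1, 7.2, 9.2, 6.6, 8.3, 7.0, 8.3,
             7.8, 8.1, 7.4, 8.5, 8.9, 9.8, 9.7, 14.1, 12.6, 11.2]

diff = mean(beams) - mean(cylinders)   # point estimate of mu_1 - mu_2
# Estimated standard error: sqrt(s1^2/m + s2^2/n) with sample variances
se = sqrt(variance(beams) / len(beams) + variance(cylinders) / len(cylinders))
print(round(diff, 3), round(se, 3))
```

The output should match the hand computation: an estimated difference of -0.635 with an estimated standard error of about 0.579.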
3. Point Estimates for Population Variance

Inferences regarding a population variance or standard deviation are mostly needed to
find an estimate of the precision of various point estimates. Similarly sometimes we face
problems where comparison of two population variances (or standard deviations) is
required. Now we will discuss methods for computing these estimates.
3.1 Methodology
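If the population is normal then we can use the following result concerning the
sample variance to draw inferences about a population variance.
Assuming that the population is normally distributed, $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$ is an unbiased
estimator of $\sigma^2$.
Proof: $E(S^2) = E\left[\frac{1}{n-1}\sum (X_i - \bar{X})^2\right]$
$= \frac{1}{n-1} E\left[\sum (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]$
Then, since $E[(X_i - \mu)^2] = \sigma^2$ (given) and $E[(\bar{X} - \mu)^2] = V(\bar{X}) = \frac{\sigma^2}{n}$,
it follows that $E(S^2) = \frac{1}{n-1}\left[n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2$ as desired.
Now suppose we want to compare the variances of two different populations.
Assuming that the populations under investigation are normal, $S_1^2/S_2^2$ can be used as a point
estimator of $\sigma_1^2/\sigma_2^2$.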
3.2 Solved Examples

Example 3.1:
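Given a random sample of size n from a population which has the known mean $\mu$ and the
finite variance $\sigma^2$, show that $\frac{1}{n}\sum (X_i - \mu)^2$ is an unbiased estimator of $\sigma^2$.
Solution:
We know that $\frac{1}{n}\sum (X_i - \mu)^2$ is an unbiased estimator of $\sigma^2$ if $E\left[\frac{1}{n}\sum (X_i - \mu)^2\right] = \sigma^2$
Now $E\left\{\frac{1}{n}\sum (X_i - \mu)^2\right\} = \frac{1}{n}\sum E[(X_i - \mu)^2] = \frac{1}{n}(n\sigma^2)$
$= \sigma^2$ as desired.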
Example 3.2:
Consider a hypothetical population comprising only three values 2, 5 and 8. Draw all
possible samples of size 2 (with replacement) and calculate the mean $\bar{x}$ and variance
$\hat{\sigma}^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$ for each sample. Examine whether the statistics are unbiased
for the corresponding parameters. Show that $S^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$ is an unbiased
estimator of the population variance. Also calculate the variance of the sampling distribution of the
mean and verify that the variance of $\bar{X}$ is equal to $\sigma^2/n$.
Solution :
Table : Calculations for mean and variance of samples
Sr.No   Sample   Sample   Sample mean   Sample variance   Sample variance   $(\bar{x} - \mu)^2$
        values   totals   ($\bar{x}$)   ($\hat{\sigma}^2$)   ($S^2$)
(1)     (2)      (3)      (4)           (5)               (6)               (7)
1       2,2      4        2             0                 0                 9
2       2,5      7        3.5           2.25              4.5               2.25
3       2,8      10       5             9                 18                0
4       5,2      7        3.5           2.25              4.5               2.25
5       5,5      10       5             0                 0                 0
6       5,8      13       6.5           2.25              4.5               2.25
7       8,2      10       5             9                 18                0
8       8,5      13       6.5           2.25              4.5               2.25
9       8,8      16       8             0                 0                 9
Total                     45            27                54                27
So there are 9 possible samples of size 2 as shown in column (2). The mean $\bar{x} = \frac{1}{n}\sum x_i$
and variance $\hat{\sigma}^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$ for each sample are shown in columns (4) and
(5) respectively.
To examine whether the statistics are unbiased for the corresponding parameters,
we will first have to calculate the mean and variance for the population.
Population mean = = =5
Population variance = = =6
Now the statistics are unbiased if
E( )= and E( )=
Substituting the values E( )= = =5
Since E( )= =5 so is an unbiased estimator of population mean.
12
Two Samples
Now E( )= = =3
Since E( )= 3 = 6, so is not an unbiased estimator of population variance. However

it can be shown that = is an unbiased estimator of population
variance . We find that E( )= = =6 = .
Variance of sampling distribution of mean is denoted by
= = =3= = =3, hence verified that =
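The enumeration in this example can be checked mechanically. The following is a minimal Python sketch, assuming sampling with replacement as in the table; the variable names are illustrative and not part of the lesson.

```python
from itertools import product

population = [2, 5, 8]
n = 2  # sample size


def mean(values):
    return sum(values) / len(values)


mu = mean(population)
# Population variance uses the divisor N
sigma2 = mean([(x - mu) ** 2 for x in population])

# All 9 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=n))

xbars, s2_div_n, s2_div_n1 = [], [], []
for sample in samples:
    xbar = mean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)
    xbars.append(xbar)
    s2_div_n.append(ss / n)         # column (5)
    s2_div_n1.append(ss / (n - 1))  # column (6)

e_xbar = mean(xbars)
e_s2_div_n = mean(s2_div_n)
e_s2_div_n1 = mean(s2_div_n1)
var_xbar = mean([(xb - mu) ** 2 for xb in xbars])  # average of column (7)

print(e_xbar, e_s2_div_n, e_s2_div_n1, var_xbar, sigma2 / n)
```

Averaging over every possible sample is what "expected value" means here, which is why the divisor n - 1 version recovers the population variance while the divisor n version falls short.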
Question 3.3:
Suppose each side of a square plot has length µ. So the area of the plot will be µ². Since the value
of µ is unknown, we take n independent measurements X1, X2, ....., Xn of the length.
Assume that each Xi has mean µ and variance σ².
a. Show that X̄² is a biased estimator for µ².
b. What value of k will make the estimator X̄² - kS² unbiased for µ², where S² = Σ(Xi - X̄)²/(n - 1)?
Solution:
a. Since X̄ is a random variable, E(X̄²) = V(X̄) + [E(X̄)]² = σ²/n + µ² ≠ µ²
So X̄² is not an unbiased estimator for µ².

b. For X̄² - kS² to be an unbiased estimator for µ², E(X̄² - kS²) should be equal to µ².
Now E(X̄² - kS²) = E(X̄²) - kE(S²)
= V(X̄) + [E(X̄)]² - kE(S²)
= σ²/n + µ² - kσ²
= µ² only if k = 1/n
Example 3.4:
A sample of 10 television tubes produced by a company showed that the mean lifetime is
1200 hours and the standard deviation is 100 hours.
a. Calculate the mean of the population of all television tubes produced by this company.
b. Compute the standard deviation of the population of all television tubes produced by this
company.
c. If the same results are obtained for 30, 50 and 100 television tubes, estimate the mean
and the standard deviation of the population.
d. What can you conclude about the relation between sample standard deviation and
estimates of population standard deviation for different sample sizes?
Solution:
a. We can use the sample mean x̄ as an estimator of the population mean µ. So µ̂ = x̄ = 1200 hours.
b. We can use the corrected sample standard deviation, defined as Ŝ = √(n/(n - 1)) s, as an estimator of
the population standard deviation σ. So Ŝ = √(10/9) × 100 = 105.4 hours.
c. The estimate for the population mean will remain the same, i.e. 1200 hours, in all cases.
However, the estimate for the population standard deviation will differ for different sample
sizes. If the sample size is 30 then Ŝ = √(30/29) × 100 = 101.7 hours. If the sample size is 50 then
Ŝ = √(50/49) × 100 = 101.0 hours. If the sample size is 100 then Ŝ = √(100/99) × 100 = 100.5 hours.
d. As the sample size increases, estimates of the population standard deviation come closer and
closer to the sample standard deviation.
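The pattern in parts (b) to (d) can be tabulated with a short Python sketch. It assumes, as the solution does, that the reported standard deviation of 100 hours was computed with divisor n; the function name `corrected_sd` is hypothetical.

```python
import math


def corrected_sd(s, n):
    """Rescale a divisor-n sample SD to the divisor-(n - 1) version."""
    return math.sqrt(n / (n - 1)) * s


for n in (10, 30, 50, 100):
    # The correction factor sqrt(n / (n - 1)) shrinks toward 1 as n grows
    print(n, round(corrected_sd(100, n), 1))
```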
Example 3.5:
Suppose the use of a certain type of pesticide increases average yield per acre by µ1 with
variance σ², whereas the use of a second type of pesticide increases average yield per acre by
µ2 with the same variance σ². Let S1² and S2² denote the unbiased estimators of the population
variances of yields based on sample sizes n1 and n2 respectively, of the two pesticides. Show
that the pooled estimator Sp² = ((n1 - 1)/(n1 + n2 - 2)) S1² + ((n2 - 1)/(n1 + n2 - 2)) S2² is an
unbiased estimator of σ².
Solution:
The pooled estimator Sp² is an unbiased estimator of σ² if E(Sp²) = σ².
Now E(Sp²) = ((n1 - 1)/(n1 + n2 - 2)) E(S1²) + ((n2 - 1)/(n1 + n2 - 2)) E(S2²)
= ((n1 - 1)/(n1 + n2 - 2)) σ² + ((n2 - 1)/(n1 + n2 - 2)) σ² [since E(Si²) = σ² for i = 1, 2]
= σ², as desired.
Example 3.6:
Using the data and calculations of Example 2.4, compute a point estimate of the ratio of the
two standard deviations.
Solution:
A point estimate of the ratio of the two standard deviations, σ1/σ2, is given by the ratio of
the sample standard deviations computed in Example 2.4: s1/s2 = 0.719.
4. Point Estimates for Population Proportions
Inferences concerning population proportion for specified characteristics are often required
by the policymakers. Similarly sometimes estimates regarding differences in proportions of
two different populations are needed for policy decisions. Now we will discuss methods for
estimating these population parameters and also give an expression for estimating the
reliability of the estimates.
4.1 Methodology
Suppose a random sample of size n is taken from a population and it is found that
the number of “successes” is X. Now we can use p̂ = X/n, the sample fraction of “successes”, as
an estimator of p. E(p̂) = p (unbiasedness) and σ(p̂) = √(p(1 - p)/n).
Proof: E(p̂) = E(X/n) = (1/n)E(X) = (1/n)(np) = p, and
V(p̂) = V(X/n) = (1/n²)V(X) = np(1 - p)/n² = p(1 - p)/n, so σ(p̂) = √(p(1 - p)/n), as desired.
Now we extend these methods to situations involving the proportions of two different
population distributions. Let p1 denote the true proportion of nickel-cadmium cells produced
under current operating conditions that are defective because of internal shorts, and let p2
represent the true proportion of cells with internal shorts produced under modified operating
conditions. If the rationale for the modified conditions is to reduce the proportion of
defective cells, a quality engineer would want to use sample information as a basis for
calculating an estimate of p1 - p2.
Suppose that a sample of size n1 is selected from the first population and
independently a sample of size n2 is selected from the second one. Let X denote the number
of successes in the first sample and Y be the number of successes in the second.
Independence of the two samples implies that X and Y are independent. Provided that the
two sample sizes are much smaller than the corresponding population sizes, X and Y can be
regarded as having binomial distributions. The natural estimator for p1 - p2, the difference in
population proportions, is the corresponding difference in sample proportions p̂1 - p̂2 = X/n1 - Y/n2.
E(p̂1 - p̂2) = p1 - p2,
so p̂1 - p̂2 is an unbiased estimator of p1 - p2, and
V(p̂1 - p̂2) = p1q1/n1 + p2q2/n2 (where qi = 1 - pi)
Proof: Since E(X) = n1p1 and E(Y) = n2p2,
E(p̂1 - p̂2) = E(X/n1 - Y/n2) = (1/n1)E(X) - (1/n2)E(Y) = (1/n1)(n1p1) - (1/n2)(n2p2) = p1 - p2, as desired.
Similarly, since V(X) = n1p1q1 and V(Y) = n2p2q2, and X and Y are independent,
V(p̂1 - p̂2) = V(X/n1 - Y/n2) = V(X/n1) + V(Y/n2) = (1/n1²)V(X) + (1/n2²)V(Y) = p1q1/n1 + p2q2/n2, as desired.
4.2 Solved Examples
Example 4.1:
A sample of 20 students of XYZ College gave the following information on the brand of
calculator used (F = Fiamo, O = Orpat, C = Citizen, S= Sharp):
F F O F C F F S C O
S S F O C F F F O F
a. Estimate the true proportion of all such students who used a Fiamo calculator.
b. Of the 10 students who used a Fiamo calculator, 4 had graphing calculators. Estimate the
proportion of students who do not use a Fiamo graphing calculator.
Solution:
a. An estimate of the true proportion of all such students who used a Fiamo calculator is
given by p̂ = x/n = 10/20 = 0.5, where x is the number of favourable cases, i.e. the number of
students who used a Fiamo calculator, and n is the total number of cases, i.e. the total number
of students.
b. Using the same method as above, an estimate of the proportion of students who do not
use a Fiamo graphing calculator is given by p̂ = x/n = 16/20 = 0.8, where x is the number of
students who do not use a Fiamo graphing calculator.
Example 4.2:
A sample of 80 components is taken from a large factory and it is found that 68 components
are not defective.
a. Estimate the proportion of all such components that are not defective.
b. Suppose now we randomly select two of these components and connect them in series,
as shown here to construct a system.
The system will function if both components are not defective. Give a point estimate of the
proportion of properly working systems?
Solution:
Let p denote the probability that a component works properly and P denote the probability
that the system works properly. Then
a. An estimate of the proportion of all such components that are not defective is given by
p̂ = x/n = 68/80 = 0.85,
where x is the number of favourable cases, i.e. the number of components that are not
defective, and n is the total number of cases, i.e. the total number of components sampled.
b. Since the system works only if both components work, P = p². A point estimate of the
proportion of systems that work properly is therefore given by
P̂ = p̂² = (0.85)² = 0.7225, i.e. about 0.723.
Example 4.3:
A sample of 10 measurements of the weights of female students at xyz university gave the
following results.
Student 1 2 3 4 5 6 7 8 9 10
Weight(kg) 40 45 47.6 48.2 52.8 57 52.5 52 59 49
Assume that the population weight is normally distributed. Find
a. Two unbiased estimators of population mean and make efficiency comparisons.
b. An unbiased estimator of population variance.
c. A point estimate of the proportion of all such female students whose weight exceeds
53kg.
Solution:
a. Since the population is normally distributed, both the sample mean x̄ and the sample median x̃
are unbiased estimators of the population mean µ.
x̄ = Σxi/n = 503.1/10 = 50.31.
To calculate the median, our first step is to arrange the weights of the students in increasing (or
decreasing) order as follows.
40 45 47.6 48.2 49 52 52.5 52.8 57 59
Since the total number of students is 10, the median weight is the weight of the ((n + 1)/2)th = 5.5th
student, i.e. the average of the weights of the 5th and 6th students, which is x̃ = (49 + 52)/2 = 50.5.
To make efficiency comparisons we should compare the MSEs of the mean and median. Since both
are unbiased, the MSEs of x̄ and x̃ are equal to V(x̄) and V(x̃) respectively. We know that for a
normal distribution V(x̄) = σ²/n and V(x̃) ≈ 1.57 σ²/n.
Since V(x̄) < V(x̃), the mean is the more efficient estimator of the population mean.
b. An unbiased estimator of the population variance is given by
S² = Σ(xi - x̄)²/(n - 1) = 282.129/9 = 31.35
c. Since the number of students in the sample whose weight exceeds 53kg is two, a
point estimate for the population proportion of all such students whose weight exceeds 53kg
is given by p̂ = x/n = 2/10 = 0.2, where x is the number of favourable cases, i.e. the number of students
whose weight exceeds 53kg, and n is the total number of cases, i.e. the total number of students
sampled.
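The arithmetic of this example can be reproduced with Python's standard `statistics` module. This is only a sketch of the calculations above; the variable names are illustrative.

```python
import statistics

weights = [40, 45, 47.6, 48.2, 52.8, 57, 52.5, 52, 59, 49]

mean_w = statistics.mean(weights)      # sample mean
median_w = statistics.median(weights)  # average of the 5th and 6th ordered values
var_w = statistics.variance(weights)   # unbiased version, divisor n - 1

# Sample proportion of students heavier than 53 kg
p_hat = sum(w > 53 for w in weights) / len(weights)

print(round(mean_w, 2), median_w, round(var_w, 2), p_hat)
```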
Example 4.4:
Consider a random sample of 16 observations on plywood thickness. The observations are
as follows:
.88 .88 .83 1.09 1.04 1.12 1.29 1.31
1.49 1.48 1.59 1.65 1.62 1.76 1.71 1.83
Assume that plywood thickness follows a normal distribution.
a. Calculate an estimate of the average value of plywood thickness. Which estimator did
you use?
b. Calculate a point estimate of the median of the plywood thickness distribution, and state
which estimator you used.
c. Calculate a point estimate of the value that separates the largest 10% of all values in
the thickness distribution from the remaining 90%, and state which estimator you used.
d. Estimate P(X<1.5) i.e. the proportion of all thickness values less than 1.5.
e. What is the estimated standard error of the estimator that you used in part (b)?
Solution:
a. A point estimate of the mean value of plywood thickness is
x̄ = Σxi/n = 21.57/16 = 1.348
We used the sample mean as an estimator because for a normal distribution x̄ is the MVUE of the
population mean µ.
b. A point estimate of the median of the plywood thickness distribution is x̄ = 1.348 because
for a normal distribution the mean, median and mode are all equal. The reason for using the sample
mean, x̄, rather than the sample median, x̃, is that Var(x̄) is less than Var(x̃). So x̄ is the more
reliable estimator of the population median.
c. To calculate a point estimate of the value that separates the largest 10% of all values in
the thickness distribution from the remaining 90%, we can make use of the fact that for a
normal distribution µ + 1.28σ is the value that separates the largest 10% of all values from
the remaining 90%. So we can use x̄ + 1.28s as an estimator, where x̄ is the sample mean
and s is the sample standard deviation defined as s = √[Σ(xi - x̄)²/(n - 1)]. Our next step is to compute
s² = Σ(xi - x̄)²/(n - 1) = [Σxi² - (Σxi)²/n]/(n - 1) = (30.7981 - 29.0791)/15 = 0.1146,
so s = √0.1146 = 0.3385 and x̄ + 1.28s = 1.348 + 1.28(0.3385) = 1.7813.
So the X value 1.7813 separates the largest 10% from the remaining 90%.
d. To estimate P(X<1.5), we can first standardize the X variable and find the Z value defined
as Z = (X - µ)/σ, with µ and σ estimated by x̄ and s. Then we use this Z value to find the area
(probability) under the standard normal curve. Now Z = (1.5 - 1.348)/0.3385 = 0.449, or 0.45 to two decimals.
The area under the standard normal curve corresponding to Z = 0.45 is 0.6736. So
P(X<1.5) = 0.6736.
e. The estimated standard error of the estimator, x̄, that we used in part (b) is given by
s/√n = 0.3385/√16 = 0.0846.
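The normal-theory estimates in this example can be sketched with `statistics.NormalDist` from the Python standard library. The sketch plugs the sample mean and sample standard deviation into a fitted normal distribution; it uses the exact 90th-percentile multiplier rather than the rounded 1.28, so its output can differ from hand calculation in the third decimal place. Variable names are illustrative.

```python
import math
import statistics
from statistics import NormalDist

thickness = [0.88, 0.88, 0.83, 1.09, 1.04, 1.12, 1.29, 1.31,
             1.49, 1.48, 1.59, 1.65, 1.62, 1.76, 1.71, 1.83]

n = len(thickness)
xbar = statistics.mean(thickness)
s = statistics.stdev(thickness)  # divisor n - 1

# Plug-in normal distribution with estimated mean and standard deviation
fitted = NormalDist(mu=xbar, sigma=s)

p90 = fitted.inv_cdf(0.90)   # value separating the largest 10%
p_below = fitted.cdf(1.5)    # estimate of P(X < 1.5)
se_mean = s / math.sqrt(n)   # estimated standard error of the sample mean

print(round(xbar, 3), round(s, 4), round(p90, 3), round(p_below, 4), round(se_mean, 4))
```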
Example 4.5:
A random sample of male students of size n1 is taken from XYZ University and it is found
that x1 scored more than 70% in their final exams. Another sample of female students of
size n2 from the same University showed that x2 scored more than 70%. Let p1 denote the
probability that a male student scores more than 70% and p2 denote the probability that a
female student scores more than 70% in their final exams.
a. Show that X1/n1 - X2/n2 is an unbiased estimator for p1 - p2.
b. Find an expression for the standard error of your estimator in part (a).
c. What is the use of x1 and x2 in estimating the standard error of your estimator?
d. If n1 = n2 = 200, x1 = 127 and x2 = 176, compute a point estimate for p1 - p2 and also give
an estimate of its standard error.
Solution:
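a. To show that X1/n1 - X2/n2 is an unbiased estimator for p1 - p2, we will have to prove that
E(X1/n1 - X2/n2) = p1 - p2
Now E(X1/n1 - X2/n2) = (1/n1)E(X1) - (1/n2)E(X2)
= (1/n1)(n1p1) - (1/n2)(n2p2)
= p1 - p2, as desired.
b. The standard error of the estimator in part (a) is given by
√V(X1/n1 - X2/n2) = √[V(X1/n1) + V(X2/n2)] (Covariance is zero as X1 and X2 are independent random variables)
= √(p1q1/n1 + p2q2/n2) (Since Var(Xi/ni) = piqi/ni for i = 1, 2, where qi = 1 - pi)
c. We will use the observed values x1 and x2 to estimate the standard error of our estimator
by using p̂i = xi/ni for pi and q̂i = 1 - p̂i for qi.
d. Substituting the values for n1, n2, x1 and x2, we get an estimate of p1 - p2 as
p̂1 - p̂2 = 127/200 - 176/200 = -49/200 = -0.245
Using the relevant values, we get an estimate of the standard error of the estimator as
follows: √[(0.635)(0.365)/200 + (0.88)(0.12)/200]
= .041.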
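Part (d) can be reproduced with a few lines of Python. This is a minimal sketch of the plug-in calculation described in parts (b) and (c); the function name `diff_proportions` is hypothetical.

```python
import math


def diff_proportions(x1, n1, x2, n2):
    """Point estimate of p1 - p2 and its estimated standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    estimate = p1_hat - p2_hat
    # Plug the sample proportions into sqrt(p1*q1/n1 + p2*q2/n2)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return estimate, se


estimate, se = diff_proportions(127, 200, 176, 200)
print(round(estimate, 3), round(se, 3))
```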
PRACTICE QUESTIONS
Q.1 Consider a random sample X1, X2, ........., Xn from the pdf
ƒ(x; θ) = .5(1 + θx), -1 ≤ x ≤ 1
where -1 ≤ θ ≤ 1. Show that θ̂ = 3X̄ is an unbiased estimator of θ.
Q.2 If X1, X2, ......., Xn constitute a random sample from a normal population with µ = 0,
show that ΣXi²/n is an unbiased estimator of σ².
Q.3 A random sample of size 65 was taken to estimate the mean annual income of 1000
families and the mean and S. D. were found to be Rs. 6300 and Rs. 9.5 respectively. Find
an estimate for the population mean. Also calculate its standard error.
Q.4 A sample of 150 bulbs of brand A showed an average life of 1800 hrs with a standard
deviation of 15 hrs. Another sample of 100 bulbs of brand B showed an average life of 1500
hrs with a standard deviation of 11 hrs. Find an estimate for the difference of the mean life
of the population of A and B brand bulbs. Also calculate the standard error of the estimate.
Q.5 According to the Mendelian law of segregation in genetics, when certain types of peas
are crossed, the probability that the plant yields a yellow pea is 3/4 and that it yields a
green pea is 1/4. For a plant yielding 400 peas, find the standard error of the proportion of
yellow peas.
Q.6 An insecticide of brand A was sprayed into a container holding 150
mosquitoes. It was found that 100 of the mosquitoes were killed. When another container
holding 170 mosquitoes of the same type was sprayed with brand B, 130 mosquitoes were
killed. Find an estimate for the difference in the effectiveness of the two brands of
insecticide. Also calculate the standard error of the estimate.
Q.7 The marketing manager of a large company conducted a sample survey in two states,
Bihar and Orissa, taking a sample of 400 salesmen in each state. The main findings of the research
are given in the following table:
State     Average Sales Per Day    Standard Deviation
Bihar     Rs. 2500                 Rs. 400
Orissa    Rs. 2200                 Rs. 500
Find an estimate for the difference in average per day sales of the salesmen in the two states.
Also calculate the standard error of the estimate.
Q.8 The following results were obtained from two samples, each drawn from one of two different
populations A and B:
Population      A          B
Sample          I          II
Sample size     n1 = 16    n2 = 9
Sample S. D.    s1 = 3     s2 = 2
Find an estimate for the ratio of the population variances of A and B, i.e. σ1²/σ2².
Q.9 A population consists of numbers 4, 5, 8, 10, 13. Enumerate all possible samples of
size 3 which can be drawn from the population without replacement and show that the
mean of the sampling distribution of the sample means is equal to the population mean.
Calculate the variance of the sampling distribution of the sample mean and show that it is
less than the population variance.
Q.10 A builder is considering two different areas of a large western state as sites for a
primary school. Of 50 households surveyed in one area, the proportion of households having
primary school going children was 0.52. Similarly, of 45 households surveyed in another
area, the proportion of households having primary school going children was 0.48. Find an
estimate for the difference in the proportions of households with primary school going children
in the two areas of the state. Also calculate the standard error of the estimate.
P-value Tests for the Population Means
Lesson: P-Value Tests For The Population Means
Lesson Developer: Nupur Kataria
College/Department: Department Of Economics, Kamala Nehru College, University Of Delhi

TABLE OF CONTENTS
1. Introduction
2. Tests of population means using p-values
2.1. p-values for large sample tests
2.2. p-values for small sample tests
2.3. Pooled t-test
3. Calculation of β
4. Selection of a test
5. Statistical vs practical significance of a test
6. The likelihood ratio principle
Content Developer
Nupur Kataria, Assistant Professor, Department of Economics, Kamala Nehru College, University Of Delhi
References

1. Jay L. Devore: Probability and Statistics for Engineering and the Sciences, Cengage
Learning, 8th edition [Chapter 8 and 9].
2. Irwin Miller and Marylees Miller: Mathematical Statistics, Pearson, 7th edition.
3. Allen Webster: Probability and Statistics, 4th edition, Richard D. Irwin/McGraw-Hill, Burr
Ridge, IL, 2010
P-VALUE TESTS FOR THE POPULATION MEANS
Learning Objectives
This chapter aims to show how p-values are calculated in hypothesis testing
procedures. The chapter focuses mainly on tests of population means with one and two
samples. It demonstrates how these tests can be carried out using p-values in both small
and large samples. Apart from this, you will learn how to find the probability of a Type II error in a
given scenario and when to use a particular test, and you will learn about the likelihood
ratio principle. The chapter concludes with practice questions which will help you to test
the concepts learned in this chapter.
1. INTRODUCTION
A statistical hypothesis is a statement about the values of population parameters, either for a
single population or for two populations, and hypothesis testing is the procedure for deciding
whether the sample data are consistent with it. These hypothesis testing procedures can be carried
out using critical values in the case of both small and large samples. However, there is another way
in which these procedures can be carried out, called the p-value method. This chapter is mainly
concerned with how tests of population means can be done using p-values with a single
sample and with two samples. Apart from this, the chapter also discusses the calculation of β, the
selection of a test, statistical versus practical significance and the likelihood ratio principle.
2. TESTS OF POPULATION MEANS USING P-VALUES
As already discussed, the p-value is the observed or actual level of significance, that is, the
smallest level of significance at which the null hypothesis can be rejected. Here, instead of first
finding the critical value of a test statistic and comparing it with the calculated value to conclude
whether the null hypothesis is rejected or not, we first calculate the value of the test statistic and
then find the smallest level of significance (which is the p-value) at which the null hypothesis is rejected.
The main advantage of using the p-value is that there is no need to find critical values of the test
statistic afresh for each level of significance α. We just need to compare the p-value
with α to check at which levels of significance the null hypothesis is rejected and at which levels it
is not. Generally, the smaller the p-value, the stronger the evidence against the null
hypothesis and hence in favour of the alternative hypothesis. In
particular, we reject the null hypothesis if the p-value is less than or equal to α and do not reject
the null hypothesis if the p-value is greater than α. We now move to tests of population means
in the case of large and small samples using p-values.
2.1. P-VALUES FOR LARGE SAMPLE TESTS
Since the sample size is large, the test statistic for the population mean µ will be Z which is
the standard normal variable. Depending upon the alternative hypothesis, the p-value is
calculated for the observed or the calculated value of test statistic Z and then this p-value is
compared with the given level of significance α.
More specifically, we have the following three different tests:
a) Lower-tailed or left-tailed test
Null Hypothesis H0 : µ = µ0 (null value of µ)
Alternative Hypothesis H a : µ < µ0
p-value = area to the left of calculated z (negative value) = P(Z ≤ -z) = Φ(-z)
where z = |x̄ - µ0|/(s/√n) is the magnitude of the calculated test statistic (x̄ - µ0)/(s/√n), as shown in figure 1 below.
Figure 1
b) Upper-tailed or right-tailed test

Alternative Hypothesis H a : µ > µ0
p-value = area to the right of calculated z (positive value)= P(Z ≥ z)= 1 - Φ(z) as shown in
following figure 2.
Figure 2
c) Two-tailed test
Alternative Hypothesis H a : µ ≠ µ0
p-value = sum of the area to the left of calculated negative z value and to the right of
calculated positive z value = P(Z ≤ -z) + P(Z ≥ z) = [Φ(-z)]+ [1 - Φ(z)] = 2[1 - Φ(z)] as
shown in Figure 3 below.
{since, area to left of -z is same as the area to the right of z due to symmetry of z-curve}

Figure 3
Once the p-value is calculated, it is compared with α using the decision rule given by the
following figure 4:
Figure 4: the interval from 0 to 1 divided at α into region (a), from 0 to α, and region (b), from α to 1
If the p-value lies in (a) then we reject H0, but if the p-value lies in (b) then we do not reject H0. In
other words,
Reject H0 if p-value ≤ α and
do not reject H0 if p-value > α
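The three p-value formulas and the decision rule can be sketched in Python using `statistics.NormalDist` for Φ. In this sketch the function takes the signed calculated value of the test statistic, so the lower-tailed p-value is simply Φ of that value; the function names and the illustrative value -1.33 are assumptions made for the example.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def z_p_value(z_calc, tail):
    """p-value of a large-sample z test; z_calc is the signed calculated statistic."""
    if tail == "lower":
        return PHI(z_calc)
    if tail == "upper":
        return 1 - PHI(z_calc)
    if tail == "two":
        return 2 * (1 - PHI(abs(z_calc)))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def reject_h0(p_value, alpha):
    """Decision rule: reject H0 when the p-value is at most alpha."""
    return p_value <= alpha


p_two = z_p_value(-1.33, "two")
p_lower = z_p_value(-1.33, "lower")
print(round(p_two, 4), round(p_lower, 4), reject_h0(p_lower, 0.10), reject_h0(p_lower, 0.05))
```

Because the p-value is computed once, the same number can be compared against any α without looking up a new critical value.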
Now, the following examples illustrate the use of p-value to test population means in case of
large samples.
Example 3.1 The mean lifetime of a sample of 100 batteries manufactured by a
firm is found to be 1680 hours with a standard deviation of 150 hours. Do a two-tailed test
to check whether the true average lifetime of batteries is 1700 hours at level of significance
(a) 0.05 and (b) 0.01.
Step 1: µ = true average lifetime of batteries
Step 2: H0 : µ = 1700
Step 3: Ha : µ ≠ 1700
Step 4: Test statistic: z = (x̄ - µ0)/(s/√n) = (1680-1700)/(150/√100) = -20/15 = -1.33
Step 5: p-value = 2[1 - Φ(|-1.33|)] = 2[1-Φ(1.33)] = 2[1-0.9082] = 0.1836
(since the test is two-tailed)
Step 6: Since the p-value is greater than both 0.05 and 0.01, we do not reject the null
hypothesis that the true average lifetime of batteries is 1700 hours at either the 0.05 or the 0.01
level of significance.
Example 3.2 Consider the previous example and test the hypothesis µ = 1700 against the
alternative hypothesis µ < 1700 at level of significance (a) 0.10 and (b) 0.05.
Step 1: µ = true average lifetime of batteries
Step 2: H0 : µ = 1700
Step 3: Ha : µ < 1700
Step 4: Test statistic: z = (x̄ - µ0)/(s/√n) = (1680-1700)/(150/√100) = -20/15 = -1.33
Step 5: p-value = Φ(-1.33) = 0.0918
(since the test is left-tailed)
Step 6: Since the p-value is less than 0.10 but greater than 0.05, we reject the
null hypothesis that the true average lifetime of batteries is 1700 hours at α = 0.10 but do
not reject the null hypothesis at α = 0.05.
So far we have tested a single population mean, so we now move to tests of two population
means in large samples. The hypothesis testing in this case is as follows:
Null Hypothesis H0 : µ1 - µ2 = θ0
(where µ1 and µ2 are the means of two different population distributions and θ0 is the
null value of µ1 - µ2).
Test Statistic: z = (x̄ - ȳ - θ0)/√(s1²/m + s2²/n)
(where x̄ and ȳ are the sample means from the corresponding population distributions, s1 and s2
are the respective sample standard deviations and m and n are the sample sizes taken from
the two populations such that m > 40 and n > 40).
Alternative Hypothesis:
(a) Ha : µ1 - µ2 < θ0 (lower-tailed test) p-value = Φ(-z)
(b) Ha : µ1 - µ2 > θ0 (upper-tailed test) p-value = 1 - Φ(z)
(c) Ha : µ1 - µ2 ≠ θ0 (two-tailed test) p-value = 2[1 - Φ(z)]
Decision rule: Reject H0 if p-value ≤ α and do not reject H0 if p-value > α.
Example 3.3 In class A of 50 students, the mean height was found to be 62.4 inches with
a standard deviation of 2.25 inches. In another class B of 50 students, the mean height was
61.5 inches whereas the standard deviation was 2.5 inches. Test the hypothesis at levels of
significance 0.05 and 0.01 that the students in class A are taller than the students in
class B.
Step 1: µ1 - µ2 = difference between true average height of students in class A and class B.
Step 2: H0 : µ1 - µ2 = 0 i.e. there is no difference in the mean heights of students in the two
classes.
Step 3: Ha : µ1 - µ2 > 0 i.e. students in class A are taller than students in class B on
average.
Step 4: Test statistic: z = (x̄ - ȳ - θ0)/√(s1²/m + s2²/n)
= (62.4-61.5-0)/√(2.25²/50 + 2.5²/50)
= 0.9/0.476 = 1.89
Step 5: p-value = [1 - Φ(1.89)] = [1-0.9706] = 0.0294
(since the test is upper-tailed)
Step 6: Since the p-value is smaller than 0.05 but greater than 0.01, we reject the
null hypothesis at α = 0.05 and conclude that students in class A are taller than students in
class B on average; however, we do not reject the null hypothesis that there is no difference in
the mean heights of students in the two classes at α = 0.01.
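The two-sample calculation in this example can be sketched as follows. The function name `two_sample_z` is hypothetical, and Φ is taken from `statistics.NormalDist`; because the code does not round z before looking up Φ, its p-value can differ from the table-based value in the fourth decimal place.

```python
import math
from statistics import NormalDist


def two_sample_z(xbar, ybar, s1, s2, m, n, theta0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = theta0."""
    se = math.sqrt(s1 ** 2 / m + s2 ** 2 / n)
    return (xbar - ybar - theta0) / se


z = two_sample_z(62.4, 61.5, 2.25, 2.5, 50, 50)
p_upper = 1 - NormalDist().cdf(z)  # upper-tailed alternative: mu1 - mu2 > 0

print(round(z, 2), round(p_upper, 4))
```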
2.2. P-VALUES FOR SMALL SAMPLE TESTS

For large sample tests, the test statistic for population means was Z. However, in the case of
small samples, the test statistic is t with number of degrees of freedom (df) equal to n-1
and the p-value is calculated under the t-distribution curve with n-1 df, assuming that the
population is normally distributed. The three different tests for a
single population mean are shown below:
a) Lower-tailed or left-tailed test
Null Hypothesis H0 : µ = µ0 (null value of µ)
Alternative Hypothesis Ha : µ < µ0
p-value = area to the left of calculated t (negative value) for given df = P(T ≤ -t)
where t = |x̄ - µ0|/(s/√n) is the magnitude of the calculated test statistic (x̄ - µ0)/(s/√n), as shown in figure 5 below.
Figure 5
b) Upper-tailed or right-tailed test
Alternative Hypothesis Ha : µ > µ0
p-value = area to the right of calculated t (positive value) for given df = P(T ≥ t).
See figure 6 below.
Figure 6
c) Two-tailed test
Alternative Hypothesis Ha : µ ≠ µ0
p-value = sum of the area to the left of calculated negative t value and to the right of
calculated positive t value for given df = P(T ≤ -t) + P(T ≥ t). This is shown in figure 7
given below.
(Both the probabilities will be the same due to the symmetry of the t-distribution curve)
Figure 7
Once the p-value is calculated, it is again compared with α and the following decision rule is
used:
Reject H0 if p-value ≤ α and
do not reject H0 if p-value > α

Now, the following examples illustrate the use of p-value to test population means in case of
small samples.
Example 3.4 The breaking power of 10 cables manufactured by a firm was tested, giving
a mean of 6600 lb and a standard deviation of 550 lb. The manufacturer claimed
that the mean breaking power is 7000 lb. Test this claim against the alternative hypothesis that
the breaking power is less than 7000 lb at (a) 5 % and (b) 1 % level of significance,
assuming a normal distribution.
Step 1: µ = true average breaking power of cables
Step 2: H0 : µ = 7000
Step 3: Ha : µ < 7000
Step 4: Test statistic: t = (x̄ - µ0)/(s/√n) = (6600-7000)/(550/√10)
= -400/173.9 = -2.30
Step 5: p-value = P(T≤ -2.30) for df equal to 9 = 0.023
(since the test is left-tailed, we look for the area to the left of -2.30 or to the right of 2.30 in the t
distribution areas table with df = 9)
Step 6: Since the p-value is less than 0.05 but greater than 0.01, we reject the
null hypothesis at α = 0.05 but do not reject the null hypothesis at α = 0.01.
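The table lookup in Step 5 can be approximated in code. Python's standard library has no t-distribution, so the sketch below integrates the t density numerically with Simpson's rule; this is a stand-in for the t-table used in the text, not the method of the lesson, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on [0, |t|] and symmetry about 0."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


t_stat = (6600 - 7000) / (550 / math.sqrt(10))
# Left-tailed test: P(T <= t_stat) equals P(T >= |t_stat|) by symmetry
p_lower = t_upper_tail(abs(t_stat), df=9)

print(round(t_stat, 2), round(p_lower, 3))
```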
Tests of two population means in small samples are similar to those in large samples, with the
difference that the test statistic is now t and follows a t-distribution, assuming that both
populations are normally distributed, where
t = (x̄ - ȳ - θ0)/√(s1²/m + s2²/n)
and
df = (s1²/m + s2²/n)² / [(s1²/m)²/(m-1) + (s2²/n)²/(n-1)]
(where df is rounded down to the nearest integer)
Example 3.5 The I.Q's of 20 students in a class showed a mean of 101 with a standard
deviation of 11.5, whereas the I.Q's of another class containing the same number of students showed a
mean of 110 with a standard deviation of 15.5. Test whether there is a difference between the
I.Q's of the two classes using α = 0.05 and 0.01, assuming that the I.Q's in both classes
follow a normal distribution.
Step 1: µ1 - µ2 = difference between true average I.Q's of students in both classes.
Step 2: H0 : µ1 - µ2 = 0 i.e. there is no difference in the mean I.Q's in two classes.
Step 3: Ha : µ1 - µ2 ≠ 0 i.e. there is a statistical difference in the mean I.Q's in two classes.
Step 4: Test statistic: t = (x̄ - ȳ - θ0)/√(s1²/m + s2²/n)
= (101-110-0)/√(11.5²/20 + 15.5²/20)
= -9/4.32 = -2.1
df = (6.6125 + 12.0125)² / [(6.6125)²/19 + (12.0125)²/19] = 346.89/9.90 = 35.05, therefore df = 35
Step 5: p-value = P(T ≤ -2.1) + P(T ≥ 2.1) = 2(0.022) = 0.044
(since the test is two-tailed, we look for the area to the left of -2.1 and to the right of 2.1 in the t
distribution areas table with df = 35)
Step 6: Since the p-value is smaller than 0.05 but greater than 0.01, we reject the
null hypothesis at α = 0.05; however, we do not reject the null hypothesis that there is no
difference in the mean I.Q's in two classes at α = 0.01.
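The test statistic and the degrees-of-freedom formula used in Step 4 can be sketched in Python as below. The function name `two_sample_t_and_df` is hypothetical; the p-value itself would still come from a t-table or a statistics package.

```python
import math


def two_sample_t_and_df(xbar, ybar, s1, s2, m, n, theta0=0.0):
    """Two-sample t statistic (unequal variances) and df rounded down."""
    v1, v2 = s1 ** 2 / m, s2 ** 2 / n
    t = (xbar - ybar - theta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (m - 1) + v2 ** 2 / (n - 1))
    return t, math.floor(df)


t, df = two_sample_t_and_df(101, 110, 11.5, 15.5, 20, 20)
print(round(t, 2), df)
```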
2.3. Pooled t-test
The pooled t-test is applied in hypothesis testing procedures whenever the two populations
of interest are normally distributed and have equal population variances, that is
σ1² = σ2². Let σ² be the common variance of both the populations. Then if the test statistic
for the hypothesis testing is z, it is given by the formula
z = (x̄ - ȳ - θ0)/√(σ²/m + σ²/n)
and so
z = (x̄ - ȳ - θ0)/(σ√(1/m + 1/n))
Now, if the common population variance σ² is known then the z value can be easily found by
using the above formula. However, if σ² is unknown then we have to estimate it using the
information from the samples. If s1² and s2² are the sample variances from the population
distribution with m observations and the population distribution with n observations
respectively, then one estimator of σ² will be the weighted average of these two sample
variances. This estimator, sp², given below, is called the pooled or the combined estimator of
σ² since it takes into account the difference in the sample sizes:
sp² = ((m-1)/(m+n-2)) s1² + ((n-1)/(m+n-2)) s2²
where (m-1) degrees of freedom are contributed by the first sample and (n-1)
degrees of freedom are contributed by the second sample to the estimate of σ². The
total degrees of freedom turn out to be (m-1)+(n-1) = m+n-2. From statistical theory,
if σ² is replaced by sp², then the test statistic used will be t, which follows a t-distribution
with m+n-2 degrees of freedom. This t-statistic is called pooled t, and the confidence
intervals and tests based on this t variable for the difference between two
population means are called pooled t confidence intervals and the pooled t test respectively.
However, it is advised to use the pooled t procedure only when the null hypothesis that there is no
difference between the two population variances, i.e. H0: σ1² = σ2², does not get rejected. In
practice, it is recommended to use the usual two sample t procedure unless we have a very
strong reason to believe that the two normally distributed populations have equal variances,
especially when the two sample sizes are not equal.
Example 3.6 Consider the previous example 3.5 with the additional information that σ1² = σ2²
and suppose that in the first class there are 20 students but in the second class there are 15
students. Now, test whether there is a difference between the I.Q's of the two classes using α =
0.10 and 0.05, assuming that the I.Q's in both classes follow a normal distribution.
The first three steps will be the same but step 4 now becomes:
Step 4: Test statistic: t = (x̄ - ȳ - θ0)/(sp√(1/m + 1/n)) where m=20 and n=15.
sp² = (19/33) 11.5² + (14/33) 15.5² = 178.07
This implies, t = (101-110-0)/√(178.07(1/20 + 1/15))
= -9/4.56 = -1.97 = -2.00 approx.
with df = m+n-2 = 20+15-2 = 33.

Step 5: p-value = P(T ≤ -2.00) + P(T ≥ 2.00) = 2(0.027) = 0.054
(since the test is two-tailed, we look for the area to the left of -2.0 and to the right of 2.0 in the t
distribution areas table. For df = 30 and 35, the area to the right of 2.0 is given as 0.027 and so
for df = 33 we take the area to the right of 2.0 equal to 0.027)
Step 6: Since the p-value is smaller than 0.10 but greater than 0.05, we reject the null
hypothesis at α = 0.10; however, we do not reject the null hypothesis that there is no difference
in the mean I.Q's in two classes at α = 0.05.
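The pooled calculation in Step 4 can be sketched in Python. The function name `pooled_t` is hypothetical; it returns the pooled variance, the t statistic and the degrees of freedom m + n - 2.

```python
import math


def pooled_t(xbar, ybar, s1, s2, m, n, theta0=0.0):
    """Pooled variance, pooled t statistic and degrees of freedom."""
    # Weighted average of the two sample variances, weights (m-1) and (n-1)
    sp2 = ((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2)
    t = (xbar - ybar - theta0) / math.sqrt(sp2 * (1 / m + 1 / n))
    return sp2, t, m + n - 2


sp2, t, df = pooled_t(101, 110, 11.5, 15.5, 20, 15)
print(round(sp2, 2), round(t, 2), df)
```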
3. CALCULATION OF β
As already discussed, β refers to the probability of a Type II error, which is the probability of
not rejecting H0 given that H0 is false. To find the value of β for given values of µ, consider a large
sample test of a single population mean where the test statistic is z, and suppose we have a
two-tailed test. In that case, the rejection areas are given by z ≤ -zα/2 and z ≥ zα/2, which are the
same as saying x̄ ≤ µ0 - zα/2(s/√n) and x̄ ≥ µ0 + zα/2(s/√n) respectively.
This means that H0 will not get rejected in the interval (µ0 - zα/2(s/√n), µ0 + zα/2(s/√n)). Let
µ' be the alternative value of µ, then
β(µ') = P(not rejecting H0 | µ = µ')
= P(µ0 - zα/2(s/√n) ≤ x̄ ≤ µ0 + zα/2(s/√n) | µ = µ')
= P((µ0 - µ')/(s/√n) - zα/2 ≤ (x̄ - µ')/(s/√n) ≤ (µ0 - µ')/(s/√n) + zα/2 | µ = µ')
= P(-zα/2 + (µ0 - µ')/(s/√n) ≤ Z ≤ zα/2 + (µ0 - µ')/(s/√n) | µ = µ')
= Φ(zα/2 + (µ0 - µ')/(s/√n)) - Φ(-zα/2 + (µ0 - µ')/(s/√n)).
Similarly, for a right-tailed test β(µ') = Φ(zα + (µ0 - µ')/(s/√n)) and
for a left-tailed test β(µ') = 1 - Φ(-zα + (µ0 - µ')/(s/√n)).
Let's consider the following example:

Example 3.6 Consider a random sample consisting of 100 students who took a statistics
test carrying 50 marks. Let µ denote the true average marks obtained. Consider testing H0 : µ =
35 against the alternative µ > 35 with a sample standard deviation of 7.5 at α = 0.05. Find
β(37).
Since, here we have a right-tailed test and µ0 = 35, µ' = 37, zα = z0.05 = 1.645,
$\beta(37) = \Phi\left(1.645 + \dfrac{35 - 37}{7.5/\sqrt{100}}\right) = \Phi(-1.02) = 0.1539$
This means that 15.39 % of the time the null hypothesis does not get rejected even
though it is false.
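As a check on the calculation above, the large-sample β formulas for a single mean can be sketched with the standard library's `NormalDist`. The function name `beta_one_mean` and the `tail` argument are illustrative choices, not part of the lesson; the code uses the exact z0.05 = 1.6449 rather than the table value 1.645, so the result agrees with the text only to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def beta_one_mean(mu0, mu_alt, s, n, alpha, tail):
    """Type II error probability for a large-sample z test of one mean."""
    shift = (mu0 - mu_alt) / (s / sqrt(n))
    if tail == "right":
        return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) + shift)
    if tail == "left":
        return 1 - STD_NORMAL.cdf(-STD_NORMAL.inv_cdf(1 - alpha) + shift)
    if tail == "two":
        z_half = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return STD_NORMAL.cdf(z_half + shift) - STD_NORMAL.cdf(-z_half + shift)
    raise ValueError("tail must be 'left', 'right' or 'two'")

beta_37 = beta_one_mean(mu0=35, mu_alt=37, s=7.5, n=100, alpha=0.05, tail="right")
print(round(beta_37, 3))
```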
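Now, consider a large sample test of two population means, again with a two-tailed test. In that
case, H0 : µ1 - µ2 = θ0 and Ha : µ1 - µ2 ≠ θ0 . Let θ' be the alternative value of µ1 - µ2 then,
the rejection areas are given by z ≤ -zα/2 and z ≥ zα/2 which are equivalent to
$\bar{x} - \bar{y} \le \theta_0 - z_{\alpha/2}S$ and $\bar{x} - \bar{y} \ge \theta_0 + z_{\alpha/2}S$ respectively,
where $S = \sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}$.
The area in which H0 is not rejected will lie between $\theta_0 - z_{\alpha/2}S$ and $\theta_0 + z_{\alpha/2}S$.
$\beta(\theta') = P(\text{not rejecting } H_0 \mid \mu_1 - \mu_2 = \theta')$
$= P\left(\theta_0 - z_{\alpha/2}S \le \bar{X} - \bar{Y} \le \theta_0 + z_{\alpha/2}S \mid \mu_1 - \mu_2 = \theta'\right)$
$= P\left(\dfrac{\theta_0 - \theta'}{S} - z_{\alpha/2} \le \dfrac{\bar{X} - \bar{Y} - \theta'}{S} \le \dfrac{\theta_0 - \theta'}{S} + z_{\alpha/2} \mid \mu_1 - \mu_2 = \theta'\right)$
$= P\left(\dfrac{\theta_0 - \theta'}{S} - z_{\alpha/2} \le Z \le \dfrac{\theta_0 - \theta'}{S} + z_{\alpha/2} \mid \mu_1 - \mu_2 = \theta'\right)$
$= \Phi\left(z_{\alpha/2} + \dfrac{\theta_0 - \theta'}{S}\right) - \Phi\left(-z_{\alpha/2} + \dfrac{\theta_0 - \theta'}{S}\right).$
Similarly, for a right-tailed test $\beta(\theta') = \Phi\left(z_{\alpha} + \dfrac{\theta_0 - \theta'}{S}\right)$ and,
for a left-tailed test $\beta(\theta') = 1 - \Phi\left(-z_{\alpha} + \dfrac{\theta_0 - \theta'}{S}\right)$.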

Example 3.7 Consider example 3.3 and suppose that the alternative value of µ1 - µ2 is 2. At α =
0.01 find β(2).
Here, we have a right-tailed test with zα = z0.01 = 2.33. We have already calculated
$S = \sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}} = 0.476$.
Then β(2) is given by
$\beta(2) = \Phi\left(2.33 + \dfrac{0 - 2}{0.476}\right) = \Phi(-1.87) = 0.0307$.
This implies that 3.07 % of all experiments of this kind would lead to not rejecting the null
hypothesis even though it is false.
In this way β can be calculated given the alternative value of the parameter of interest.
Moreover, the farther the alternative value of the parameter from the null value, the smaller
will be β since there will be a greater chance that the given null hypothesis gets rejected.
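The way β shrinks as the alternative value moves away from the null value can be seen numerically. The sketch below assumes the setting of Example 3.7 (right-tailed test, S = 0.476, α = 0.01) and takes the null value θ0 = 0, which is what the arithmetic of that example implies; the function name and the grid of alternative values are illustrative.

```python
from statistics import NormalDist

def beta_right_tailed(theta0, theta_alt, se, alpha):
    """Type II error probability for a right-tailed large-sample z test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    return std_normal.cdf(z_alpha + (theta0 - theta_alt) / se)

alternatives = (0.5, 1.0, 2.0, 3.0)
betas = [beta_right_tailed(0.0, alt, 0.476, 0.01) for alt in alternatives]
for alt, beta in zip(alternatives, betas):
    print(alt, round(beta, 4))
```

Each step away from θ0 lowers β; the value at θ' = 2 is close to the 0.0307 obtained from the table.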
4. SELECTION OF A TEST
A Test has following sequential steps:
(a) Specification of the null and alternative hypothesis.
(b) Use of an appropriate test statistic along with the corresponding rejection region in
accordance with the alternative hypothesis (left-, right- or two-tailed).
(c) Using particular values of level of significance, the critical values of the test statistic are
found or p-value is calculated.
(d) Depending upon where the calculated value of test statistic falls, either null hypothesis
is rejected or not rejected.
When the population is assumed to be normally distributed with mean µ and a known standard
deviation σ, we use the Z-test to test a given null hypothesis. When σ is unknown but
the sample size is large, we still apply the Z-test because of the
Central Limit Theorem (CLT). Thus, whenever the sample size is large, or the population is
normal with known σ, we use the Z-test. However, if the sample from a normally distributed
population is small and σ is unknown then we use the t-test. Depending upon the
value of the test statistic calculated in a given test with given α, we decide whether to
reject or not to reject H0.
5. STATISTICAL VS PRACTICAL SIGNIFICANCE OF A TEST
Hypothesis testing requires the selection of a parameter of interest, an appropriate test statistic
and a level of significance, followed by rejection or non-rejection of H0 at the given level of
significance. The same can also be done using the p-value, where H0 gets rejected if the p-value
is less than or equal to α. Whenever a null hypothesis gets rejected in favour of the alternative
hypothesis, we say that there is statistical significance. With a large sample size, the
p-value can be smaller than the chosen level of significance, resulting in rejection of the null
hypothesis, even when the departure from the null hypothesis is small; in that case there is
little practical significance. However, if the p-value is very small due to a large departure
from the null hypothesis then there is practical significance as well, since such a departure will be
noticeable.
For example, consider a normal distribution with mean µ and standard deviation σ. Suppose we are
testing H0 : µ = 500 against Ha : µ > 500 and the true value of µ is 501. The
true value of µ does not show any large departure from the null hypothesis. However, if the
sample size is large, the p-value of the test might be very low, indicating statistical
significance (rejection of the null hypothesis) even though the true value of µ does not differ much
from the null value 500 in practical terms, thereby indicating little practical significance.
To sum up, we must be very careful in interpreting the evidence in the
case of a large sample size because even a minute departure from the null hypothesis can lead to
rejection of the null hypothesis even though there is little practical significance in such a departure.
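A small numerical sketch makes the point concrete. It assumes, purely for illustration, a known σ = 10 and an observed sample mean of exactly 501 (neither value is given in the text), and computes the right-tailed p-value for H0 : µ = 500 at several sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_p_value(xbar, mu0, sigma, n):
    """p-value of a right-tailed z test for one mean with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

# sigma = 10 and xbar = 501 are hypothetical values chosen for illustration.
p_values = {n: right_tailed_p_value(501, 500, 10, n) for n in (25, 400, 2500)}
for n, p in p_values.items():
    print(n, p)
```

The same one-unit departure from 500 is far from significant at n = 25 but is rejected at the 5% level once n reaches 400, although its practical importance has not changed.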
6. THE LIKELIHOOD RATIO PRINCIPLE
Suppose a random sample of size n consists of observations x1,x2,.....xn on a random variable X.
Let the probability distribution of X be f(x;η) with parameter η. The joint
probability distribution for the sample values will be given by-
f(x1,x2,.....xn ;η) = f(x1 ;η).f(x2 ;η)..... f(xn ;η)
which, viewed as a function of η, is the likelihood function.
Now, suppose we are testing the following hypotheses, where ψ0 and ψ1 are two disjoint sets.
H0 : η ∈ ψ0 i.e. η belongs to ψ0
Ha : η ∈ ψ1 i.e. η belongs to ψ1
The likelihood ratio principle in case of a test consists of the following steps:
Step 1: We find the maximum value of the likelihood function by finding the maximum
likelihood estimate of η in ψ0.
Step 2: Then we find the maximum value of the likelihood function by finding the maximum
likelihood estimate of η in ψ1.
Step 3: Consider the following ratio-
$\lambda(x_1, x_2, \ldots, x_n) = \dfrac{\max_{\eta \in \psi_0} f(x_1, x_2, \ldots, x_n; \eta)}{\max_{\eta \in \psi_1} f(x_1, x_2, \ldots, x_n; \eta)}$
This ratio λ(x1,x2,.....xn) is known as the likelihood ratio statistic. In this test, the
null hypothesis gets rejected if this ratio is small, i.e. no larger than a selected constant, say, c. In
other words, we have evidence against the null hypothesis and in favour of the alternative
hypothesis when the denominator of λ(x1,x2,.....xn) is large compared to its numerator.
The choice of c depends upon the desired probability of Type I error. For example, if we
have a normal distribution with known σ and if
Null Hypothesis H0 : µ ≥ µ0 (null value of µ)
then rejecting the null hypothesis when λ(x1,x2,.....xn) ≤ c is equivalent to rejecting it when
z ≤ c' for a suitable constant c', and a Type I error probability of α requires c' = -zα. Here, the
likelihood ratio test is the same as the z-test in a single sample.
Among its advantages, this test can be applied when the X's have different distributions and
also when they are not independent. However, one disadvantage of using this test is that
the functional form of the probability distribution from which the sample is derived must be
known in order to find the value of λ(x1,x2,.....xn). For example, to get the t-test from the
likelihood ratio test we must assume that the X's have a normal distribution, or else there will
not be any way to write the joint probability distribution for all sample values.
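To see the link between the likelihood ratio and the z statistic numerically, the sketch below uses the simpler two-sided case H0 : µ = µ0 against Ha : µ ≠ µ0 for a normal population with known σ (a different pair of hypotheses from the one-sided example in the text). In that case the likelihood over the alternative is largest at the sample mean, and the ratio reduces to exp(-z²/2). The data, µ0 and σ are hypothetical.

```python
from math import exp, log, pi, sqrt

def normal_log_likelihood(xs, mu, sigma):
    """Log of the joint normal density of independent observations xs."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return -n * log(sigma * sqrt(2 * pi)) - squared_error / (2 * sigma ** 2)

def likelihood_ratio(xs, mu0, sigma):
    """Likelihood ratio for H0: mu = mu0 vs Ha: mu != mu0 (known sigma)."""
    xbar = sum(xs) / len(xs)
    log_numerator = normal_log_likelihood(xs, mu0, sigma)    # H0 fixes mu at mu0
    log_denominator = normal_log_likelihood(xs, xbar, sigma) # likelihood peaks at xbar
    return exp(log_numerator - log_denominator)

xs = [52.1, 48.3, 50.7, 53.4, 49.9, 51.6]  # hypothetical sample
mu0, sigma = 50.0, 2.0                     # hypothetical null value and known sigma
lam = likelihood_ratio(xs, mu0, sigma)
z = (sum(xs) / len(xs) - mu0) / (sigma / sqrt(len(xs)))
print(lam, exp(-z ** 2 / 2))
```

Because λ is a decreasing function of |z| here, rejecting for small λ is the same as rejecting for large |z|, which is the two-tailed z-test.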

PRACTICE QUESTIONS
Q1) Given the following p-values, find for which p-values the null hypothesis gets rejected
at level of significance 5 %
(i) 0.195
(ii) 0.005
(iii) 0.065
(iv) 0.049
(v) 0.025
Q2) Consider a large sample test of a single population mean. Suppose the test is right-
tailed, find the p-values associated with the following z-values:
(i) 1.96
(ii) 0.35
(iii) 2.33
(iv) 0.95
(v) 0.05
Q3) The IQ's of 50 students in a certain class, which follow a normal distribution, gave a
mean of 106 with a standard deviation of 9. Test the null hypothesis that µ = 108 against the
alternative µ < 108 using the p-value method at α = 0.10, 0.05 and 0.01. Also test the null
hypothesis that µ = 108 against the alternative µ ≠ 108 for the given levels of significance.
Q4) Two types of land, 1 and 2, each having 60 plots of equal areas were selected to test
the effect of a particular pesticide on rice production. The plots of land 1 were given the
treatment of the pesticide while the plots of land 2 were not. The plots of land 1 yielded a
mean output of 150.5 kg with the standard deviation of 10 kg whereas the plots of land 2
gave a mean output of 145.6 kg with the standard deviation of 1.5 kg. Check, using the p-value
method, whether there is a significant improvement in the rice output due to the application
of pesticide for the given levels of significance 0.01, 0.05 and 0.10.
Q5) The lifetimes of a sample of 16 light bulbs manufactured by a company were tested,
which gave a mean of 1550 hours with a standard deviation of 125 hours. Suppose µ is the
true average lifetime of all bulbs. Test the following hypotheses using the p-value
method, assuming a normal distribution of the lifetime of light bulbs-
(a) H0 : µ = 1570 against Ha : µ < 1570
(b) H0 : µ = 1570 against Ha : µ ≠ 1570
for the given levels of significance 0.01 and 0.05.
Q6) In a particular class 40 students were selected and divided into two groups, A and B,
each having 20 students. The average height of students in group A was found to be 66.8
inches with standard deviation of 2.55 inches whereas the average heights of students in
group B was found to be 65.6 inches with standard deviation of 2.67 inches. Test whether
the true average height of students in group A is more than that of students in group B
using p-value method for α = 0.05 and 0.10 assuming that heights in both classes have a
normal distribution.
Q7) A statistics exam of 100 marks was given to students in class A and Class B. There were
20 students in class A and 25 students in class B. In class A, mean marks obtained was 72
with standard deviation of 7.5 whereas in class B, mean marks obtained was 75 with
standard deviation of 7.25. Test whether there is a significant difference between the
performance of the two classes for levels of significance 0.01, 0.05 and 0.10 using p-value
method assuming normal distribution.
Q8) Consider a random sample consisting of 100 cables manufactured by a firm. These
cables were tested for their breaking strength giving a mean of 5700 lb and a standard
deviation of 450 lb. However, the manufacturer of the firm claimed that the true average
breaking strength is 6000 lb. Consider testing this claim against alternative hypothesis that
the breaking strength is less than 6000 lb at α = 0.05. Find the probability of Type II error
given the alternative value of mean as 5800.

Q9) Consider Q4 where two types of land 1 and 2 were given treatment and non-treatment
of pesticide in case of rice production respectively. Suppose the given level of significance is
0.01 and the alternative value of µ1 - µ2 = 5, find the probability of Type II error i.e β(5)
and interpret it.
Q10) Consider a normal distribution and the following hypothesis testing:
H0 : µ = 100 against Ha : µ > 100
where the value of σ is known to be 10.
Suppose the alternative value of µ = 101. Answer the following:
(i) Is there any practical significance? Explain.
(ii) Calculate β for the given alternative value of µ for the sample sizes n = 50, 100 and
1600, given α = 0.05.
(iii) Find the p-value if the observed value of the sample mean is $\bar{x}$ = 101 and the sample size is
1600. Check whether there is any statistical significance for the chosen levels of
significance. Explain.


Tests concerning population proportion and variance
Lesson: Tests Concerning Population Proportion and Variance
Lesson Developer: Nupur Kataria
College/Department: Department of Economics, Kamala Nehru College, University of Delhi

TABLE OF CONTENTS
Section Number And Heading
1. Introduction
2. Tests about population proportion
2.1. Tests in large sample
2.2. Tests in small sample
3. Calculation of β for population proportion
4. Tests concerning population variance
4.1 p-values for tests of population variance
Content Developer
Nupur Kataria, Assistant Professor, Department of Economics, Kamala Nehru College,
University of Delhi
References

1. Jay L. Devore: Probability and Statistics for Engineering and the Sciences, Cengage
Learning, 8th edition [Chapter 8 and 9].
2. Irwin Miller and Marylees Miller: Mathematical Statistics, Pearson, 7th edition.
3. Allen Webster: Probability and Statistics, 4th edition, Richard D. Irwin/McGraw-Hill, Burr
Ridge, IL, 2010
TESTS CONCERNING POPULATION PROPORTION AND VARIANCE
Learning Objectives
This chapter demonstrates hypothesis testing procedures for a population proportion
and a population variance. You will learn how to carry out these tests in small and large samples, and
in single and two samples, using both critical values and the p-value method. Apart from
this, you will also learn to calculate the probability of Type II error while testing a population
proportion in case of large samples. The chapter concludes with practice questions which
will help you to test the concepts learned from this chapter.
1. INTRODUCTION
The population proportion and the population variance are important parameters of a population
distribution. It therefore becomes important to carry out hypothesis testing procedures for these
parameters in small and large samples and also in single and two samples. This chapter discusses
various tests concerning a population proportion and variance in single and two samples
using critical values and also the p-value method. Apart from this, the probability of Type II error is
also discussed for a population proportion in large samples.
2. TESTS ABOUT POPULATION PROPORTION
A population proportion, denoted by p, is defined as the fraction of successes in a
population. If a sample of size n is selected randomly and if X denotes the number of
successes in that sample then an estimator of p is given by-
$\hat{p} = \dfrac{X}{n}$
If the sample size n is small in comparison to the size of the population then X will follow a
Binomial distribution with mean E(X) = np and variance, Var (X)= σ² = np(1-p). However, if the
sample size is large such that both conditions - np ≥ 10 and n(1-p) ≥ 10 - are satisfied then
X and hence $\hat{p}$ will both approximately follow a normal distribution.
The estimator $\hat{p}$ is an unbiased estimator of p since $E(\hat{p}) = p$ and its standard deviation is
given by $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
Proof: (i) $E(\hat{p}) = E(X/n)$ = 1/n [E(X)] = 1/n [np] = p.
(ii) $\sigma_{\hat{p}} = \sqrt{Var(\hat{p})} = \sqrt{Var(X/n)} = \sqrt{\dfrac{1}{n^2}Var(X)} = \sqrt{\dfrac{np(1-p)}{n^2}} = \sqrt{\dfrac{p(1-p)}{n}}$
Now, the following sections will show the tests concerning population proportion for both
small and large samples.
2.1 TESTS IN LARGE SAMPLE
Since the sample size is large, irrespective of whether σ is known or unknown, the test
statistic for the population proportion p will be Z, which follows a
standard normal distribution. Here, both X and $\hat{p}$ will approximately follow a normal distribution.
Depending upon the alternative hypothesis, either the critical value of Z is found and
compared with the calculated value of the test statistic, or the p-value is calculated for the
observed value of Z and compared with the given level of significance α, to
decide whether to reject the null hypothesis or not. However, it must be noted
that these tests are valid only when both conditions - np0 ≥ 10 and n(1-p0) ≥ 10 -
are satisfied given that H0 is true.
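We basically consider the following three different cases, all with the same null hypothesis:
Null Hypothesis H0 : p = p0 (null value of p)
a) Lower-tailed test
Alternative Hypothesis Ha : p < p0
Test Statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
(when H0 is true then $E(\hat{p}) = p_0$ and $\sigma_{\hat{p}} = \sqrt{p_0(1-p_0)/n}$).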
Decision Rule: Reject H0 if z ≤ -zα .
p-value = area to the left of calculated z (negative value)= P(Z ≤ -z) = Φ(-z). [where Φ
stands for cumulative area to left of z]
Where we reject H0 if p-value ≤ α and do not reject H0 if p-value > α.
Figure 1(a) shows the rejection area for the z-test whereas figure 1(b) shows p-value given
by the shaded area.
Note: The level of significance α = P(Type I error) = P (Rejecting H0 when H0 is true)
b) Upper-tailed test
Alternative Hypothesis Ha : p > p0
Test Statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
Decision Rule: Reject H0 if z ≥ zα .
p-value = area to the right of calculated z (positive value)= P(Z ≥ z)= 1 - Φ(z) where we
reject H0 if p-value ≤ α and do not reject H0 if p-value > α. The following figure 2(a) shows
the rejection area for the z-test whereas figure 2(b) shows p-value given by the shaded
area.

c) Two-tailed test
Alternative Hypothesis Ha : p ≠ p0
Test Statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
Decision Rule: Reject H0 if z ≥ zα/2 or z ≤ -zα/2 . (Figure 3(a) shows the
rejection area for a two-tailed test).
p-value = sum of the area to the left of calculated negative z value and to the right of
calculated positive z value = P(Z ≤ -z) + P(Z ≥ z) = [Φ(-z)]+ [1 - Φ(z)] = 2[1 - Φ(z)] as
shown in figure 3(b).
{since, area to left of -z is same as the area to the right of z due to symmetry of z-curve
and we reject H0 if p-value ≤ α and do not reject H0 if p-value > α}.

Now, the following examples illustrate the test procedures for population proportions in case
of large samples.
Example 1: A firm manufactured a particular medicine for curing a disease. It claimed that
the medicine was 85% effective in curing the disease within a time span of 3 days. A
random sample of 400 people was selected having this disease and it was found that the
medicine cured the disease for 320 people. Check whether the firm's claim is true at level of
significance (i) 0.05 and (ii) 0.01.
Step 1: p = the true proportion or probability that the disease is cured using the medicine.
Step 2: H0 : p = 0.85
Step 3: Ha : p < 0.85
Step 4: Here, np0 = 400(0.85) = 340 > 10 and n(1-p0) = 400(0.15) = 60 >10 and so we
can apply the large sample z- test in this case.
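Step 5: Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.80 - 0.85}{\sqrt{(0.85)(0.15)/400}}$ = -0.05/0.018 = -2.77 [where
$\hat{p}$ = 320/400 = 0.8]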
Step 6: (a) Critical values of z: For α = 0.05, the value of z such that area to left of it equals
0.05 is given by -1.645. Similarly for α = 0.01, the value of z such that area to left of it
equals 0.01 is given by -2.33. (since the test is left-tailed and values of z are taken from the
standard normal curve areas table).
(b) p-value = Φ (-2.77) = 0.0028.
Step 7: (a) The calculated value of z = -2.77 is less than both -2.33 and -1.645 implying
that it lies in the rejection area for both α = 0.05 and 0.01 and hence we reject the firm's
claim that the medicine is 85% effective in curing the disease at both levels of significance.
(b) The p-value is less than both 0.01 and 0.05 again implying that we reject the firm's
claim that the medicine is 85% effective in curing the disease at both levels of significance.
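The large-sample test of a single proportion can be sketched as one function that covers all three alternatives. The function name and the `tail` argument are illustrative; the usage line reproduces Example 1. The unrounded statistic comes out near -2.80 rather than -2.77 because the text rounds the standard error to 0.018, and the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, tail):
    """Large-sample z test of H0: p = p0; returns (z, p-value)."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("need n*p0 >= 10 and n*(1 - p0) >= 10")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if tail == "left":
        p_value = cdf(z)
    elif tail == "right":
        p_value = 1 - cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value

z, p_value = one_proportion_z_test(x=320, n=400, p0=0.85, tail="left")
print(round(z, 2), round(p_value, 4))
```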
Example 2: Consider an experiment where 200 cars were tested for emission of a
particular toxic pollutant. If a car emitted this pollutant more than a certain desired level, it
was considered to be defective. Out of 200 cars, 45 cars were found to be defective.

However, the manufacturer of these cars claimed that the proportion of such defectives was
0.20. Test this claim against p ≠ 0.20 at α = 0.01.
Step 1: p = the true proportion of defective cars.
Step 2: H0 : p = 0.20
Step 3: Ha : p ≠ 0.20
Step 4: Here, np0 = 200(0.20) = 40 > 10 and n(1-p0) = 200(0.80) = 160 >10 and so we
can apply the large sample z- test in this case.
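Step 5: Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.225 - 0.20}{\sqrt{(0.20)(0.80)/200}}$ = 0.025/0.028 = 0.89 [where
$\hat{p}$ = 45/200 = 0.225]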
Step 6: (a) Critical values of z: For α = 0.01, the value of z such that area to left of it equals
0.01/2 = 0.005 is given by -2.58 and area to the right of it equals 0.01/2 = 0.005, so that
the total rejection area equals 0.01, is given by 2.58. (since the test is two-tailed and
values of z are taken from the standard normal curve areas table).
(b) p-value = 2[1 - Φ(0.89)] = 2[1-0.8133] = 0.3734.
Step 7: (a) The calculated value of z = 0.89 is greater than -2.58 and less than 2.58
implying that it lies in the acceptance area at α = 0.01 and hence we do not reject the
manufacturer's claim that the true proportion of defective cars is 20% at α = 0.01.
(b) The p-value is greater than 0.01, again implying that we do not reject the null
hypothesis.
Now, we move to tests concerning population proportions in case of two samples in large
samples since until now we did tests of single population proportion in large samples. The
hypothesis testing procedure in this case is as follows:
Consider two population distributions and let's denote them by Popu1 and Popu2. Let p1 and
p2 be the true fractions of successes in Popu1 and Popu2 respectively. Suppose random
samples of sizes m and n, both large, are selected from Popu1 and Popu2 respectively,
independent of one another. Let X1 and X2 denote the sample numbers of successes for
Popu1 and Popu2 respectively. X1 and X2 will follow binomial distributions given that
the sample sizes, m and n, are small relative to the respective population
sizes. Here, the parameter of interest is the difference in the two population proportions i.e.
p1 - p2. An estimator of p1 - p2 is its sample counterpart which is $\hat{p}_1 - \hat{p}_2$ = X1/m - X2/n (the
difference in the two sample proportions).
In this case, X1 ~ Bin (m, p1) and X2 ~ Bin (n, p2) where X1 and X2 are independent random
variables and E(X1) = mp1, E(X2) = np2, Var(X1)=mp1(1-p1) and Var(X2)=np2(1-p2). The
mean and variance of $\hat{p}_1 - \hat{p}_2$ are given by-
Mean: $E(\hat{p}_1 - \hat{p}_2)$ = E(X1/m-X2/n) = 1/m[E(X1)]-1/n[E(X2)] = 1/m[mp1]-1/n[np2] = p1 - p2.
Variance: $Var(\hat{p}_1 - \hat{p}_2)$ = Var(X1/m-X2/n) = Var(X1/m) + Var(X2/n) =
1/m²[Var(X1)] + 1/n²[Var(X2)] = 1/m²[mp1(1-p1)] + 1/n²[np2(1-p2)] = $\dfrac{p_1(1-p_1)}{m} + \dfrac{p_2(1-p_2)}{n}$
Since m and n are both large, X1 and X2 will approximately follow a normal distribution,
implying that $\hat{p}_1$ and $\hat{p}_2$, and hence $\hat{p}_1 - \hat{p}_2$, also follow a normal distribution approximately.
The Z variable is given by the following formula:
$Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - E(\hat{p}_1 - \hat{p}_2)}{\sqrt{Var(\hat{p}_1 - \hat{p}_2)}} = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{m} + \dfrac{p_2(1-p_2)}{n}}}$
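Null Hypothesis: H0 : p1 - p2 = 0 i.e. there is no difference in the two population proportions.
If the null hypothesis is true then p1 = p2, and let's assume p1 = p2 = p, which means that we are
assuming the two population distributions to have a common parameter p. In this case, we
combine the two random samples into one with sample size equal to m+n and total number of
sample successes equal to X1 + X2. The estimator of p is given by a weighted average of
$\hat{p}_1$ and $\hat{p}_2$ -
$\hat{p} = \dfrac{X_1 + X_2}{m+n} = \dfrac{m}{m+n}\hat{p}_1 + \dfrac{n}{m+n}\hat{p}_2$ [using $X_1 = m\hat{p}_1$, $X_2 = n\hat{p}_2$]
Test Statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{m} + \dfrac{\hat{p}(1-\hat{p})}{n}}} = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}$ [given H0 is true].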
Alternative Hypothesis                         Rejection area             p-value
(a) Ha : p1 - p2 < 0 (lower-tailed test)       z ≤ -zα                    p-value = Φ(-z)
(b) Ha : p1 - p2 > 0 (upper-tailed test)       z ≥ zα                     p-value = 1 - Φ(z)
(c) Ha : p1 - p2 ≠ 0 (two-tailed test)         z ≤ -zα/2 or z ≥ zα/2      p-value = 2[1 - Φ(z)]
Decision rule: Reject H0 if p-value ≤ α and do not reject H0 if p-value > α. The test is
valid if $m\hat{p}_1$, $m(1-\hat{p}_1)$, $n\hat{p}_2$ and $n(1-\hat{p}_2)$ are all at least 10.
Example 3: Two groups of dogs, A and B, were formed to test the impact of a particular
vaccine to cure anemia. Both groups consisted of 250 dogs but the vaccine was given
only to group A and not to group B. In group A 190 dogs and in group B 160 dogs recovered
from the disease. Test the hypothesis that the vaccine is able to cure the disease using level
of significance (i) 0.05 and (ii) 0.01.
Step 1: p1 - p2 = difference between true population proportion in group A and group B.
Step 2: H0 : p1 - p2 = 0 i.e. there is no difference or the vaccine is not able to cure anemia.
Step 3: Ha : p1 - p2 > 0 i.e. the vaccine is able to cure the disease.
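Step 4: Here, $\hat{p}_1$ = 190/250 = 0.76, $\hat{p}_2$ = 160/250 = 0.64. Also, $m\hat{p}_1$ = 250(0.76) = 190, $m(1-\hat{p}_1)$ =
250(0.24) = 60, $n\hat{p}_2$ = 250(0.64) = 160 and $n(1-\hat{p}_2)$ = 250(0.36) = 90 are all more than 10 and
so we can apply the large sample z- test in this case.
Step 5: The estimator $\hat{p} = \dfrac{m}{m+n}\hat{p}_1 + \dfrac{n}{m+n}\hat{p}_2 = \dfrac{250}{500}(0.76) + \dfrac{250}{500}(0.64)$ = 350/500 = 0.7.
Step 6: Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{m} + \frac{1}{n}\right)}} = \dfrac{0.76 - 0.64}{\sqrt{(0.7)(0.3)\left(\frac{1}{250} + \frac{1}{250}\right)}}$ = 0.12/0.041 = 2.93.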
Step 7:(a) Critical values of z: For α = 0.05, the value of z such that area to right of it
equals 0.05 is given by 1.645. Similarly for α = 0.01, the value of z such that area to right
of it equals 0.01 is given by 2.33. (since the test is right-tailed and values of z are taken
from the standard normal curve areas table).
(b) p-value = 1 -Φ (2.93) = 1-0.9983=0.0017.

Step 8: (a) The calculated value of z = 2.93 is greater than both 2.33 and 1.645 implying
that it lies in the rejection area for both α = 0.05 and 0.01 and hence we reject the null
hypothesis and conclude that the vaccine is effective at both α = 0.05 and 0.01.
(b) The p-value is less than both 0.01 and 0.05 again implying that we reject the null
hypothesis and conclude that the vaccine is effective at both α = 0.05 and 0.01.
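The pooled two-sample procedure used in Example 3 can be sketched as follows. The function name and argument names are illustrative; the usage line plugs in the counts from the example (190 of 250 and 160 of 250).

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, m, x2, n, tail):
    """Large-sample z test of H0: p1 - p2 = 0 using the pooled estimate."""
    p1_hat, p2_hat = x1 / m, x2 / n
    p_pooled = (x1 + x2) / (m + n)  # common p estimated under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / m + 1 / n))
    z = (p1_hat - p2_hat) / se
    cdf = NormalDist().cdf
    if tail == "left":
        p_value = cdf(z)
    elif tail == "right":
        p_value = 1 - cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value

z_stat, p_value = two_proportion_z_test(190, 250, 160, 250, tail="right")
print(round(z_stat, 2), round(p_value, 4))
```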
2.2 TESTS IN SMALL SAMPLE
We now move to tests concerning a population proportion in case of small samples. When
the sample size n is small, the variable X (the number of sample successes) will simply
follow a Binomial distribution i.e. X ~ Bin(n,p). The null hypothesis, common to all the tests,
is H0 : p = p0 and the test procedures are as follows-
(a) Upper-tailed or right-tailed test
Alternative Hypothesis: Ha : p > p0
Test statistic: X
Rejection area: x ≥ a (observed value of x is at least as large as the critical value a)
Given that H0 is true, X will follow a binomial distribution having parameters n and p0 i.e.
X ~ Bin(n,p0). Now,
P(Type I error) = P(Rejecting H0 when H0 is true) = P[X ≥ a given X ~ Bin(n,p0)]
= 1 - P[ X ≤ a-1 given X ~ Bin(n,p0)] = 1 - B(a-1; n,p0).
[where B() denotes the cumulative binomial probability]
Since X is a discrete random variable, it is usually not possible to find a value of a such that P(Type
I error) = α exactly. We therefore take the critical value a to be the smallest value satisfying
the condition [1 - B(a-1; n,p0)] ≤ α, since this gives the largest rejection area
(a,a+1,...,n) whose Type I error probability does not exceed α. Then we compare this value a
with the observed value of X to conclude whether the null hypothesis is rejected or not.
(b) Lower-tailed or left-tailed test
Alternative Hypothesis: Ha : p < p0
Test statistic: X
Rejection area: x ≤ a (observed value of x is no larger than the critical value a)
X ~ Bin(n,p0). Now,
P(Type I error) = P(Rejecting H0 when H0 is true) = P[X ≤ a given X ~ Bin(n,p0)]
= B(a; n,p0).
Again, since X is a discrete random variable, it is usually not possible to find a value of a such that
P(Type I error) = α exactly. We therefore take the critical value a to be the largest value
satisfying the condition B(a; n,p0) ≤ α. Once the value a is found, we compare it with the
observed value of X to conclude whether the null hypothesis is rejected or not.
c) Two-tailed test
Alternative Hypothesis: Ha : p ≠ p0
Test statistic: X
Rejection area: x ≤ a1 or x ≥ a2 (observed value of x is no larger than the critical
value a1 or it is at least as large as a2 i.e. the rejection area consists of both small and large
values of x).
X ~ Bin(n,p0). Now,
P(Type I error) = P(Rejecting H0 when H0 is true) = P[X ≤ a1 or X ≥ a2 given X ~ Bin(n,p0)]
= P[X ≤ a1 given X ~ Bin(n,p0)] + P[X ≥ a2 given X ~ Bin(n,p0)]
= B(a1; n,p0) + 1 - B(a2-1; n,p0)
Again, since X is a discrete random variable, it is usually not possible to find values of a1 and a2
such that P(Type I error) = α exactly. We therefore use the conditions B(a1; n,p0) ≤ α/2 and
1 - B(a2-1; n,p0) ≤ α/2, taking the largest a1 and the smallest a2 that satisfy them. Comparing
these critical values with the observed value of X, we finally conclude whether the null
hypothesis is rejected or not.
Now, the following examples illustrate the testing procedures concerning population
proportion in case of small samples.
Example 4: In a particular area, two candidates A and B stood for an election. It was
claimed that 60% of the voters in that area were in favour of candidate A. A
random sample consisting of 25 voters was selected which showed that 13 of these voters
voted for candidate A. Test the hypothesis p=0.60 against p<0.60 at α = 0.05.
Step 1: p = the true proportion of voters favouring candidate A.
Step 2: H0 : p = 0.60
Step 3: Ha : p < 0.60
Step 4: Test statistic: X = the number of voters favouring candidate A in the sample (since
sample size is small).
Step 5: Rejection area: x ≤ a where we have to find value of a.
Applying the condition, B(a; n,p0) ≤ α since it is a left-tailed test we get,
B(a; 25,0.60) ≤ 0.05
Using the cumulative binomial probabilities table, we have B(10; 25,0.60) = 0.034 ≤ 0.05
whereas B(11; 25,0.60) = 0.078 > 0.05. Therefore the rejection area is given by x ≤ 10.
Step 6: Conclusion- Since the observed value of x is 13, which is greater than 10, it does not
fall in the rejection area. Therefore we do not reject the null hypothesis that 60% of the
voters favoured candidate A.
Example 5: A group of scientists developed a new type of batteries claiming that only 10%
of such batteries have a life span of less than 1800 hours. To check this claim, a random
sample of 15 such batteries was selected and it was found that 7 batteries had a life span of
less than 1800 hours. Test this claim at level of significance 0.01.
Step 1: p = the true proportion of batteries having a life span of less than 1800 hours.
Step 2: H0 : p = 0.10 i.e. 10% of such batteries have a life span of less than 1800 hours.
Step 3: Ha : p > 0.10 i.e. more than 10% of such batteries have a life span of less than 1800
hours.
Step 4: Test statistic: X = the sample number of batteries having a life span of less than
1800 hours (since sample size is small).
Step 5: Rejection area: x ≥ a where we have to find value of a.
Applying the condition, [1 - B(a-1; n,p0)] ≤ α since it is a right-tailed test we get,

1 - B(a-1; 15,0.10) ≤ 0.01
Using the cumulative binomial probabilities table, we have [1-B(5; 15,0.10)] = 0.002 ≤
0.01 whereas [1-B(4; 15,0.10)] = 0.013 > 0.01. Therefore we get (a-1)=5 implying a=6 and so
the rejection area is given by x ≥ 6.
Step 6: Conclusion- Since the observed value of x is 7, which is greater than 6, it falls in
the rejection area. Therefore we reject the null hypothesis that 10% of such batteries have a
life span of less than 1800 hours.
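The critical values in Examples 4 and 5 were read from a cumulative binomial table; they can also be searched for directly. The sketch below computes B(k; n, p) from its definition and applies the two conditions B(a; n,p0) ≤ α and 1 - B(a-1; n,p0) ≤ α. The function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """Cumulative binomial probability B(k; n, p) = P(X <= k)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))

def lower_critical_value(n, p0, alpha):
    """Largest a with B(a; n, p0) <= alpha, or None if no such a exists."""
    a = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) <= alpha:
            a = k
        else:
            break
    return a

def upper_critical_value(n, p0, alpha):
    """Smallest a with 1 - B(a-1; n, p0) <= alpha, or None if no such a exists."""
    for a in range(n + 1):
        if 1 - binom_cdf(a - 1, n, p0) <= alpha:
            return a
    return None

a_example4 = lower_critical_value(25, 0.60, 0.05)  # left-tailed, Example 4
a_example5 = upper_critical_value(15, 0.10, 0.01)  # right-tailed, Example 5
print(a_example4, a_example5)
```

For a two-tailed test the same two functions can be called with α/2 in place of α to obtain a1 and a2.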
The small sample tests for the difference between two population proportions are rather
difficult as compared to their large sample counterparts. One such test is called the Fisher-Irwin
test, which is based on the hypergeometric distribution.
3. CALCULATION OF β FOR POPULATION PROPORTION
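This section presents the calculation of the probability of Type II error, denoted by β, for a
population proportion in case of large samples. To find the value of β for a given alternative
value of p, consider the large sample test of a single population proportion where the test
statistic is z, and suppose we have a two-tailed test. In that case, the rejection areas are given
by z ≤ -zα/2 and z ≥ zα/2 which are equivalent to -
$\dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \le -z_{\alpha/2}$ and $\dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \ge z_{\alpha/2}$ respectively,
Simplifying the expressions gives $\hat{p} \le p_0 - z_{\alpha/2}\sqrt{p_0(1-p_0)/n}$ and $\hat{p} \ge p_0 + z_{\alpha/2}\sqrt{p_0(1-p_0)/n}$ respectively.
This means that H0 will not get rejected in the interval $\left(p_0 - z_{\alpha/2}\sqrt{p_0(1-p_0)/n},\ p_0 + z_{\alpha/2}\sqrt{p_0(1-p_0)/n}\right)$.
Let p' be the alternative value of p then,
$\beta(p') = P(\text{not rejecting } H_0 \mid p = p')$
$= P\left(p_0 - z_{\alpha/2}\sqrt{p_0(1-p_0)/n} \le \hat{p} \le p_0 + z_{\alpha/2}\sqrt{p_0(1-p_0)/n} \mid p = p'\right)$
$= P\left(p_0 - p' - z_{\alpha/2}\sqrt{p_0(1-p_0)/n} \le \hat{p} - p' \le p_0 - p' + z_{\alpha/2}\sqrt{p_0(1-p_0)/n} \mid p = p'\right)$
[subtracting p' from all sides]
$= P\left(\dfrac{p_0 - p' - z_{\alpha/2}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}} \le \dfrac{\hat{p} - p'}{\sqrt{p'(1-p')/n}} \le \dfrac{p_0 - p' + z_{\alpha/2}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}} \mid p = p'\right)$
[dividing the entire expression by $\sqrt{p'(1-p')/n}$, where $Z = \dfrac{\hat{p} - p'}{\sqrt{p'(1-p')/n}}$ is approximately standard normal given p = p']
$= P\left(\dfrac{p_0 - p' - z_{\alpha/2}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}} \le Z \le \dfrac{p_0 - p' + z_{\alpha/2}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}}\right)$
$= \Phi\left(\dfrac{p_0 - p' + z_{\alpha/2}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}}\right) - \Phi\left(\dfrac{p_0 - p' - z_{\alpha/2}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}}\right).$
[where Φ stands for cumulative probability under standard normal curve]
Similarly, for a right-tailed test where Ha : p > p0 -
$\beta(p') = \Phi\left(\dfrac{p_0 - p' + z_{\alpha}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}}\right)$ since the rejection area is z ≥ zα, &
for a left-tailed test where Ha : p < p0 -
$\beta(p') = 1 - \Phi\left(\dfrac{p_0 - p' - z_{\alpha}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}}\right)$, the rejection area being z ≤ -zα.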
Example 6: Consider example 1 where H0 : p = 0.85 and Ha : p < 0.85. Suppose that the
medicine is only 80% effective in curing the disease; find the probability of Type II error at
level of significance 5%.
In this case, we have a left-tailed test and n = 400, p0 = 0.85, p' = 0.80, zα = z0.05 = 1.645
then,
$\beta(0.80) = 1 - \Phi\left(\dfrac{p_0 - p' - z_{\alpha}\sqrt{p_0(1-p_0)/n}}{\sqrt{p'(1-p')/n}}\right) = 1 - \Phi\left(\dfrac{0.85 - 0.80 - 1.645\sqrt{(0.85)(0.15)/400}}{\sqrt{(0.80)(0.20)/400}}\right)$
= 1 - Φ(0.021/0.02) = 1 - Φ(1.05) = 1 - 0.8531 = 0.1469. [Using standard normal curve
area table]
This means that 14.69 % of the time the null hypothesis does not get rejected even
though it is false.
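The β formulas for a single proportion can be sketched in code as well. The function name and the `tail` argument are illustrative. Applied to Example 6 without rounding the intermediate values, the result comes out near 0.15, close to the 0.1469 obtained above from rounded figures.

```python
from math import sqrt
from statistics import NormalDist

def beta_one_proportion(p0, p_alt, n, alpha, tail):
    """Type II error probability for a large-sample z test of one proportion."""
    std_normal = NormalDist()
    se_null = sqrt(p0 * (1 - p0) / n)       # standard deviation of p-hat under H0
    se_alt = sqrt(p_alt * (1 - p_alt) / n)  # standard deviation of p-hat when p = p_alt
    if tail == "right":
        z = std_normal.inv_cdf(1 - alpha)
        return std_normal.cdf((p0 - p_alt + z * se_null) / se_alt)
    if tail == "left":
        z = std_normal.inv_cdf(1 - alpha)
        return 1 - std_normal.cdf((p0 - p_alt - z * se_null) / se_alt)
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)
        upper = std_normal.cdf((p0 - p_alt + z * se_null) / se_alt)
        lower = std_normal.cdf((p0 - p_alt - z * se_null) / se_alt)
        return upper - lower
    raise ValueError("tail must be 'left', 'right' or 'two'")

beta_080 = beta_one_proportion(p0=0.85, p_alt=0.80, n=400, alpha=0.05, tail="left")
print(round(beta_080, 3))
```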

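Now, consider the large sample test of two population proportions, again with a two-tailed
test. In that case, H0 : p1 - p2 = 0 and Ha : p1 - p2 ≠ 0. The rejection areas are given by
z ≤ -zα/2 and z ≥ zα/2 which are equivalent to -
$\dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}} \le -z_{\alpha/2}$ and $\dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}} \ge z_{\alpha/2}$ respectively, where $\hat{q} = 1 - \hat{p}$.
On simplifying, we get- $\hat{p}_1 - \hat{p}_2 \le -z_{\alpha/2}\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}$ and
$\hat{p}_1 - \hat{p}_2 \ge z_{\alpha/2}\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}$ respectively where both m and n are large.
Let p1 - p2 be the alternative value then the probability of Type II error will be a function of
p1 and p2 and is given by-
$\beta(p_1, p_2) = P(\text{not rejecting } H_0 \text{ when } H_0 \text{ is false})$
$= P\left(-z_{\alpha/2}\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)} \le \hat{p}_1 - \hat{p}_2 \le z_{\alpha/2}\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)} \mid \text{False } H_0\right)$
Here, $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - E(\hat{p}_1 - \hat{p}_2)}{\sigma} = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sigma}$, with $\sigma = \sqrt{\dfrac{p_1 q_1}{m} + \dfrac{p_2 q_2}{n}}$, $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$, under a false H0 since now p1 - p2 ≠ 0
Hence, we have-
$\beta(p_1, p_2) = P\left[\dfrac{-z_{\alpha/2}\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma} \le \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sigma} \le \dfrac{z_{\alpha/2}\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma}\right]$
$= P\left[\dfrac{-z_{\alpha/2}\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma} \le Z \le \dfrac{z_{\alpha/2}\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma}\right]$
Since m and n are both large,
$\hat{p} = \dfrac{m\hat{p}_1 + n\hat{p}_2}{m+n} \approx \dfrac{mp_1 + np_2}{m+n} = \bar{p}$, and we write $\bar{q} = 1 - \bar{p}$, so that
$\beta(p_1, p_2) = \Phi\left(\dfrac{z_{\alpha/2}\sqrt{\bar{p}\bar{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma}\right) - \Phi\left(\dfrac{-z_{\alpha/2}\sqrt{\bar{p}\bar{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma}\right).$
[where Φ stands for cumulative probability under standard normal curve]
Similarly, for a right-tailed test where Ha : p1-p2 > 0 -
$\beta(p_1, p_2) = \Phi\left(\dfrac{z_{\alpha}\sqrt{\bar{p}\bar{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma}\right)$ since the rejection area is z ≥ zα, &
for a left-tailed test where Ha : p1-p2 < 0 -
$\beta(p_1, p_2) = 1 - \Phi\left(\dfrac{-z_{\alpha}\sqrt{\bar{p}\bar{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sigma}\right)$, the rejection area being z ≤ -zα.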
Example 7: A test on statistics was given in two classes, A and B, each containing 100
students. Let p1 and p2 be the true proportion of students who passed the test in class A
and B respectively. Suppose p1 = 0.9 and p2 = 0.75. Consider the test where H0: p1-p2 = 0
against Ha: p1-p2 > 0. Find the value of β if it is given that (p1-p2) = 0.1 at α = 0.01.
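In this case, we have a right-tailed test with m = n = 100, p1 = 0.9, p2 = 0.75 and zα = z0.01 = 2.33. Then
$\bar{p} = \dfrac{m p_1 + n p_2}{m + n} = \dfrac{100(0.9) + 100(0.75)}{200}$ = 165/200 = 0.825, so that $\bar{q}$ = 0.175, and
$\beta(p_1, p_2) = \Phi\left(\dfrac{z_{\alpha}\sqrt{\bar{p}\bar{q}\left(\frac{1}{m} + \frac{1}{n}\right)} - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{m} + \frac{p_2 q_2}{n}}}\right) = \Phi\left(\dfrac{2.33\sqrt{(0.825)(0.175)\left(\frac{1}{100} + \frac{1}{100}\right)} - 0.1}{\sqrt{\frac{(0.9)(0.1)}{100} + \frac{(0.75)(0.25)}{100}}}\right)$
= Φ(0.025/0.053) = Φ(0.47) = 0.6808. [Using standard normal curve area table]
This means that 68.08% of the time we end up not rejecting the null hypothesis even
though it is false.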
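The two-proportion formula can be evaluated in the same way. This is a hedged sketch: `beta_two_proportions` and its `diff` parameter are illustrative names, not taken from the lesson. The `diff` argument lets the alternative value of (p1 - p2) be supplied separately, because Example 7 uses 0.1 for it while p1 = 0.9 and p2 = 0.75 imply 0.15.

```python
import math

def phi(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def beta_two_proportions(p1, p2, m, n, z_crit, tail, diff=None):
    """Approximate Type II error probability for H0: p1 - p2 = 0 (large m, n).

    diff is the alternative value of p1 - p2 and defaults to p1 - p2 itself.
    z_crit is z_alpha (one-tailed) or z_{alpha/2} (two-tailed).
    """
    if diff is None:
        diff = p1 - p2
    p_bar = (m * p1 + n * p2) / (m + n)
    pooled_se = math.sqrt(p_bar * (1 - p_bar) * (1 / m + 1 / n))
    sigma = math.sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    upper = (z_crit * pooled_se - diff) / sigma
    lower = (-z_crit * pooled_se - diff) / sigma
    if tail == "right":
        return phi(upper)
    if tail == "left":
        return 1.0 - phi(lower)
    if tail == "two":
        return phi(upper) - phi(lower)
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Example 7 as worked in the text: alternative difference taken as 0.1
beta_example = beta_two_proportions(0.9, 0.75, 100, 100, 2.33, "right", diff=0.1)
# Same test, but with the difference implied by p1 and p2 (0.15)
beta_implied = beta_two_proportions(0.9, 0.75, 100, 100, 2.33, "right")
print(round(beta_example, 2), round(beta_implied, 2))
```

With the difference set to 0.1 the result is close to the 0.6808 found in the example; with the implied difference of 0.15 the probability of Type II error is considerably smaller, which illustrates how sensitive β is to the alternative value.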
4. TESTS CONCERNING POPULATION VARIANCE

Variance is another important parameter of a population distribution; it describes the dispersion in the data. The population variance is denoted by σ², whereas its sample counterpart, called the sample variance, is denoted by s² and is given by the formula
$s^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$
where Xi is the ith observation on random variable X in the sample, $\bar{X}$ is the sample mean and n is the sample size.
The square roots of the population variance and the sample variance are called the population standard deviation (denoted by σ) and the sample standard deviation (denoted by s) respectively. The testing procedure concerning a population variance is based on the null hypothesis that the population variance has a particular value $\sigma_0^2$, where the population has a normal distribution. In such a case, the test statistic is the chi-squared statistic given by
$\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma_0^2}$
which follows a chi-squared distribution with degrees of freedom (d.f) equal to (n-1).
A chi-squared distribution, unlike a normal distribution, is not symmetric. It is positively skewed and takes only positive values, as can be seen from the formula above. But as n, and hence its degrees of freedom, increases, it becomes more symmetrical, as shown in figure 4 below.
Let $\chi^2_{\alpha,(n-1)}$ and $\chi^2_{(1-\alpha),(n-1)}$ denote the critical values of the chi-squared variable such that the area to the right of them under the chi-squared distribution with (n-1) d.f is α and (1-α) respectively.
Since the chi-squared distribution is not symmetric, these two critical values are not mirror images of each other and must be read separately from the table, as shown in figure 5. For α < 0.5, $\chi^2_{\alpha,(n-1)}$ is greater than $\chi^2_{(1-\alpha),(n-1)}$.

For example, using the chi-squared distribution table, we can see that for α = 0.01 and n =
25, χ20.01, 24 =42.980 (99th percentile)and χ2(0.99), 24 = 10.856 (1st percentile). So, the tests
concerning a single population variance for a normal distribution are as follows:
Null Hypothesis: H0: $\sigma^2 = \sigma_0^2$
Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$, which under a true null hypothesis follows a chi-squared distribution with d.f = (n-1).

Alternative Hypothesis                               Rejection Area
Ha : $\sigma^2 < \sigma_0^2$ (left-tailed test)      $\chi^2 \le \chi^2_{(1-\alpha),(n-1)}$
Ha : $\sigma^2 > \sigma_0^2$ (right-tailed test)     $\chi^2 \ge \chi^2_{\alpha,(n-1)}$
Ha : $\sigma^2 \ne \sigma_0^2$ (two-tailed test)     $\chi^2 \le \chi^2_{(1-\alpha/2),(n-1)}$ or $\chi^2 \ge \chi^2_{\alpha/2,(n-1)}$

Figure 6 shows the rejection areas for the above-mentioned tests.

Example 8: A certain firm tested weights of its randomly selected twenty six machines. The
sample mean and variance of weights were found to be 155 kg and 16.6 kg 2 respectively.
Test the claim that the variance of the weights of machines in the firm is 15 kg2 against the
alternative that it is more than 15 at α = 0.05.
Step 1: σ2 = the true variance of the weights of machines in the firm.
Step 2: H0: σ2 = 15
Step 3: Ha : σ2 > 15 (right-tailed test)
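Step 4: Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(26-1)(16.6)}{15}$ = 415/15 = 27.67.
Step 5: Critical value of $\chi^2$ = $\chi^2_{\alpha,(n-1)}$ = $\chi^2_{0.05,25}$ = 37.652 [using the chi-squared distribution table] and the rejection area is $\chi^2 \ge \chi^2_{0.05,25}$.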
Step 6: Conclusion: Since the calculated value of the χ2 = 27.67 < χ20.05, 25 = 37.652 (the
critical value), it does not fall in the rejection area and hence we do not reject the null
hypothesis that the true variance of the weights of machines in the firm is 15 kg2 at α=0.05.
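The six steps of Example 8 reduce to one formula and one comparison, sketched below. The helper names are illustrative assumptions; the critical value 37.652 is simply the table value quoted in Step 5, entered by hand.

```python
def chi_squared_statistic(n, sample_variance, sigma0_squared):
    """Test statistic (n - 1) * s^2 / sigma_0^2 for H0: sigma^2 = sigma_0^2."""
    return (n - 1) * sample_variance / sigma0_squared

def reject_h0(statistic, tail, lower_crit=None, upper_crit=None):
    """Apply the rejection areas from the table above.

    lower_crit and upper_crit are critical values read from a chi-squared
    table by the user (hypothetical helper, not from the lesson).
    """
    if tail == "right":
        return statistic >= upper_crit
    if tail == "left":
        return statistic <= lower_crit
    if tail == "two":
        return statistic <= lower_crit or statistic >= upper_crit
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Example 8: n = 26, s^2 = 16.6, sigma_0^2 = 15, right-tailed, alpha = 0.05
chi2 = chi_squared_statistic(26, 16.6, 15.0)
decision = reject_h0(chi2, "right", upper_crit=37.652)
print(round(chi2, 2), decision)
```

The statistic falls short of the critical value, so the sketch reports that H0 is not rejected, in line with Step 6.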
Now, let us consider two normal population distributions, Popu1 and Popu2, with variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Suppose a random sample is selected from each of these populations; let m and n denote the sample sizes and $s_1^2$ and $s_2^2$ the sample variances of Popu1 and Popu2 respectively. Tests concerning two population variances, where the population distributions are normal, are based upon the null hypothesis that the two population distributions have the same variance. The test statistic in this case is the F-statistic, which is a ratio of two independent chi-squared random variables, each divided by its degrees of freedom. Let $\chi_1^2$ and $\chi_2^2$ be two such chi-squared random variables with degrees of freedom d.f1 and d.f2. Then the F-statistic is given by
$F = \dfrac{\chi_1^2 / d.f_1}{\chi_2^2 / d.f_2} \sim F_{d.f_1,\, d.f_2}$
which follows an F-distribution with numerator degrees of freedom d.f1 and denominator degrees of freedom d.f2.
In hypothesis testing procedures, the F-statistic is built from the chi-squared random variables $(m-1)s_1^2/\sigma_1^2$ and $(n-1)s_2^2/\sigma_2^2$, with d.f equal to (m-1) and (n-1) respectively. In other words, the F-statistic is given by
$F = \dfrac{\left[(m-1)s_1^2/\sigma_1^2\right]/(m-1)}{\left[(n-1)s_2^2/\sigma_2^2\right]/(n-1)} = \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$
which follows an F-distribution with numerator degrees of freedom (m-1) and denominator degrees of freedom (n-1).
Just like a chi-squared distribution, the F-distribution is asymmetric (positively skewed). It has two parameters: the numerator degrees of freedom (m-1) and the denominator degrees of freedom (n-1). This means that the critical values of F, denoted by $F_{\alpha,(m-1),(n-1)}$ and $F_{(1-\alpha),(m-1),(n-1)}$, will be different, as shown in figure 7. Also, since the F-statistic is a ratio of variances, its value is always greater than zero.
The critical value $F_{(1-\alpha),(m-1),(n-1)}$ can be calculated from an upper critical value with the degrees of freedom interchanged, using the formula
$F_{(1-\alpha),(m-1),(n-1)} = \dfrac{1}{F_{\alpha,(n-1),(m-1)}}$.
We use the F-distribution table to find critical values of F. For example, the critical value of F when α = 0.01, (m-1) = 25 and (n-1) = 30 is $F_{0.01,25,30}$ = 2.45, whereas $F_{(1-0.01),25,30}$ = $F_{0.99,25,30}$ = 1/$F_{0.01,30,25}$ = 1/2.54 = 0.39 using the above formula.
The tests concerning two population variances for normal distributions are as follows:
Null Hypothesis: H0: $\sigma_1^2 = \sigma_2^2$
Test statistic: $F = \dfrac{s_1^2}{s_2^2}$, since under the null hypothesis $\sigma_1^2 = \sigma_2^2$; F follows an F-distribution with numerator degrees of freedom (m-1) and denominator degrees of freedom (n-1).

Alternative Hypothesis                                  Rejection Area
Ha : $\sigma_1^2 < \sigma_2^2$ (left-tailed test)       $F \le F_{(1-\alpha),(m-1),(n-1)}$
Ha : $\sigma_1^2 > \sigma_2^2$ (right-tailed test)      $F \ge F_{\alpha,(m-1),(n-1)}$
Ha : $\sigma_1^2 \ne \sigma_2^2$ (two-tailed test)      $F \le F_{(1-\alpha/2),(m-1),(n-1)}$ or $F \ge F_{\alpha/2,(m-1),(n-1)}$

Figure 8 shows the rejection areas for the above-mentioned tests.
The following example illustrates the use of F-distribution in case of testing procedures
concerning two population variances.
Example 9: Consider two classes A and B containing 16 and 25 students respectively. The
mean heights of students in both classes were computed and it was found that there was no
significant difference in their mean heights. However, the sample standard deviations in
class A and B were 9 inches and 12 inches respectively. Check whether class B has a higher
variability in heights than class A for α = 0.01.
Step 1: $\sigma_1^2$ and $\sigma_2^2$ = the true variances of the heights in classes A and B respectively.
Step 2: H0: $\sigma_1^2 = \sigma_2^2$, i.e. both classes have the same variance of heights.
Step 3: Ha : $\sigma_1^2 < \sigma_2^2$ (left-tailed test), i.e. class B has a higher variability in heights than class A.
Step 4: Test statistic: $F = s_1^2/s_2^2 = 9^2/12^2$ = 81/144 = 0.5625.
Step 5: Critical value of F = $F_{(1-\alpha),(m-1),(n-1)}$ = $F_{(1-0.01),(16-1),(25-1)}$ = $F_{0.99,15,24}$. We can find this value using the formula $F_{0.99,15,24}$ = 1/$F_{0.01,24,15}$ = 1/3.29 = 0.3039 [using the F-distribution table], and the rejection area is $F \le F_{0.99,15,24}$.
Step 6: Conclusion: Since the calculated value of the F =0.5625 > F0.99, 15,24 = 0.3039 (the
critical value), it does not fall in the rejection area and hence we do not reject the null
hypothesis that both the classes have same variances of heights at α=0.01.
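Example 9 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are assumptions, and the value 3.29 for the upper critical value with interchanged degrees of freedom is the table value quoted in Step 5.

```python
def f_statistic(s1, s2):
    """F = s1^2 / s2^2, the test statistic under H0: sigma1^2 = sigma2^2."""
    return (s1 ** 2) / (s2 ** 2)

def lower_tail_critical(upper_crit_swapped_df):
    """Left-tail critical value from the reciprocal rule.

    upper_crit_swapped_df is the table value F_{alpha,(n-1),(m-1)}, i.e. the
    upper critical value with the two degrees of freedom interchanged.
    """
    return 1.0 / upper_crit_swapped_df

# Example 9: s1 = 9, s2 = 12, m = 16, n = 25, alpha = 0.01
f_value = f_statistic(9.0, 12.0)
f_crit = lower_tail_critical(3.29)   # 3.29 is the table value F_{0.01,24,15}
reject = f_value <= f_crit           # left-tailed rejection area
print(f_value, round(f_crit, 4), reject)
```

Since 0.5625 lies above the left-tail critical value of about 0.304, the null hypothesis is not rejected, as in Step 6.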
4.1 P-VALUES FOR TESTS OF POPULATION VARIANCES
The p-value is the observed level of significance: the smallest level of significance at which the null hypothesis would be rejected for the given sample. We compare the p-value with α to see at which levels of significance the null hypothesis is rejected and at which it is not. In other words, we follow this decision rule: reject the null hypothesis if p-value ≤ α and do not reject the null hypothesis if p-value > α.
Consider two normal population distributions and a right-tailed F-test where the numerator and denominator degrees of freedom are (m-1) and (n-1) respectively and Ha : $\sigma_1^2 > \sigma_2^2$. In this case, the p-value is given by the area to the right of the calculated value of F under the F-distribution curve. For example, if (m-1) = 10, (n-1) = 11 and the calculated F-value = 2.85, then p-value = area to the right of 2.85 under the F-distribution curve. Using the F-distribution table for numerator and denominator d.f 10 and 11 respectively, we see that this area is 0.05, and so p-value = 0.05 in this case. Here the calculated F-value happened to match a value in the table. When it does not, we use the following procedure.
Consider again a right-tailed F-test. For numerator and denominator d.f 10 and 11 respectively, the F-distribution table gives the critical values of F for different α as below:

α        Critical value of F
0.10     2.25
0.05     2.85
0.01     4.54
0.001    7.92
If the calculated F-value is 2.01, then the p-value is the area to the right of 2.01. Since in the table the area to the right of 2.25 is 0.10, the area to the right of 2.01 must be greater than 0.10. Hence, p-value > 0.10. Similarly, a calculated F-value of 2.45, which lies between 2.25 and 2.85, implies that the p-value lies between 0.05 and 0.10. A calculated F-value of 3.50, which lies between 2.85 and 4.54, implies that the p-value lies between 0.01 and 0.05. If the calculated F-value is 6.70, then the p-value lies between 0.001 and 0.01. Lastly, if the calculated F-value is 8.88, which is greater than 7.92, then p-value < 0.001.
After finding the p-value, we compare it with the given level of α and conclude whether or not to reject the null hypothesis. For example, if 0.05 < p-value < 0.10, then we reject the null hypothesis at α = 0.10 but do not reject it at α = 0.05. Similarly, if p-value < 0.001, then we reject H0 at α = 0.01.
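The table-lookup procedure described above amounts to finding the two neighbouring critical values that enclose the calculated F-value. A minimal sketch follows, assuming a right-tailed test; `bracket_p_value` is a hypothetical helper and the table rows are the four values listed above for d.f 10 and 11.

```python
def bracket_p_value(f_calc, table):
    """Return (lower, upper) with lower < p-value <= upper for a right-tailed test.

    table is a list of (alpha, critical value) pairs read from an F table.
    """
    rows = sorted(table, key=lambda row: row[1])  # ascending critical values
    lower, upper = 0.0, 1.0
    for alpha, critical in rows:
        if f_calc >= critical:
            upper = alpha   # area to the right of f_calc is at most alpha
        else:
            lower = alpha   # area to the right of f_calc exceeds alpha
            break
    return lower, upper

F_TABLE_10_11 = [(0.10, 2.25), (0.05, 2.85), (0.01, 4.54), (0.001, 7.92)]

for f in (2.01, 2.45, 3.50, 6.70, 8.88):
    print(f, bracket_p_value(f, F_TABLE_10_11))

# For a two-tailed test both bounds are doubled (capped at 1).
low, high = bracket_p_value(6.70, F_TABLE_10_11)
two_tailed = (2 * low, min(1.0, 2 * high))
```

A lower bound of 0.0 means only that the p-value is below the smallest α in the table, and an upper bound of 1.0 means only that it exceeds the largest α in the table.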
Consider now a two-tailed F-test with (m-1) = 10 and (n-1) = 11. In this case the p-value is twice the area to the right of the calculated F-value if it is large, or twice the area to the left of the calculated F-value if it is small, under the F-distribution curve. We therefore first find the one-tailed p-value and then multiply it by 2. For example, if the calculated F-value is 6.70, then 2(0.001) < p-value < 2(0.01), which gives 0.002 < p-value < 0.02. So the null hypothesis is rejected at α = 0.05, but we cannot say whether it is rejected at α = 0.01, since we do not know whether the p-value is more or less than 0.01. Statistical software, however, reports the exact p-value for a given F-test, and the rejection of H0 can then be checked directly.
Lastly, for a left-tailed F-test we find the area to the left of the calculated F-value to get the p-value. In this case we need the left-tailed critical values of F, which are computed using the formula $F_{(1-\alpha),(m-1),(n-1)} = 1/F_{\alpha,(n-1),(m-1)}$. After this, we can find the p-value in the same way and compare it with the given level of α.
PRACTICE QUESTIONS

Q1) A manufacturer supplied fax machines to a particular industry and claimed that 90% of
such machines were in good conditions. To check his claim, the industry took a sample of
250 fax machines and found that 30 machines were defective. Test the claim at α = 0.05
and 0.01.
Q2) Consider example 2. Test the manufacturer's claim p = 0.20 against the alternative that
p > 0.20 at α = 0.05 and 0.01.
Q3) An examination in English was taken in two classes, 1 and 2, each having 150 students.
It was found that 120 students in class 1 and 130 students in class 2 passed the exam. Test
the hypothesis that the performance of students in class 2 was better than that of class 1 at
α = 0.05 and 0.01.
Q4) Consider example 5 and test the given claim against the alternative that p ≠ 0.10 at α
= 0.05 and 0.01.
Q5) A book store purchased 20 copies of a statistics book. It was claimed that 95% of these
books got sold. Up on testing, it was found that 17 copies were sold. Test the claim at α =
0.05 and 0.01.
Q6) In question 1, it was claimed that 90% of fax machines were in good conditions.
Suppose that the alternative value of p is 0.85, find probability of Type II error at α = 0.05
and 0.01.
Q7) Find the value of β in example 7 by taking the alternative value of (p1-p2) as 0.20 at
α = 0.05 and 0.01.
Q8) The life span of certain light bulbs supplied by a company gave a standard deviation of
100 hours. A random sample of 15 such bulbs were tested and it was revealed that the
standard deviation was 105 hours. Test the hypothesis that the standard deviation is
significantly different from 100 hours at α = 0.05 and 0.01.
Q9) A case study was done to rate the soft drink Coke. Two groups, A and B, of 20 people
each were selected to rate Coke on a scale of 0 to 10. The group A gave a standard
deviation of 10 while group B gave a standard deviation of 8. Test whether group A has a
greater variability in rating than group B at α = 0.05 and 0.01.
Q10) Find the p-value for the following cases and check whether H0 gets rejected at α = 0.05
and 0.01.
(i) Left-tailed F-test where numerator d.f = 9, denominator d.f = 12 and calculated F-value = 0.45.
(ii) Right-tailed F-test where numerator d.f = 15, denominator d.f = 7 and calculated F-value = 5.50.
(iii) Two-tailed F-test where numerator d.f = 20, denominator d.f = 15 and calculated F-value = 6.70.

Budget Constraint
Lesson: Budget Constraint
Lesson developers: Vaishali Kapoor And Rakhi Arora
College/ Department: Delhi University

Table of Contents:
1. Introduction
2. Budget Constraint
2.1 Budget Constraint equation
2.2 Drawing Budget Constraint
2.3 Working with slope of budget line algebraically
2.4 What alters budget line?
2.4.1 Budget line pivot
2.4.2 Parallel shift in budget line
2.4.3 Non-parallel shift in budget line
2.4.4 Schemes conditioned on quantity of commodity
3. Summary
4. Exercises
5. Glossary
6. Appendix
7. References

Learning Outcomes:
After studying this chapter, a student should be able to:
1. Sketch a budget constraint.
2. Interpret the slope of a budget constraint.
3. Shift or pivot a budget constraint.
4. Compare budget sets when the consumer’s context (income and prices) differs.
5. Understand the impact of the government’s actions on the budget set of a consumer.

1. Introduction
A consumer goes to the market, faces different prices, weighs the alternatives and picks the bundle he likes best. But wait! What makes him a consumer in the first place? Where does he start? He starts with some money income in hand to spend; everything else follows from it.
Two persons may have different incomes; this alters their tax liability, so their disposable incomes differ too. If the two live in different locations, they may also face different prices. A consumer with a lower income can still be better off if his income commands a greater amount of goods, which is possible when the prices he faces are relatively lower than those faced by the other person.
Different combinations of prices and income are worked through in this chapter. The chapter introduces one of the basic tools of consumer theory in microeconomics, the ‘budget constraint’. It is divided into four sections, with further subsections. The first section covers the budget constraint algebraically and the second analyzes it graphically. In the third section, the slope of the budget line is calculated and interpreted. In the last section, changes in the budget set are analyzed through price changes, income changes, taxes, subsidies and rationing.
2. Budget Constraint
“Budget”! This word is often heard when you ask your parents for some expensive toy,
the latest smartphone or a trip. The reply you get is ‘It is not in our budget this time!’ So one has a basic understanding that one cannot consume unlimited amounts of goods. There is a binding constraint and, as the above example makes clear, it is your parents’ income, beyond which things become unaffordable.
In this section, we will understand the feasible set of goods that can be consumed, given somebody’s income. Some common notation and concepts will be used, which are explained below:
1. There is a money income, M, given to you.
2. There are two goods: x and y. Taking only two goods does not change the analysis; instead of y you could also take all other goods on the y-axis.
3. The prices of x and y are Px and Py respectively.
4. (x1, y1) represents a bundle of the two goods in which the consumer consumes x1 units of good x and y1 units of good y.

2.1 Budget constraint equation

If there are two goods x and y with respective prices Px and Py, then the total expenditure on the two goods is:
$P_x x + P_y y$
This expenditure must be less than or equal to one’s money income M, so the budget set is:
$P_x x + P_y y \le M$ …………..(1)
If one wishes to spend the entire income on the two goods, then (1) has to be satisfied with equality, and the budget constraint (budget line) is:
$P_x x + P_y y = M$ …………..(2)
If there are three goods, the budget constraint becomes:
$P_x x + P_y y + P_z z = M$
There could be n goods, and the constraint extends to the n-goods case in the same way (footnote 1).
2.2 Drawing Budget Constraint

The budget constraint $P_x x + P_y y = M$ can be plotted by asking two questions:
i) How much x can a consumer buy if he spends his entire income on x? Answer: since y = 0, x = M/Px.
ii) How much y can a consumer buy if he spends his entire income on y? Answer: since x = 0, y = M/Py.
So the vertical intercept is M/Py and the horizontal intercept is M/Px, and the budget line is the line joining these two points: (0, M/Py) and (M/Px, 0).
Footnote 1: See the appendix to this chapter for the n-goods case.

Figure 1: Budget Set
The shaded region in fig. 1, including the line itself, is the budget set: all consumption bundles in it, such as (x1, y1), (x2, y2) and (x3, y3), are affordable at prices (Px, Py) with money income M. In the three-goods case, the budget constraint is $P_x x + P_y y + P_z z = M$, which is a budget plane that can be drawn in 3D.
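The intercepts and the affordability test described above can be sketched in a few lines. The function names and the numbers (Px = 2, Py = 5, M = 100) are illustrative assumptions, not values from the chapter.

```python
def intercepts(px, py, m):
    """Return (horizontal intercept M/Px, vertical intercept M/Py)."""
    return m / px, m / py

def is_affordable(x, y, px, py, m):
    """A bundle (x, y) is in the budget set if Px*x + Py*y <= M."""
    return px * x + py * y <= m

# Hypothetical consumer: Px = 2, Py = 5, M = 100
px, py, m = 2.0, 5.0, 100.0
h_intercept, v_intercept = intercepts(px, py, m)
print(h_intercept, v_intercept)          # end points (50, 0) and (0, 20)
print(is_affordable(10, 10, px, py, m))  # costs 70, inside the budget set
print(is_affordable(30, 10, px, py, m))  # costs 110, outside the budget set
```

Bundles for which the expenditure equals M exactly lie on the budget line; those that cost less lie in the shaded region below it.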

Figure 2: Budget Plane
2.3 Working with slope of Budget line algebraically

Rewriting the budget line in the form y = mx + c, where m is the slope of the line and c the intercept:
$y = \dfrac{M}{P_y} - \dfrac{P_x}{P_y}\,x$ …………..(3)
So the slope of the budget line is $-P_x/P_y$ (footnote 2). The negative sign shows the downward slope of the budget line. Let (x1, y1) be an affordable bundle on the budget line. Then:
$P_x x_1 + P_y y_1 = M$ …………..(4)
Suppose the consumer changes the bundle but still satisfies the budget constraint. Then:
$P_x (x_1 + \Delta x) + P_y (y_1 + \Delta y) = M$ …………..(5)
On subtracting equation (4) from (5), we get:
$P_x \Delta x + P_y \Delta y = 0$
Or, $\Delta y/\Delta x = -P_x/P_y$ …………..(6)
Footnote 2: See the appendix to this chapter for a calculus treatment of the slope of the budget line.
The negative sign signifies that, to stay on the same budget line, consumption of x and y must move in opposite directions: if a consumer increases consumption of good x, then consumption of y must fall (since income is constant, one cannot afford an increase in both commodities simultaneously).
The slope of the budget line is the negative of the price ratio of the goods, i.e. the relative price of one good in terms of the other. Δy/Δx is interpreted as the amount by which consumption of y changes if consumption of x increases by 1, i.e. Δx = 1. It equals $-P_x/P_y$. So the market allows a consumer to substitute good x for good y at the rate $P_x/P_y$: each extra unit of x costs $P_x/P_y$ units of y. Rewriting (6), we get:
$\Delta y/\Delta x = -\dfrac{1}{P_y/P_x}$
A consumer has to give up less of y to add one unit of x if either Py is higher or Px is lower or both, i.e. if the relative price of x in terms of y is lower. This means that if you give up some of an expensive commodity like a laptop (“y” here), you can add more of a cheaper commodity like bread (“x” here). Put the other way, you have to sacrifice only a small fraction of a laptop (not literally) to add one loaf of bread.
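A quick numerical check of equation (6): moving one unit along the budget line changes y by exactly the negative of the price ratio. The prices and income below are illustrative assumptions, and `y_on_budget_line` is a hypothetical helper implementing equation (3).

```python
import math

def y_on_budget_line(x, px, py, m):
    """Quantity of y that exhausts income M when x units of x are bought."""
    return (m - px * x) / py

# Hypothetical values: Px = 2, Py = 5, M = 100
px, py, m = 2.0, 5.0, 100.0
slope = -px / py

# Increase x by one unit (from 10 to 11) and measure the change in y.
delta_y = y_on_budget_line(11, px, py, m) - y_on_budget_line(10, px, py, m)
print(slope, delta_y)
print(math.isclose(delta_y, slope))
```

Here one extra unit of x costs 0.4 units of y, which is the relative price Px/Py.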
2.4 What alters budget line?

Any line changes if its intercepts change, its slope changes, or both. A budget line either shifts parallel to itself, shifts in a non-parallel way, or pivots.
2.4.1 Budget line pivot

A budget line is said to pivot when its slope changes while money income stays constant, so that one of its intercepts remains unchanged.

Figure 3: Budget Line pivot around y-axis
The budget line pivots to the right from GH to GH1 if the price of good x falls from $P_x$ to $P_x'$, and to the left from GH to GH2 if it increases. A fall (rise) in the price of x allows increased (decreased) consumption of x while keeping the maximum possible consumption of y unchanged. The budget line GH1 (GH2) is flatter (steeper) than GH (footnote 3).
A budget line pivot (as in fig. 3) occurs in the following cases:
a. The market price of x increases (falls) from $P_x$ to $P_x'$.
b. Good x is taxed, and hence its price increases to $P_x'$.
A good can be taxed either by levying a tax per unit of the good consumed or by levying a tax on the value of the good. If a tax is levied on the quantity of x consumed, it is called a quantity tax. Thus $P_x' = P_x + t$, where t is the tax per unit of x. If a tax is levied on the value (price) of good x, it is called an ad valorem tax. Thus $P_x' = P_x(1 + t)$, where $P_x$ goes to the supplier and $P_x t$ is the tax collected by the government.
c. If a subsidy on good x is announced, the consumer faces a lower price. Like taxes, a subsidy could be either a quantity subsidy or an ad valorem subsidy. In the quantity subsidy case, $P_x' = P_x - s$, where s is the per-unit subsidy. In the ad valorem subsidy case, $P_x' = P_x(1 - s)$.
The point to remember is that a consumer has a larger budget set when the price of a good falls, and vice versa.
Footnote 3: To remember which line is steeper, imagine that you are on the x-axis at point H, H1 or H2 and have to drive up to point G. Ask yourself from which point the drive is most difficult. The answer is H2, so GH2 is the steepest road; the flattest road is the x-axis itself.
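The four price formulas for taxes and subsidies in section 2.4.1 can be collected in one small function to see how each scheme moves the horizontal intercept while leaving the vertical intercept untouched. This is a sketch with made-up numbers; the scheme labels and function names are assumptions for illustration.

```python
import math

def consumer_price(px, scheme, rate):
    """Price of x faced by the consumer under a tax or subsidy on good x."""
    if scheme == "quantity_tax":
        return px + rate            # Px' = Px + t
    if scheme == "ad_valorem_tax":
        return px * (1 + rate)      # Px' = Px(1 + t)
    if scheme == "quantity_subsidy":
        return px - rate            # Px' = Px - s
    if scheme == "ad_valorem_subsidy":
        return px * (1 - rate)      # Px' = Px(1 - s)
    raise ValueError("unknown scheme")

def horizontal_intercept(m, px):
    """Maximum x affordable when the entire income is spent on x."""
    return m / px

# Hypothetical values: M = 120, Px = 10
m, px = 120.0, 10.0
taxed = consumer_price(px, "quantity_tax", 2.0)
subsidised = consumer_price(px, "ad_valorem_subsidy", 0.25)
print(horizontal_intercept(m, px))          # before any scheme
print(horizontal_intercept(m, taxed))       # pivots inward
print(horizontal_intercept(m, subsidised))  # pivots outward
```

Because Py and M are unchanged in every case, the vertical intercept M/Py stays where it was, which is what makes the change a pivot.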

2.4.2 Parallel shift in Budget line

When the slope of the budget line (which is given by the ratio of prices) is held constant, the budget line shifts parallel to itself (footnote 4).
Figure 4: Parallel shifts in budget line
A parallel shift in the budget line is brought about in the following cases:
a) When money income falls (rises) from M to M1 (M2), the budget line shifts from GH to G1H1 (G2H2).
b) When a lump sum tax (footnote 5) of the amount T (= M - M1) is charged from the consumer, the budget line shifts from GH to G1H1. When a lump sum subsidy of amount S (= M2 - M) is given to the consumer, the budget line shifts from GH to G2H2.
c) When prices change proportionately, i.e. the ratio Px/Py is held constant, the budget line also shifts parallel. If Px and Py increase (decrease), the consumer can afford less (more) of both goods, and the budget line shifts from GH to G1H1 (G2H2).
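Case (c) can be verified with numbers: doubling both prices has the same effect on the budget line as halving income. The values below are illustrative assumptions, and `budget_line` is a hypothetical helper.

```python
import math

def budget_line(px, py, m):
    """Return (horizontal intercept, vertical intercept, slope)."""
    return m / px, m / py, -px / py

base = budget_line(2.0, 5.0, 100.0)            # original line GH
doubled_prices = budget_line(4.0, 10.0, 100.0) # both prices double
halved_income = budget_line(2.0, 5.0, 50.0)    # income halves instead

print(base)
print(doubled_prices)
print(halved_income)
```

Both changes give the same intercepts and leave the slope at -Px/Py, so the line shifts inward parallel to itself.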
2.4.3 Non-Parallel shifts in Budget line

In a non-parallel shift, the slope and both intercepts change. A non-parallel rightward shift of the budget line from GH to G1H1 implies a flatter budget line compared to GH.
Footnote 4: Parallel lines have the same slope.
Footnote 5: In the lump-sum case, a fixed amount is taken away from income irrespective of the consumption bundle or prices.

Figure 5 : Non-parallel shifts in budget line
That G1H1 is flatter implies a lower slope. Let the slope of GH be $-P_x/P_y$. A lower slope means that either Px falls or Py increases or both (footnote 6). Now, let us analyze the cases in which the budget constraint shifts rightward in a non-parallel way:
a. Px and Py change disproportionately.
When Px and Py both fall, but disproportionately, more of both goods can be consumed. If Px/Py falls, the change in the budget line to G1H1 is as depicted in panel A; if Px/Py increases, it is as depicted in panel B of fig. 5.
b. Px and M change.
For a rightward shift, M must increase, and for the slope to fall, Px must fall; this is shown by the shift from GH to G1H1 in panel A of fig. 5. If instead Py falls, the budget line shifts as shown in panel B of fig. 5.
c. Py and M change.
If M increases along with a fall in Py, the budget constraint shifts outward, indicating greater quantities of both goods; the additional increase in y, since its price has fallen, means a steeper budget line shifted outward, as depicted in panel B of fig. 5.
Footnote 6: When comparing slopes, take the absolute value of $-P_x/P_y$ and then judge steepness or flatness.

SOFT AND HARD BUDGET CONSTRAINT
When we say that the income of the consumer is given and that he can purchase only out of this income, we are referring to a ‘hard’ budget constraint. Softening of the budget constraint means that the consumer can spend more than his income because of the paternalistic role of the government. This sort of softening is relevant not only for consumer households but also for private firms, NGOs and other economic organizations.
Moreover, the consumer comes to expect this support from the government at all times and behaves taking it into account.
d. Px and Py change disproportionately and M changes.
Assume case ‘a’, so that you reach G1H1 from GH in a panel of fig. 5 because of a disproportionate change in prices. (1) Assume then that you have extra money income, which shifts G1H1 parallel outward to G2H2. (2) If instead some of your money income is taken back from you on account of the fall in prices, your budget line shifts parallel leftward to G3H3, which is still a non-parallel rightward shift relative to GH.
To work out leftward shifts, reverse all the cases: start from G1H1 and arrive at GH.
2.4.4 Schemes conditioned on/after certain quantity of commodity

a) Rationing (footnote 7) Constraint
A constraint may or may not be binding. A constraint is said to be binding if it alters your feasible set. Suppose that in an economy no individual may consume more than quantity x1 of good x. Then the constraint is binding if M/Px > x1, and the consumer’s budget set is lopped off, as shown in panel A of figure 6. Otherwise, if M/Px < x1, the budget set is unaltered, as shown in panel B of figure 6.
Footnote 7: Rationing is any method of allocating a scarce product or service other than by the price mechanism.

Figure 6: Rationing constraints
b) Rationing, taxes & subsidies

Suppose a good is taxed only after x1 units of x are consumed. Then the budget line has slope $-P_x/P_y$ up to x1; beyond x1, since x is taxed, its price rises to $P_x'$, the slope changes to $-P_x'/P_y$, and the budget line becomes steeper. So there is a kink at x1. This scenario is depicted in the following figure.
Figure 7: Effect of tax or subsidy, upto x1

After x1, since x is costlier, you have to give up more of y to get one unit of x. In the case of a subsidy given up to x1, the change in the budget line is the same, with the assumption that $P_x'$ is the original price and $P_x$ is the subsidized price.
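The kinked budget line can be written as a piecewise cost of x. The sketch below assumes a per-unit tax t that applies only to units of x beyond x1; the function name and all numbers are hypothetical.

```python
def max_y_with_threshold_tax(x, px, py, m, x1, t):
    """Largest affordable y when units of x beyond x1 carry a per-unit tax t.

    Returns None if x itself is unaffordable.
    """
    untaxed_units = min(x, x1)
    taxed_units = max(0.0, x - x1)
    cost_of_x = px * untaxed_units + (px + t) * taxed_units
    remaining = m - cost_of_x
    if remaining < 0:
        return None
    return remaining / py

# Hypothetical values: Px = 2, Py = 1, M = 100, tax of 2 per unit after x1 = 20
args = dict(px=2.0, py=1.0, m=100.0, x1=20.0, t=2.0)
print(max_y_with_threshold_tax(10.0, **args))  # before the kink: slope -2
print(max_y_with_threshold_tax(30.0, **args))  # after the kink: slope -4
print(max_y_with_threshold_tax(35.0, **args))  # new horizontal intercept
print(max_y_with_threshold_tax(40.0, **args))  # unaffordable
```

Up to x1 each unit of x costs 2 units of y; beyond x1 it costs 4, so the line is steeper after the kink and the horizontal intercept moves in from 50 to 35.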
c) A few free units of good x to all

Suppose that the government wants everyone to consume at least 4 kg of wheat (good x here). To ensure nutritional security, the government announces a zero price for wheat up to 4 units, after which the market price Px is charged. The budget line now changes from GH to GFH.
Figure 8: ‘4’ units of x free / costless
d) Buy three get one free

There are schemes in which 4 Lux soaps are bundled together and labeled ‘3+1 free’. In such cases the budget line changes again, as shown in fig. 9. There is a discontinuous portion: once you buy three units of x you get one extra unit of the good, and nothing in between can be consumed.

Figure 9: Buy ‘3’ get ‘1’ free
e) Cash Back Offer

Again, there are schemes in which you show the wrapper(s) of good x and get some cash in return. It is much like kids collecting points in Funflips. Suppose you show 5 wrappers of good x and you get back Px (footnote 8), i.e. the price of one unit of x. It is like an increase in income after 5 units of x.
Figure 10: Cash-back offer
Footnote 8: The cash back could be of any amount and need not be equal to Px.
Summary
• The budget set is the set of all consumption bundles that are affordable at the ongoing prices in the market, given a consumer’s income.
• The slope of the budget constraint is negative, showing that a consumer has to give up some of one good to get more of the other. The rate at which the market allows the consumer to substitute good x for good y is the relative price of good x in terms of good y; the slope is -Px/Py.
• If the price of either or both goods falls, the consumer can consume more as purchasing power increases, and hence the budget set enlarges. Price changes alter the slope of the budget constraint unless both goods’ prices change proportionately.
• Similarly, if income rises, the consumer has an enlarged budget set. But a change in income alone does not change the slope of the budget constraint.
• A tax on a commodity is viewed by the consumer as a price rise and a subsidy as a price fall. So taxes and subsidies affect the budget constraint in the same manner as a price change. A lump sum tax or subsidy is like a decrease or increase in income respectively, and hence alters the budget constraint like an income change.
• Various marketing schemes and the government’s rationing schemes can also change the consumer’s budget constraint.
Exercises
Q1. Abel is a consumer who lives in a country where people consume only Pepsi and Burgers. The price of Pepsi is $1.5 per bottle and the price of a Burger is $2 per unit. Abel’s income is $30.
a. Draw Abel’s budget constraint. Label the axes properly.
b. Suppose he consumes 8 Pepsis and 9 Burgers. Mark this consumption bundle and check whether it is feasible.
c. Suppose now that Pepsi is taxed at the rate of $0.5 per bottle. Show the impact graphically and interpret it.
d. Had he consumed 8 bottles of Pepsi despite the tax, his tax bill would have summed to $4. Assume that, instead of a tax per bottle, $4 is taken away from Abel’s income and prices remain ($1.5, $2). Compare this tax with the earlier tax and its impact on Abel’s budget set.
Q2. In a food stamp program in a country, food coupons for up to 5 kg of grain worth Rs. 250 are given for Rs. 100. After a consumer reaches this limit, there are no more coupons for him and he pays the market price for grain. Take all other goods on the other axis, with price Rs. 1. Draw a person’s budget constraint before and after the food stamp program if his income is Rs. 1000.
Q3. What happens to the budget constraint in the following cases:
a. Both prices and income double.
b. The price of good x doubles and that of good y triples.
c. The income of the consumer doubles and the price of good x is halved.
Q4. Rewrite the budget constraint in the following cases:
a. The government announces a lump sum tax ‘T’, a quantity tax on good x of ‘t’ and a quantity subsidy on good y of ‘s’.
b. The price of good x doubles, the price of good y becomes four times larger and income becomes eight times larger.
Glossary
• Budget set: The set of all consumption bundles that are affordable at the ongoing prices in the market, given a consumer’s income.
• Budget constraint: A line showing (the locus of) all affordable bundles at which the entire income is spent.
• Budget line pivot: A budget line is said to pivot when its slope changes while money income stays constant, so that one of its intercepts remains unchanged.
• Lump sum tax (subsidy): In the lump sum case, a fixed amount is taken away from (added to) income irrespective of the consumption bundle or prices.
• Quantity tax (subsidy): If a tax is levied on the quantity of x consumed, it is called a quantity tax. Thus $P_x' = P_x + t$, where ‘t’ is the tax per unit of x.
• Ad valorem tax (subsidy): If a tax is levied on the value (price) of good x, it is called an ad valorem tax.
• Rationing: Any method of allocating a scarce product or service other than by the price mechanism.
Appendix
A.1 Budget constraint for n-goods case

Let P be the price vector of order n and
X be the quantity vector of goods of order n.
The dot product of P and X gives total expenditure, which is equated to money income M on the
budget constraint:
P·X = p1 x1 + p2 x2 + ... + pn xn = M
A.2 Slope of the Budget Line

The budget line is given by: px·x + py·y = M
Taking the total differential of the above equation, with prices and income held constant, we get
px Δx + py Δy = 0
On rearranging we get,
Δy/Δx = −px/py
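The two appendix results can be checked numerically. The sketch below is only an illustration: the prices, income and bundle are hypothetical numbers chosen for the example, and the function names are not from the lesson.

```python
# Hypothetical two-good example: px = 2, py = 4, income M = 40.
def expenditure(prices, quantities):
    """Dot product of the price vector and the quantity vector."""
    return sum(p * q for p, q in zip(prices, quantities))

def budget_line_y(x, px, py, m):
    """Quantity of y that exhausts income m when x units of x are bought."""
    return (m - px * x) / py

px, py, m = 2.0, 4.0, 40.0

# The bundle (10, 5) costs exactly M, so it lies on the budget constraint.
total = expenditure([px, py], [10, 5])

# Slope between two points on the budget line (x rises from 5 to 6).
slope = (budget_line_y(6, px, py, m) - budget_line_y(5, px, py, m)) / (6 - 5)

print(total)   # total expenditure on the bundle
print(slope)   # compare with -px/py
```

With these numbers the measured slope coincides with −px/py, as the derivation in A.2 says it should.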

References:
Hal R. Varian, Intermediate Microeconomics: A Modern Approach, W.W. Norton and
Company/Affiliated East-West Press (India), 8th edition, 2010.
C. Snyder and W. Nicholson, Fundamentals of Microeconomics, Cengage Learning (India), 2010.

Preferences And Indifference Curves
Lesson: Preferences And Indifference Curves
Lesson Developers: Vaishali Kapoor And Rakhi Arora
Institute of Lifelong Learning, Delhi University

Table of Contents:
1. Introduction
2. Preferences
2.1 Axioms of Preferences
2.1.1 Complete
2.1.2 Reflexive
2.1.3 Transitivity
3. Utility
3.1 Assigning utility
3.2 Positive monotonic transformation
3.3 Total and marginal utilities - one good case
4. Indifference Analysis
4.1 Well behaved Indifference curve
4.2 Properties of Indifference curves
4.2.1 Negatively sloped
4.2.2 Thinner lines
4.2.3 Convexity
4.2.4 Indifference curves don’t intersect
4.3 Well behaved preferences and Indifference curves
4.4 satiation point
4.5 Marginal Rate of Substitution
4.6 Special cases of preferences and their indifference curves
4.6.1 Neutral good
4.6.2 Perfect substitutes
4.6.3 Perfect Complements
4.6.4 Cobb Douglas Preferences
4.6.5 Discrete goods
5. Summary
6. Exercises
7. Glossary
8. Appendix
9. References
Learning Outcomes:
1. Explain rational behavior of a consumer.

2. State axioms of preferences.
3. Define utility function.
4. List properties of Indifference curve.
5. Sketch indifference curve map for every preference.
6. Calculate and interpret slope of indifference curves.
7. Define satiation point.

1. Introduction
Consumers are rational decision makers in the sense that they know their preferences over goods
and buy the goods that give them maximum satisfaction. In this chapter we will assume
consumers to be rational and study preferences and the utility derived from various bundles of
commodities. The only difference between an actual consumer in the market and our study is that
here we apply calculus to the situations the consumer faces in the market.
This chapter is divided into three sections. The first section covers the axioms of preferences. In the
second section, the utility function is derived. In the last section, indifference curves are
discussed at length.
2. Preferences
Suppose I ask you what you would like to have for dinner: fried rice or chowmein? I would receive
one of three answers: (a) fried rice, (b) chowmein, (c) either of these. Irrespective of your
answer, I would conclude something about your preferences: either you like fried rice
over chowmein, or chowmein over fried rice, or you like both equally. Just as commodities are
ordered (ranked) in this example, so can bundles of commodities be.
2.1 Axioms of preferences

While ranking commodity bundles, a consumer should follow some logical reasoning; only then
can we say that the consumer is rational and the consumer’s preferences are
consistent. Listed below are a few of the assumptions about consumer preferences, which
are called the axioms of consumer theory.
Let us first understand the notation used. The ‘>’ symbol is used when one bundle is strictly
preferred over the other. The ‘~’ symbol is used when two bundles give an equal level of satisfaction to
the consumer. The ‘≥’ symbol is used when the consumer either prefers one bundle over the other
or is indifferent between the two (weak preference).
2.1.1 Complete
The axiom of completeness states that any two bundles can be compared. Assume two bundles 1
and 2, both comprising good x and good y.
(a) Either (x1,y1) is weakly preferred over (x2,y2) : (x1,y1)≥(x2,y2)
(b) Or (x2,y2) is weakly preferred over (x1,y1) : (x2,y2)≥(x1,y1)
(c) Or both, which means that the consumer is indifferent between the two bundles.

2.1.2 Reflexive
Reflexivity requires that any bundle is at least as good as itself. For example, a cold drink bottle on
the left is as good as one on the right, since both bundles contain one cold drink bottle. For a kid this
sort of assumption could seem invalid, but not for an adult, at least.
2.1.3 Transitivity
If bundle 1 is preferred over bundle 2 and bundle 2 is preferred over bundle 3, then bundle 1 is
preferred over bundle 3. Suppose you prefer studying in India over the US and the US over
Australia; then it follows that, given all three choices, you must choose India.
This assumption helps in identifying the best bundle.
Example 1
Consider the following binary relations defined over X, where X is the set of human beings. Check
whether each of these relations satisfies reflexivity, completeness and transitivity.
i) ‘At least as tall as’
Assume three individuals: A, B and C.
A could be taller than, shorter than or of the same height as B.
‘A is at least as tall as B’ can be used in the first and third instances, and ‘B is at least as tall as A’
can be used in the second and third instances. Since this relation always allows A and B to be
compared, it is complete.
‘A is at least as tall as A’ also makes sense; hence this relation is reflexive.
If A is at least as tall as B, and B is at least as tall as C, then A is at least as tall as C. Even if
all three were of the same height, this statement would still hold. Hence, this relation is
transitive.
ii) ‘Taller than’
A could be taller than, shorter than or of the same height as B.
This relation is not complete, since in the third case (same height) it cannot be used. It
is not reflexive, since A cannot be taller than himself/herself.
If A is taller than B and B is taller than C, then it follows that A is taller than C. So this
relation is transitive.
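The checks in Example 1 can also be run mechanically on a small set of people. The following is a minimal sketch; the heights are hypothetical values chosen so that two people are equally tall, and the helper names are illustrative only.

```python
from itertools import product

# Hypothetical heights in cm; B and C are deliberately of the same height.
heights = {"A": 170, "B": 165, "C": 165}
people = list(heights)

def at_least_as_tall(p, q):
    return heights[p] >= heights[q]

def taller_than(p, q):
    return heights[p] > heights[q]

def is_complete(rel):
    # Any two distinct people can be ranked in at least one direction.
    return all(rel(p, q) or rel(q, p)
               for p, q in product(people, repeat=2) if p != q)

def is_reflexive(rel):
    # Every person stands in the relation to himself/herself.
    return all(rel(p, p) for p in people)

def is_transitive(rel):
    # Whenever p R q and q R r hold, p R r must hold as well.
    return all(rel(p, r)
               for p, q, r in product(people, repeat=3)
               if rel(p, q) and rel(q, r))

for name, rel in [("at least as tall as", at_least_as_tall),
                  ("taller than", taller_than)]:
    print(name, is_complete(rel), is_reflexive(rel), is_transitive(rel))
```

A check on one finite sample can only refute an axiom, not prove it in general, but here it agrees with the reasoning above: ‘taller than’ fails completeness (B and C cannot be ranked) and reflexivity, while both relations pass the transitivity check.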

3. Utility
In the last section, consumer preferences were discussed, wherein two consumption bundles at a
time were compared and ordered (ranked). ‘Indifferent’, ‘strictly preferred’ and ‘weakly preferred’
are all binary relations defined over bundles of commodities. It would be easier if we could assign
numeric values to the bundles in such a manner that the ordering of bundles is preserved.1 Preserving
the order of consumption bundles refers to assigning a higher value to a preferred bundle than to
less-preferred bundles.
Did You Know??
Utility is a concept that was introduced by Daniel Bernoulli. He believed that for the usual
person, utility increased with wealth but at a decreasing rate.
3.1 Assigning Utility

A utility function maps each consumption bundle to a unique real number representing the
utility derived from that bundle.
U: (x,y) → ℝ
So, the utility function is a real valued function.
If a bundle (x1,y1) is strictly preferred over (x2,y2), then U(x1,y1) > U(x2,y2).
There are two approaches to measuring utility.
a) Ordinal Utility
As the name suggests, under ordinal utility only the order matters. Utility can be assigned to
different consumption bundles irrespective of magnitude, as long as the ordering
of preferences is maintained for a particular consumer. For example, if a consumer prefers
bundle 1 to bundle 2, then the utility assigned to bundle 1 can be 1, 10 or 1000,
provided that bundle 2 is assigned a smaller number every time. If bundle 1 has utility 10,
bundle 2 has 7 and bundle 3 has 90, then it can be inferred that bundle 3 is the most
preferred bundle, next to it is bundle 1 and the least preferred one is bundle 2. But it cannot
be inferred from this information that bundle 3 is 9 times better than bundle 1.
b) Cardinal Utility
There are economists who hold that the magnitude of utility is significant. This is
known as the cardinal assignment of utility to consumption bundles.
If, in the above example, it could be said with precision that the utility of bundle 1 is 10 and
that of bundle 3 is 90, then it would mean that the consumer likes bundle 3 nine times as much
as bundle 1. Such an inference could be drawn if the consumer is ready to pay nine times the
price of bundle 1 for bundle 3.
1 Here, the axiom of transitivity should be followed for consistency of the values assigned.

3.2 Positive Monotonic Transformation

If g is some increasing function (like g(x) = x² for positive x, ln x or e^x), then applying g to a
utility function would not alter the order of preferences, though the magnitudes would change. So,
while dealing with an ordinal utility function, a positive monotonic transformation would not
change the analysis.
If a bundle (x1,y1) is preferred over (x2,y2), then u(x1,y1) > u(x2,y2). If g is a strictly
increasing function, then g(u(x1,y1)) > g(u(x2,y2)), where g(u(.)) is a positive monotonic
transformation of u. This monotonic transformation of a utility function represents the same
preferences as the original utility function.
Example:
Bundle   U1 (first way)   U2 (second way)   Positive monotonic     Negative monotonic
                                            transformation U1*2    transformation U1*(-2)
1        9                3000              18                     -18
2        10               4000              20                     -20
3        11               5000              22                     -22
In column 2, the bundles are assigned small utility numbers, while in column 3 bundles 1-3 are
assigned utilities in the thousands; in both cases the order of preference is bundle 3, 2 and 1.
U1 and U2 are ordinal utilities, where only the order of preference is important. In column 4, a
positive monotonic transformation of U1 is computed by doubling U1. This does not alter the order
of preference of bundles 1-3. But the negative monotonic transformation in column 5, given by
(-2 U1), reverses the order of preference. The least preferred bundle 1 becomes the most preferred,
as -18 is greater than -20 and -22.
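The table can be reproduced in a few lines. This is a small sketch using the U1 column from the example; the function names are illustrative and not part of the lesson.

```python
# U1 column from the example table: bundle number -> utility.
u1 = {1: 9, 2: 10, 3: 11}

def ranking(utility):
    """Bundles listed from most preferred to least preferred."""
    return sorted(utility, key=utility.get, reverse=True)

def transform(utility, g):
    """Apply the function g to every utility number."""
    return {bundle: g(value) for bundle, value in utility.items()}

doubled = transform(u1, lambda u: 2 * u)    # positive monotonic transformation
negated = transform(u1, lambda u: -2 * u)   # negative monotonic transformation

print(ranking(u1))       # original order of preference
print(ranking(doubled))  # same order as the original
print(ranking(negated))  # order of preference is reversed
```

Only the ranking is compared, never the size of the numbers, which is exactly the ordinal point of view.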
3.3 Total and marginal utilities-one good case
When more of a good is consumed, total utility goes on increasing. Ask yourself: would you
prefer two pairs of shoes or one? Obviously, your answer would be two. But when you buy the
first pair, its utility is highest because it is the first one in your wardrobe. Another pair
would have some utility, but obviously less than the first one, and so on. These two arguments are
consistent, since the former relates to total utility and the latter to marginal utility. Total utility
increases with the number of units consumed, while the addition made to total utility falls.

Adam Smith presented the classical ‘Water-Diamond’ paradox. In stating this paradox he
observed that water, which is more valuable, is priced less than diamonds.
He could not resolve this paradox.
Later, the marginal-utility theory of value resolved the paradox. Water in total is much
more valuable than diamonds in total because the first few units of water are necessary
for life itself. But, because water is plentiful and diamonds are scarce, the marginal value
of a pound of diamonds exceeds the marginal value of a pound of water. It can be
assumed that for water we are at a point on the MU curve at a high quantity, whereas for
diamonds we are still at a point where little of the commodity (on the x-axis) is
consumed.
Marginal utility: MU = ΔU/Δx = [U(x2) − U(x1)] / (x2 − x1)
If x2 > x1, then U(x2) > U(x1); hence marginal utility is positive. It falls, however, if consumption
increases further from x2 to x3.2 That is, MU' is smaller than MU, where MU' is
given by
MU' = [U(x3) − U(x2)] / (x3 − x2)
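A quick numerical illustration of MU and MU' follows. It is a sketch under an assumption: the utility function U(x) = √x is chosen here only because it has diminishing marginal utility; it does not come from the lesson.

```python
import math

def total_utility(x):
    # Assumed utility function for illustration only: U(x) = sqrt(x).
    return math.sqrt(x)

def marginal_utility(x_from, x_to):
    """Change in total utility per unit change in consumption."""
    return (total_utility(x_to) - total_utility(x_from)) / (x_to - x_from)

mu = marginal_utility(1, 2)        # MU  when moving from x1 = 1 to x2 = 2
mu_prime = marginal_utility(2, 3)  # MU' when moving from x2 = 2 to x3 = 3

print(round(mu, 3), round(mu_prime, 3))
```

Both numbers are positive, so total utility keeps rising, but MU' is smaller than MU, so each extra unit adds less than the one before.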
4. Indifference Curve Analysis
There could be various consumption bundles amongst which a consumer is indifferent. For
example, two servings of rice with three chapattis could give the same satisfaction as three
servings of rice with two chapattis. All such combinations give the same fixed level of utility to
the consumer. An indifference curve is the locus of all such consumption bundles (bundles among
which the consumer is indifferent) that yield the same level of satisfaction/utility.
Indifference curves are also known as iso-utility curves.
2 Please see the appendix to this chapter for the calculus treatment of it.
4.1 Well Behaved Indifference Curve:

Let us consider the following arbitrary indifference curve.
Figure 1 Assumed indifference curve
Let us assume that the above curve is the locus of all consumption bundles at which the consumer
has the same constant utility. There are upward sloping and downward sloping
sections3 of our assumed indifference curve. A positively sloped section would mean
that more of both commodities (as at point B) and less of both commodities (as at point C)
give the same level of utility to the consumer. But is that consistent? No: strictly
more of both commodities should add to the consumer’s utility.4
Figure 2 Positively sloped indifference curve
3 BC and DE are the upward sloping sections, and AB and CD the downward sloping sections, of the
indifference curve here.
So, our indifference curve cannot be upward sloping. We can now have only an indifference
curve which is downward sloping. A downward sloping indifference curve means that as we
move along the curve from point C to point D (in figure 3), we get more of
good x with less of good y. Good y is sacrificed so that the addition made to utility from the
extra consumption of x is equivalent to the loss of utility due to the reduction of y. CDE is
negatively sloped5 but with D as a point of inflexion6.
The CD section of the assumed indifference curve is convex to the origin, while the DE section is
concave to the origin.
Figure 3 Negatively sloped indifference curve with point of inflexion
Now, we next have to examine whether an indifference curve needs to be convex or concave, or
could be both. Let us examine the concave section DE first. In figure 4, the change in x on the
x-axis is assumed to be 1 unit at each step, and the change in y is the response to the change in x.
4 Though the additions made would be declining (due to diminishing marginal utility), both add
positive utility, unless either good becomes a ‘bad’; this is discussed in the sections to come.
5 You can check that its slope is negative (downward) by drawing tangents at various points.
6 A point of inflexion is a point on a curve at which the curvature of the curve changes.

Figure 4 Concave indifference curve
To increase the consumption of x by 1 unit each time, the amount of y given up (along DE) goes
on increasing. But from our knowledge of marginal utility, the marginal utility of a good is high
at lower levels of consumption and low at higher levels of consumption. So as we move from
point 1 to 2 and further towards E, the loss of utility from reduced y is greater than the addition
to utility from increased x, the reason being that y is lost at a greater pace than x is added.
Utility would therefore not stay constant along DE. So, an indifference curve cannot be concave7.
The indifference curve assumed in figure 1 had a thicker section GHI, reconstructed here. It
can contain bundles 1 and 2 and many such bundles. Here bundle 2 contains more of both
goods compared to bundle 1 and hence has higher utility.8 So, the assumption that these
points are on the same indifference curve is violated.
Figure 5 Thicker section of indifference curve
7 Concave indifference curves can be observed and will be explained in coming sections.
8 This property is known as the assumption of monotonicity.

So, the sections excluded from our assumed shape of the indifference curve are:
(1) upward sloping sections
(2) sections concave to the origin
(3) thicker lines.
4.2 Properties of indifference curves
4.2.1 Negatively sloped:

An indifference curve is negatively sloped, as the addition to utility due to an increase in one
good must be compensated by a decrease in the other good.
4.2.2 Thinner lines:

Indifference curves should be thin curves, as thicker lines would contain areas of strict
preference.
4.2.3 Convexity:
Indifference curves must be convex. Convexity implies that the fall in y that accompanies each
increase in x should get smaller as more of x is added. Look at the following two diagrams
and the explanation that follows.

Figure 6 (a) Total utility curve, (b) Indifference curve
At point 1 on indifference curve I0, x1 of x and y1 of y are consumed, which yields utility of level
I0. Panel B of figure 6 shows the corresponding utility achieved from this consumption. When
one moves to point A, consumption of y decreases to y2 and that of x increases to x2. This leads to
an increase in utility from TU to TU1, so MU1 is added. To compensate, there should be a greater
fall in y (more than one unit), since a fall of one unit of y decreases utility by MU3 (which is less
than MU1). So, y must fall by two units (since MU3 + MU2 = MU1). Further, if another unit of x is
added then TU3 is achieved and the change is MU2, but now y must fall by less than a unit, since
the fall in utility due to a unit of y (MU1) would be greater than the rise in utility from x (MU2).
The change in y should be such that the loss in utility equals MU2.
4.2.4 Indifference curves do not intersect

Indifference curves should not intersect, so as to retain the axiom of transitivity. Suppose
that indifference curves do intersect, as shown in figure 7.
Figure 7 Intersecting indifference curves
Point C, compared to point A, yields more utility, as implied by monotonicity. A consumer
is indifferent between A and B (both being on the same indifference curve I0) and also indifferent
between B and C (both being on the same indifference curve I1). So by transitivity the consumer
should be indifferent between A and C, which is not the case. Hence, the axiom of transitivity is
violated.
4.3 Well behaved preferences and Indifference curves

The assumption of monotonicity states that more of both goods is better. Hence, the
higher the indifference curve, the higher is the utility, as shown in fig. 8. The shaded area in
fig. 9 shows the weakly preferred set to (x1,y1).
Figure 8 Monotonicity   Figure 9 Weakly preferred set
The second assumption for well behaved preferences is that averages are preferred to
extremes. Let (x1, y1) and (x2, y2) be on the same indifference curve in fig. 10. Then the average
of these two extreme bundles lies above the curve. And this average would be in the weakly
preferred set iff the indifference curve is convex. Consider the other (non-convex) cases in
panels b and c of fig. 10.

Figure 10
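The ‘averages are preferred to extremes’ test can be tried on two assumed utility functions. In this sketch, U = xy stands in for convex indifference curves and U = x² + y² for curves that are concave to the origin; both functions and the two bundles are illustrative choices, not taken from the lesson.

```python
def convex_case(x, y):
    # Assumed utility with convex indifference curves: U = x*y.
    return x * y

def concave_case(x, y):
    # Assumed utility whose indifference curves are concave to the origin.
    return x**2 + y**2

def average_is_weakly_preferred(u, bundle1, bundle2):
    """True if the average of two indifferent bundles is at least as good."""
    avg = ((bundle1[0] + bundle2[0]) / 2, (bundle1[1] + bundle2[1]) / 2)
    return u(*avg) >= u(*bundle1)

# (1, 4) and (4, 1) lie on one indifference curve under both functions.
extreme1, extreme2 = (1, 4), (4, 1)

print(average_is_weakly_preferred(convex_case, extreme1, extreme2))
print(average_is_weakly_preferred(concave_case, extreme1, extreme2))
```

The average bundle (2.5, 2.5) is weakly preferred only in the convex case, matching the claim that the average lies in the weakly preferred set iff the indifference curve is convex.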
4.4 Satiation point

The level of consumption beyond which the utility attached to an extra unit of the good is zero
or negative is known as the satiation level or saturation level. Assume two goods, chapatti and
vitamin B tablets. Up to a limit, both are needed and add to utility. Say, for an individual, 6
chapattis and two tablets are required for healthy living; beyond these levels utility falls. Let
us denote each such level by x1 in fig. 11 (this is done for the sake of brevity).
Figure 11 Total utility declines after x1
Figure 12 Satiation point
The arrows in this figure show the direction of maximal increase in utility.
The highest level of utility is achieved at (6,2), which is known as the bliss point (fig. 12).
The indifference curve is downward sloping in quadrant ‘I’, showing that one good has to be
sacrificed to add an extra unit of the other good in order to keep utility constant.
Figure 13 Both are ‘good’   Figure 14 x is ‘bad’
In ‘II’, good x (chapatti) becomes a ‘bad’, i.e. it adds to disutility (you can think that beyond this
level the consumer could develop a digestive disorder). But the good on the y-axis, tablets, is still
a ‘good’. The indifference curve is reproduced in fig. 14. The disutility from an addition of x is to
be compensated by utility from good y. Indifference curve I2 denotes higher utility than I1 and I0.
At I2, the y1 level of y is combined with a smaller level of the bad x compared to I0; hence I2
shows greater utility. Alternatively, with the x2 level of the bad x, a higher level of good y is
combined on indifference curve I2.
For Scooby and Shaggy, facing a monster is a ‘bad’. So, to keep them at the same utility, Velma
needs to offer Scooby Snax (a ‘good’).
In ‘III’, the good on the y-axis becomes a ‘bad’ and good x is still a ‘good’. The indifference
curves are reproduced in fig. 15.
Figure15 y is ’bad’

In ‘IV’, both goods become ‘bad’. The indifference curves are negatively sloped, but the direction
of maximal increase is towards the origin and the indifference curves are concave to the origin, as
shown in fig. 16. Taken together, the indifference curves are circles with the satiation point at
the centre.
Figure16 both are ‘bad’
4.5 Marginal Rate of substitution
The slope of the indifference curve9 is known as the marginal rate of substitution (MRS). The slope
indicates the change in y when x changes by 1 unit. For an indifference curve, it means that y
is substituted for x in such a way that utility is held constant. And, as we saw in earlier sections,
at smaller levels of y a smaller sacrifice of y would be made, as the loss in utility from
sacrificing 1 unit is high, and at greater levels of y a larger sacrifice could be made. Hence, the
marginal rate of substitution is diminishing10 along the indifference curve (assuming convexity and
monotonicity). Figure 17 shows that Δy is declining11 for each 1 unit addition of good x.
9 The slope of an indifference curve is different at different points of the curve. Only for a
straight line is the slope constant.
10 It is the absolute value that is referred to here.
11 Also see the appendix to this chapter for the calculus treatment of it.

Figure17 diminishing MRS
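The declining Δy can be tabulated for a specific curve. The sketch below assumes the utility function U = xy and the utility level 16; both are illustrative choices rather than values from the lesson.

```python
def y_on_curve(x, k=16.0):
    """y that keeps the assumed utility U = x*y at the level k."""
    return k / x

# Amount of y given up for each successive 1-unit increase in x.
sacrifices = [y_on_curve(x) - y_on_curve(x + 1) for x in range(1, 6)]

for x, dy in zip(range(1, 6), sacrifices):
    print(f"x: {x} -> {x + 1}, y given up: {dy:.3f}")
```

Each successive unit of x is obtained by giving up less y than the previous one, which is the diminishing marginal rate of substitution in absolute value.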
4.6 Special cases of preferences & their indifference curves

In this section, we would cover various forms of utility functions and analyze marginal rate
of substitution of their respective indifference curves.
4.6.1 Neutral Good

A good is said to be neutral if any quantity of it can be added and no change in utility is
brought about. Assuming good y is neutral and good x is a ‘good’, the indifference curves are
vertical and utility increases with increase in consumption of x only. An example of a neutral
good: a consumer does not care how many hours the T.V. runs in his
room, but he cares how many books he gets to read.

Figure18 good on y axis is neutral
4.6.2 Perfect substitutes
Two goods are perfect substitutes if the consumer is willing to substitute one good for another at
a constant rate. A constant marginal rate of substitution means that the indifference curves are
straight lines.
One ten rupee bank note is a perfect substitute for two five rupee bank notes. If the ten rupee
note is denoted by y and the five rupee note by x, then utility U = x + 2y.
Figure 19
The trick to remembering the coefficients of x and y in the utility function is this: 1
unit of y gives twice the utility of 1 unit of x, so y gets the coefficient 2. Likewise, if ‘a’ units
of y give the same utility as ‘b’ units of x, then the utility function becomes U = ax + by and
MRS = −a/b.
4.6.3 Perfect Complements

Perfect complements are goods that are consumed together in fixed proportions. The best
example quoted for this is left shoes and right shoes. Bundle (1,1) gives the same utility as (2,1)
or (1,2), because an extra unit of either would not be useful and hence no utility is gained.
Figure 20 Perfect complements
Indifference curves for perfect complements are hence L-shaped with the kink on the 45˚ line
(because of the 1:1 ratio). The slope on the vertical portion is infinite as Δx (denominator) is zero,
and the slope on the horizontal portion is zero as Δy (numerator) is zero. If two teaspoons of sugar
are added to one cup of tea, then the indifference curves are as in panel A of fig. 21. For the
general case, where ‘a’ units of x are consumed with ‘b’ units of y, the indifference curves are as
shown in panel B of fig. 21.
Figure 21: (a) 2 tsp of sugar are taken with 1 cup of tea; (b) ‘a’ units of x are taken with ‘b’
units of y
The utility function is written as follows:
U = min{x, y}; if one unit of good y is consumed with 1 unit of x.
U = min{x, 2y}; for the sugar (x) and tea (y) example.
U = min{bx, ay}; if ‘a’ units of x are consumed with ‘b’ units of y.
In the form U(x,y) = min{x, y}, utility is the minimum of the two levels of consumption of the
goods. Consider the shoes example:
Utility, U = 1 = min{1,1} = min{2,1} = min{1,2}
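The utility functions for perfect substitutes and perfect complements can be written out directly. This is a minimal sketch: the function names and default arguments are illustrative, while the bank note, shoe and sugar-tea numbers are the ones used in the text.

```python
def perfect_substitutes(x, y, a=1, b=2):
    """U = a*x + b*y, where 'a' units of y are worth 'b' units of x."""
    return a * x + b * y

def perfect_complements(x, y, a=1, b=1):
    """U = min{b*x, a*y}, where 'a' units of x go with 'b' units of y."""
    return min(b * x, a * y)

# Bank notes: two five rupee notes (x) match one ten rupee note (y).
print(perfect_substitutes(2, 0), perfect_substitutes(0, 1))

# Shoes (1:1 ratio): an extra left or right shoe adds nothing.
print(perfect_complements(1, 1), perfect_complements(2, 1),
      perfect_complements(1, 2))

# Sugar (x) and tea (y): a = 2 teaspoons with b = 1 cup, so U = min{x, 2y}.
print(perfect_complements(2, 1, a=2, b=1), perfect_complements(4, 1, a=2, b=1))
```

In the sugar and tea case the extra two teaspoons in the bundle (4, 1) leave utility unchanged, because there is still only one cup of tea.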

A Cobb Douglas utility function has convex indifference curves. The utility function takes
the form: U(x,y) = x^α y^β. The indifference curves appear as in figure 22.
Figure 22 Cobb Douglas preferences
The marginal rate of substitution is given by:
MRS = −MUx/MUy = −(α x^(α−1) y^β)/(β x^α y^(β−1)) = −αy/(βx)
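The closed-form MRS can be compared with a numerical estimate of −MUx/MUy. In this sketch the exponents (α = 1, β = 2), the bundle (2, 3) and the step size h are arbitrary illustrative choices, and the marginal utilities are approximated by central finite differences.

```python
def utility(x, y, alpha, beta):
    """Cobb Douglas utility U = x^alpha * y^beta."""
    return x**alpha * y**beta

def mrs_formula(x, y, alpha, beta):
    """Closed form from the text: MRS = -(alpha * y) / (beta * x)."""
    return -(alpha * y) / (beta * x)

def mrs_numeric(x, y, alpha, beta, h=1e-6):
    """-MUx/MUy with marginal utilities estimated by central differences."""
    mu_x = (utility(x + h, y, alpha, beta) - utility(x - h, y, alpha, beta)) / (2 * h)
    mu_y = (utility(x, y + h, alpha, beta) - utility(x, y - h, alpha, beta)) / (2 * h)
    return -mu_x / mu_y

exact = mrs_formula(2.0, 3.0, alpha=1.0, beta=2.0)
approx = mrs_numeric(2.0, 3.0, alpha=1.0, beta=2.0)
print(exact, approx)
```

The two values agree up to numerical error, which is a useful sanity check before applying the formula in exercises.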
4.6.4 Quasi linear preferences

Quasi linear, as the name suggests, means that the utility function is linear in one good and
non-linear in the other. Let the utility function be linear in good y and non-linear in good x.
Then the utility function is:
U(x,y) = v(x) + y. Here, if v(0) = 0, utility is equal to the height of the indifference curve
along the y-axis, i.e. when x is zero, utility is equal to the consumption of y.
Figure 23 Quasi linear preferences
The indifference curves are just vertically shifted versions of one indifference curve.
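The vertical-shift property can be seen by solving U = v(x) + y for y at two utility levels. The sketch assumes v(x) = √x and the levels 6 and 10 purely for illustration.

```python
import math

def v(x):
    # Assumed non-linear part for illustration only: v(x) = sqrt(x).
    return math.sqrt(x)

def y_on_curve(x, k):
    """y that keeps U(x, y) = v(x) + y at the utility level k."""
    return k - v(x)

# Vertical distance between the curves for k = 10 and k = 6 at several x.
gaps = [y_on_curve(x, 10) - y_on_curve(x, 6) for x in (1, 4, 9, 16)]
print(gaps)
```

The gap equals the difference in utility levels at every x, so one curve is the other shifted straight up.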
4.6.5 Discrete goods

All the cases considered till now assumed that goods are continuously divisible. There are
many goods like milk, rice and wheat that can be consumed in fractions. But there are goods like
automobiles, smart-phones and TV sets which are discrete in consumption.
Figure 24 Discrete goods

The dashed lines connect indifferent bundles (though consumption at points other than the dots
is not possible). The solid lines show the bundles weakly preferred to (x1,y1).
Summary:
(1) Three axioms of preferences, viz. completeness, reflexivity and transitivity, are assumed
about the consistency of consumers’ preferences.
(2) A utility function is a mapping of consumption bundles to their respective real values.
While ranking bundles, only the order of preferences matters, and this approach is known as
ordinal utility. A positive monotonic transformation of a utility function is always consistent with
the original ranking of bundles.
(3) Properties of indifference curves are as follows:
(a) Negatively sloped, (b) convex to the origin, (c) curves are thin, (d) indifference curves
do not intersect and (e) the higher the indifference curve, the higher the utility.
(4) Beyond a certain level of consumption of any good, disutility (negative utility) is generated,
i.e. there is too much of the good. At this level utility is maximum, and such a point is known as
the satiation point.
(5) Well behaved preferences exhibit a declining marginal rate of substitution, i.e. the sacrifice
of y each time x is increased keeps on declining.
(6) Indifference curves that are convex (but do not generate a smooth curve with declining
MRS) arise for perfect substitutes, perfect complements and quasi-linear preferences, and
even for discrete goods.
(7) Goods can be good, bad or neutral. If a good is ‘good’, it adds to utility when its
consumption is increased; it adds disutility (negative utility) if it is ‘bad’ and zero utility if it is
neutral.

Exercises
Q1. Sumit has Cobb-Douglas preferences and his utility function is given by U(x,y) = xy.
State true or false for the following statements, considering Sumit’s preferences:
a) (10,5) ~ (5,10)
b) (20,5) ≥ (4,25)
c) (15,4) ≥ (7.5,7.5)
Q2. The marginal rate of substitution for Sumit, whose utility function is U(x,y) = xy, is given
by:
a) x/y
b) y/x
c) x²/y²
d) y²/x²
Q3. Match the utility functions with their MRS at (x,y) = (4,9):
i) x + y                 a) 1.5
ii) x + √y               b) 1
iii) x^0.5 y^0.5         c) 6
Q4. If the utility function is U(x,y) = x² − y², what can you conclude about the nature of
these goods:
a) x and y both are good
b) x is good and y is bad
c) x is good and y is neutral
d) x is bad and y is good
Q5. Which of the following pairs of goods are complements and which are substitutes:
a) A popular novel and a magazine.
b) A camera and a film.
Q6. a) Consider a utility function U(x,y) = √(xy). Calculate its MRS.
b) Is V(x,y) = x²y² a monotonic transformation of U?
c) Calculate the MRS of V(x,y). Is it different from the MRS calculated in part (a)?
Q7. If both goods x and y are bads, draw and explain the consumer’s indifference curves.
Q8. a) Consider the utility function of Vaibhav, U(x,y) = 4√x + y. Vaibhav consumes 9
units of x and 15 units of y. His consumption of x is reduced to 4 units, but he is given
enough of good y that there is no loss of (or addition to) utility. How many units of y
does Vaibhav now consume?
b) Draw indifference curves showing Vaibhav’s preferences.
c) Calculate Vaibhav’s MRS. Are his preferences homothetic?
Q9. Poonam’s utility function is U(x,y) = max{x,y}.
a) Draw her indifference curves.
b) If Poonam consumes (10,20), calculate her utility at this bundle.
c) State true or false about Poonam’s preferences:
i) (10,20) ~ (20,10)
ii) (20,10) > (15,15)
d) Are Poonam’s preferences convex?
Q10. Consider utility functions:
a) U(x,y)= xy
b) U(x,y)= x2+y
Show that both of these have a diminishing marginal rate of substitution.
Glossary
• Indifference curve: A curve showing the locus of combinations of the amounts of
two goods such that the consumer is indifferent between any combinations on that curve.
• Marginal rate of substitution: It refers to the amount of one good that is required
to compensate the consumer for giving up an amount of another good such that the
consumer has the same level of utility as before.
• Perfect substitutes: Two goods are perfect substitutes if the consumer is willing to
substitute one good for another at a constant rate.
• Perfect complements: Perfect complements are goods that are consumed together
in fixed proportions.
• Bad good: A good is said to be a ‘bad’ if an addition to its quantity creates disutility even
at lower levels of consumption.
• Neutral good: A good is said to be neutral if any quantity of it can be added and
no change in utility is brought about.
• Monotonicity: The monotonicity principle is ‘more is better’; it implicitly covers the case
where both goods are ‘goods’ and leaves out the cases involving ‘bads’.
• Well behaved preferences: Preferences are well behaved when indifference curves
are negatively sloped and convex to the origin.
Appendix
A.1 Diminishing marginal utility

Given a utility function U(x), dU/dx is positive, stating that a ‘good’ adds value for its consumer.
The fall in marginal utility as x increases refers to the downward slope of the marginal utility
curve. This is computed by d²U/dx². For diminishing marginal utility it is d²U/dx², not dU/dx,
that should be negative. MU in the first quadrant assumes all positive values but its slope is
negative, due to the negative relationship between the addition made to utility and the units
consumed.
Figure A1 diminishing marginal utility
A.2 Indifference curves are level curves of utility function

We assume two goods x and y on the x-axis and y-axis respectively. Various combinations of these
goods, like A and B, yield some utility, which is represented on the z-axis. A combination is raised
to the height equivalent to the utility of bundle A, shown by point A'. Similarly, for all such
bundles, the utility surface is drawn (as shown in fig. A2).

Figure A2 Utility function
At various heights, say α, a disc is seen (with point A' on it). Assume this disc drops down
onto the xy-plane; it then appears as in fig. A3. C' is the bliss point, where utility is
maximum. C' corresponds to point C in the xy-plane, which is the satiation point.
FigureA3 indifference curves as level curves of utility function
A.3 Diminishing Marginal Rate of Substitution

Utility depends on consumption of good x and good y.
U = f(x,y)
Taking the total differential of it, we get:
ΔU = (∂U/∂x) Δx + (∂U/∂y) Δy
ΔU = MUx Δx + MUy Δy
On an indifference curve, there is no change in the level of utility. Therefore,
MUx Δx + MUy Δy = 0
Or, Δy/Δx = −MUx/MUy
As x increases MUx falls, and as y declines MUy increases. Both imply that this
fraction falls (in absolute value) as x increases. Hence we witness a diminishing marginal rate of
substitution along the indifference curve.
References:
Hal R. Varian, Intermediate Microeconomics: A Modern Approach, W.W. Norton and Company/Affiliated
East-West Press (India), 8th edition, 2010.
C. Snyder and W. Nicholson, Fundamentals of Microeconomics, Cengage Learning (India), 2010.
www.wikipedia.org

Consumer Optimization
Lesson: Consumer Optimization
Lesson Developers: Vaishali Kapoor And Rakhi Arora

Table of Contents:
1. Introduction
2. Optimization
2.1 Case of well behaved preferences
2.1.1 Diagrammatic treatment
2.1.2 Algebraic expression
2.2 Interior solution and boundary optimums
2.2.1 Kinked preferences
2.2.3 Neutrals and bads
2.2.4 Concave preferences
3. Demand
3.1 Well behaved preferences
3.2 Perfect substitutes
3.3 Perfect complements
3.4 Giffen goods
4. Engel’s curve
4.1 Well behaved Indifference curve
4.2 Inferior goods
4.3 Perfect substitutes
4.4 Perfect complements
4.5 Cobb Douglas Preferences
4.6 Homothetic Preferences
4.7 Quasi linear preferences
5. Summary
6. Exercises
7. Glossary
8. Appendix
9. References

Learning Outcomes:
i) State condition for attaining optimal choice bundle.

ii) Calculate optimal consumption bundle given information about preferences, income
and prices.
iii) Analyze impact of change in price on quantity demanded of a good.
iv) Derive the demand curve for a good for different kinds of preferences.
v) Define and distinguish between normal goods and Giffen goods.
vi) Analyze impact of change in income on quantity demanded of a good.
vii) Derive the Engel curve for a good for different kinds of preferences.
viii) Distinguish between normal goods and inferior goods.
1. Introduction
Can you recall being given pocket money at the age of 7 or 8? You always knew how to
use that money. Kids at that age spend on cola, ice-cream or whatever toys they want.
But I am sure you chose whatever brought you joy and satisfaction. You did not know what
optimization was, which microeconomic technique was to be applied or what conditions
were to be met. Yet you did the optimization subconsciously, as every consumer
actually does.
In this chapter, we will deal with optimization formally. The chapter is divided into three
main sections. The first section covers the conditions for optimization under various
preferences. In the second section, the demand curve of a good for a consumer is derived.
In the last section, the impact of a change in income on the optimal quantity of a good is
analyzed.
2. Optimization
Optimization in the context of utility means maximizing utility given the budget set. Given
a level of income M and prices p_x and p_y of goods x and y, a consumer maximizes his
utility by choosing the consumption bundle that gives him the highest satisfaction. This
choice depends on the consumer’s preferences. If the consumer is happier consuming good x
(relative to good y), his optimal consumption bundle will contain more of good x. But the
consumer’s choice is also affected by the price of good x in the market and is constrained
by his income. Hence, the optimal choice is decided jointly by the budget set and
preferences.
2.1 Optimization in case of well behaved preferences.

Preferences are well behaved if indifference curves are negatively sloped, and are convex to
the origin. This section analyses ‘optimal choice bundle’.
2.1.1 Diagrammatic treatment

The budget line is the locus of all consumption bundles which are affordable when the entire
income is spent. The indifference map shows indifference curves of varying utility. An
indifference curve is the locus of all consumption bundles which yield some constant level
of utility. The budget set and the indifference map have the same axes, with good x on the
x-axis and good y on the y-axis. Let us superimpose the indifference curves on the budget
set, as in Figure 1.
Figure 1
A few affordable bundles, given some income M, are marked in the above figure. Points A, B
and C yield utility U0; likewise points F, G and H yield utility U1, and point E yields U2
(footnote 1).
Amongst all such affordable bundles, the point that maximizes utility is point E, which
gives utility U2. There are two remarkable things about point E:
i) It lies on the budget line
The optimal point lies on the budget line. Points such as B and G yield only utility U0
and U1, so utility can still rise until E is reached. Any point above E (like D) is
unaffordable.
ii) It is the point of tangency of an indifference curve with the budget line
If the optimal point has to be on the budget line, then A, C, F, H and E all qualify,
but only E is optimal. The highest achievable indifference curve is I2, which touches the
budget line at E.
2.1.2 Algebraic expression for optimal Bundle.

As discussed in the last section, the optimal bundle choice requires tangency of the
indifference curve with the budget line. This implies that the slope of the budget line
equals the slope of the indifference curve, which is given as:
$$\frac{p_x}{p_y} = MRS_{xy} = \frac{MU_x}{MU_y}$$
The slope of the budget line measures the rate at which the market is willing to substitute
good y for good x. The above equation implies that the rate of substitution in the market
should be equal to the consumer’s marginal rate of substitution between the two goods.
Footnote 1: It is assumed here that U2 > U1 > U0; the subscripts 0, 1 and 2 match each
indifference curve with its utility level.
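2.1.3 Lagrangian technique of utility maximization (footnote 2)

The objective function is the utility function and the constraint here is the budget. The
problem can be written as follows:
Max: $U(x, y)$
Subject to: $p_x x + p_y y = M$
Lagrangian: $\mathcal{L} = U(x, y) - \lambda\,(p_x x + p_y y - M)$
where λ is the Lagrange multiplier.
For optimization, set the partial derivatives to zero:
$$\frac{\partial \mathcal{L}}{\partial x} = U'_x - \lambda p_x = 0 \quad \ldots (1)$$
$$\frac{\partial \mathcal{L}}{\partial y} = U'_y - \lambda p_y = 0 \quad \ldots (2)$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = -(p_x x + p_y y - M) = 0 \quad \ldots (3)$$
On solving equations 1 and 2, we get:
$$\frac{U'_x}{U'_y} = \frac{p_x}{p_y}$$
Also, it can be written as (footnote 3):
$$\frac{U'_x}{p_x} = \frac{U'_y}{p_y} = \lambda$$
Footnote 2: This section is optional.
Footnote 3: λ is the Lagrange multiplier, and here it becomes the ratio of benefit to cost.
The additional benefit from each good is its MU and the cost is its price. So the condition
implies that the ratio of marginal benefit to cost must be equal for all goods.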

Gossen's laws, named for Hermann Heinrich Gossen (1810 – 1858), are three
laws of economics:
Gossen's First Law is the “law” of diminishing marginal utility: that marginal
utilities are diminishing across the ranges relevant to decision-making.
Gossen's Second Law, which presumes that utility is at least weakly quantified,
is that in equilibrium an agent will allocate expenditures so that the ratio of
marginal utility to price (marginal cost of acquisition) is equal across
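all goods and services:
$$\frac{\partial U/\partial x_i}{p_i} = \frac{\partial U/\partial x_j}{p_j} \quad \forall\,(i, j)$$
where
$U$ is utility
$x_i$ is quantity of the $i$-th good or service
$p_i$ is the price of the $i$-th good or service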
Gossen's Third Law is that scarcity is a precondition for economic value.

Source: Wikipedia
Here U’x is the marginal utility of x and U’y is the marginal utility of y. This is the same
equilibrium condition for the optimal choice bundle that was obtained in the last section.
Equation 3 implies that the chosen bundle (x, y) must exhaust the entire income.
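The equal marginal-utility-per-rupee condition can be verified by brute force. The sketch below is an illustration under assumptions: the utility U(x, y) = sqrt(x) + sqrt(y), the prices and the income are hypothetical values, and the search simply walks along the budget line instead of solving the Lagrangian.

```python
import math


def utility(x, y):
    # Assumed well behaved utility, used only for this illustration.
    return math.sqrt(x) + math.sqrt(y)


def best_bundle_on_budget(px, py, income, steps=6000):
    """Search the budget line px*x + py*y = income for the best bundle."""
    best_bundle, best_utility = (0.0, income / py), utility(0.0, income / py)
    for i in range(steps + 1):
        x = (income / px) * i / steps
        y = max(0.0, (income - px * x) / py)   # spend the rest of income on y
        u = utility(x, y)
        if u > best_utility:
            best_bundle, best_utility = (x, y), u
    return best_bundle


px, py, income = 2.0, 1.0, 12.0
x_star, y_star = best_bundle_on_budget(px, py, income)

# Marginal utility per rupee of each good at the optimum (should be equal).
mu_per_rupee_x = (0.5 / math.sqrt(x_star)) / px
mu_per_rupee_y = (0.5 / math.sqrt(y_star)) / py
print(x_star, y_star, mu_per_rupee_x, mu_per_rupee_y)
```

With these assumed numbers the search settles on the bundle where both ratios coincide, and that common value is the multiplier λ.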
2.2 Interior solution and boundary optimums

The Lagrangian technique and the equality of slopes can only be applied when indifference
curves are smooth (without kinks) and convex. With kinks, an interior solution can still
be obtained, but calculus does not work. In the remaining cases the solution to the
optimization problem lies at a boundary point, and the technique of calculus is again of no
use. We will analyze these cases in this section.
2.2.1 Kinked preferences

In the case of kinked preferences, the optimal point is where the kink of an indifference
curve touches the budget line. In figure 2, point E is the optimal point.
Figure 2
When two goods are perfect complements, the indifference curves are L-shaped.
Indifference curves for perfect complements therefore also have a kink.
Figure 3
The optimal bundle for perfect complements is where the kink touches the budget line, as
at point E in figure 3. Since the slope cannot be calculated at a kink, an alternative
method is needed to compute the optimal bundle.

Consider a consumer who always consumes ‘a’ units of x with ‘b’ units of y. The
indifference curves would appear as in figure 4. The kinks lie on the line OA, whose slope
is b/a. The origin is also a kink point: the indifference curve through the origin is an L
formed by the x-axis and the y-axis themselves.
Figure 4
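The optimal point is hence at the intersection of the line OA and the budget line. The
budget line is given by $p_x x + p_y y = M$ and the equation of line OA is $y = (b/a)\,x$.
Solving these two equations yields:
$$x^* = \frac{aM}{a\,p_x + b\,p_y}, \qquad y^* = \frac{bM}{a\,p_x + b\,p_y}$$
Figure 5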
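Solving the budget line together with the kink line y = (b/a)x gives x* = aM/(a·p_x + b·p_y) and y* = bM/(a·p_x + b·p_y). A minimal sketch of that calculation follows; the function name and the numbers in the example are hypothetical.

```python
def perfect_complements_bundle(a, b, px, py, income):
    """Optimal bundle when 'a' units of x are always used with 'b' units of y."""
    scale = income / (a * px + b * py)   # number of (a, b) packages affordable
    return a * scale, b * scale


# Example with assumed numbers: 1 unit of x with 2 units of y.
x_star, y_star = perfect_complements_bundle(a=1, b=2, px=3.0, py=1.0, income=20.0)
print(x_star, y_star)
```

The bundle lies on the kink line (y* = 2·x*) and exhausts the income, which are exactly the two equations being solved.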

Indifference curves for perfect substitutes are straight lines. One of three cases is
possible:
I. Indifference curves are steeper than the budget line.
II. Indifference curves are flatter than the budget line.
III. Indifference curves are parallel to the budget line and one of them overlaps it.
These cases are depicted in the respective panels of figure 6.
Figure 6
Case 1 Indifference curves are steeper than the budget line:
This means the consumer is willing to give up more of good y for a unit of good x than the
market requires, i.e. he values good x more than the market does. Since the two goods can
be substituted perfectly, the optimal choice is at point E1 in panel (1) of figure 6, where
the consumer consumes only x and zero units of good y. $(M/p_x, 0)$ is the boundary optimum.
Case 2 Indifference curves are flatter than the budget line:
This case is just the reverse of the one discussed above. It is depicted in panel (2) of
figure 6; here the consumer consumes only y and nothing of good x. $(0, M/p_y)$ is the
boundary optimum.
Case 3 Indifference curves have the same slope as the budget line:
In such a case, one indifference curve overlaps the budget line and hence all points from
$(0, M/p_y)$ to $(M/p_x, 0)$, including both end points, are optimal.

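Let us write down the demand function of x when goods x and y are perfect substitutes:
$$x = \begin{cases} M/p_x & \text{when } MRS_{xy} > p_x/p_y \\ \text{any quantity between } 0 \text{ and } M/p_x & \text{when } MRS_{xy} = p_x/p_y \\ 0 & \text{when } MRS_{xy} < p_x/p_y \end{cases}$$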
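The three cases for perfect substitutes can be written as a small function. This is a sketch under assumptions: the constant MRS is passed in as a number, and the result is returned as a (lowest, highest) range so that the tie case can be represented; none of the names come from the lesson.

```python
import math


def perfect_substitutes_demand_x(mrs, px, py, income):
    """Return the (lowest, highest) optimal quantity of x for perfect substitutes.

    mrs is the constant slope of the indifference curves in absolute value.
    """
    price_ratio = px / py
    if math.isclose(mrs, price_ratio):
        return (0.0, income / px)             # every bundle on the budget line
    if mrs > price_ratio:
        return (income / px, income / px)     # indifference curves steeper
    return (0.0, 0.0)                         # indifference curves flatter


print(perfect_substitutes_demand_x(mrs=1.0, px=10.0, py=12.0, income=120.0))
print(perfect_substitutes_demand_x(mrs=1.0, px=12.0, py=10.0, income=120.0))
print(perfect_substitutes_demand_x(mrs=1.0, px=10.0, py=10.0, income=120.0))
```

The tie is tested with a tolerance because an exact equality test on floating point ratios can fail for prices that are equal on paper.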
2.2.3 Neutrals and Bads

Let good x be a good and good y be a neutral. The highest achievable indifference curve is
then reached where M/Px units of x and zero units of the neutral good y are consumed, as
depicted in figure 7.
Figure 7
Now let good y be a bad and good x be a good. The consumer then has the highest utility when
none of the bad y is consumed. This is depicted in figure 8.

Figure 8
Demand for good x = M/Px
Demand for good y = 0 (whether y is a neutral or a bad)
In both cases a boundary optimum is achieved, where all income is spent on the good and
nothing on the bad or neutral commodity.
2.2.4 Concave preferences

When preferences are concave, the tangency condition can still be met, as at point F on
the budget line. But is F the optimum? The answer is no. The reason is that a yet higher
indifference curve is achievable: points like E1 in panel (1) and E2 in panel (2) are the
boundary optima in the case of concave preferences.
Figure 9
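A numerical comparison shows why the tangency point fails under concave preferences. The sketch below assumes the utility U(x, y) = x² + 4y² and equal prices purely for illustration; these are not the functions or figures used in the lesson.

```python
def utility(x, y):
    # Assumed concave-preference utility, chosen only for this illustration.
    return x ** 2 + 4 * y ** 2


def tangency_bundle(px, py, income):
    """Bundle on the budget line where MUx/MUy = 2x/(8y) equals px/py."""
    y = income / (4 * px ** 2 / py + py)
    x = 4 * y * px / py
    return x, y


def corner_bundles(px, py, income):
    """The two boundary bundles: all income on x, or all income on y."""
    return [(income / px, 0.0), (0.0, income / py)]


px, py, income = 1.0, 1.0, 10.0
tangent = tangency_bundle(px, py, income)
corners = corner_bundles(px, py, income)
best_corner = max(corners, key=lambda bundle: utility(*bundle))

print("tangency", tangent, utility(*tangent))
print("best corner", best_corner, utility(*best_corner))
```

With these assumed numbers both corners beat the tangency bundle, so the tangency point is the worst affordable bundle on the budget line rather than the best.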

2.2.5 Cobb-Douglas preferences

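Consider Cobb-Douglas preferences, where the utility function takes the form
$U(x, y) = x^c y^d$. Cobb-Douglas preferences are well behaved, and hence the calculus
technique can be applied here.
$$MU_x = \frac{\partial}{\partial x}\,(x^c y^d) = c\,x^{c-1} y^d$$
$$MU_y = \frac{\partial}{\partial y}\,(x^c y^d) = d\,x^c y^{d-1}$$
$$MRS_{xy} = \frac{MU_x}{MU_y} = \frac{c\,y}{d\,x} = \frac{P_x}{P_y}, \quad \text{so that } y^* = \frac{d\,P_x}{c\,P_y}\,x^*$$
Putting this value in the budget constraint, we get:
$$P_x x^* + P_y\,\frac{d\,P_x}{c\,P_y}\,x^* = M$$
$$\left(\frac{c+d}{c}\right) P_x x^* = M$$
$$x^* = \left(\frac{c}{c+d}\right)\frac{M}{P_x}$$
$$y^* = \left(\frac{d}{c+d}\right)\frac{M}{P_y}$$
On rearranging the demand function for x, we get:
$$\frac{P_x x^*}{M} = \frac{c}{c+d}$$
In the case of Cobb-Douglas preferences, the share of income (M) spent on good x ($P_x x$)
is equal to $c/(c+d)$. Hence, the fraction of income spent on either good is fixed. The size
of this fraction is determined by the exponent (of the quantity of that good) in the
Cobb-Douglas function. In the two-goods case it is therefore convenient to assume that
c+d=1. This assumption makes it clear that income is spent on these goods with weights given
by the respective exponents in the utility function.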
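The fixed-share property gives a direct way to compute Cobb-Douglas demands: x* = (c/(c+d))·M/P_x and y* = (d/(c+d))·M/P_y. The function below is a minimal sketch of that rule; the exponents, prices and income in the example are hypothetical.

```python
def cobb_douglas_demand(c, d, px, py, income):
    """Demands for U(x, y) = x**c * y**d, using the fixed-share rule."""
    share_x = c / (c + d)        # fraction of income spent on x
    share_y = d / (c + d)        # fraction of income spent on y
    return share_x * income / px, share_y * income / py


x_star, y_star = cobb_douglas_demand(c=1, d=3, px=2.0, py=3.0, income=120.0)
print(x_star, y_star)
print("share of income spent on x:", 2.0 * x_star / 120.0)
```

Changing either price changes the quantities but leaves the two expenditure shares untouched.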

3. Demand
The demand function shows the relationship between price and quantity demanded. For a
normal good, there exists a negative relationship between price and quantity demanded.
In this section, we will derive the demand curve for different kinds of consumer
preferences.
3.1 Well behaved preferences

When the price of good x falls, the budget line pivots outward around its intercept on the
y-axis. Let us analyze the path of optimal consumption bundles that is followed when the
price of good x changes.
In panel (i) of Figure 10, when the price of good x falls from P1 to P2 and then to P3
(while holding the price of y constant), the budget line rotates from GH to GH1 to GH2. The
optimal bundles are marked E0, E1 and E2, respectively. With the fall in the price of good
x the consumer has an enlarged budget set and hence more of good x can be consumed
(footnote 4). The quantity demanded rises from x1 to x2 to x3, which correspond to prices
P1, P2 and P3, respectively. Connecting all the optimal bundles gives the price offer
curve. This curve shows the bundles that would be demanded at different prices of good x.
In panel (ii) of figure 10, we trace down these quantities and plot them against their
respective prices. We get a demand curve which is downward sloping, i.e. the price and
the quantity demanded of the good move in opposite directions, ceteris paribus (Py, M and
the consumer’s preferences are held constant).
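The step of plotting optimal quantities against prices can be imitated with a small demand schedule. The sketch below assumes Cobb-Douglas preferences with half of income spent on x; the prices and the income are hypothetical, standing in for P1, P2, P3 and M in the figure.

```python
def demand_for_x(px, income, share_x=0.5):
    """Quantity of x demanded when a fixed share of income is spent on x."""
    return share_x * income / px


# Let the price of x fall while income (and the price of y) stay fixed.
prices = [4.0, 2.0, 1.0]
schedule = [(p, demand_for_x(p, income=100.0)) for p in prices]

for price, quantity in schedule:
    print(price, quantity)
```

Each (price, quantity) pair is one point on the demand curve, and the quantity rises as the price falls.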
3.2 Perfect Substitutes

Assume some price of x, $p_x^*$, such that $MRS_{xy} = p_x^*/p_y$. If the price of good x
falls below $p_x^*$, then $p_x/p_y < MRS_{xy}$, which means the slope of the indifference
curve is greater than the slope of the budget line, and hence only good x will be demanded.
If the price of good x rises above $p_x^*$, only good y will be consumed and zero quantity
of good x is demanded. When the price of good x is equal to $p_x^*$, any quantity between
zero and $M/p_x^*$ can be demanded.
The dashed lines in panel (i) of figure 11 are indifference curves. GH1 is the budget line
parallel to the indifference curves and hence has slope $p_x^*/p_y$ (in absolute value).
GH is steeper than GH1 and GH2 is flatter than GH1.
The price offer curve has two segments:
i. The budget line GH1, since all bundles on this budget line are optimal, i.e. when the
price of good x is $p_x^*$.
ii. The x-axis: when the price of good x has fallen below $p_x^*$, only good x is demanded.
Footnote 4: A fall in price leads to two effects. First, purchasing the original bundle
leaves the consumer with some extra income in hand; second, the fall in the price of x
makes it cheaper, and sometimes the consumer consumes more of it in place of good y. This
will be discussed in the chapters to come.
Figure 11
Plotting the quantities against their respective prices again yields the demand curve in
panel (2) of figure 11. Above $p_x^*$, zero units of good x are demanded; at $p_x^*$, any
quantity between zero and $M/p_x^*$ can be demanded; and if the price falls further, more
of (only) good x is demanded.
3.3 Perfect complements

Perfect complements are consumed in combination, and no utility is added if the
consumption of only one of the goods increases. So when the price of x falls, the consumer
uses the extra real income to consume both commodities in the same fixed proportion.
Hence, when the price of good x falls, the quantity demanded of x increases.
Figure 12

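The price offer curve is the line joining all the kinks of the indifference curves,
starting from the origin. For the example of ‘a’ units of x with ‘b’ units of y, we
computed the optimal bundle
$$x^* = \frac{aM}{a\,p_x + b\,p_y}$$
Differentiating x* with respect to $p_x$, we get:
$$\frac{\partial x^*}{\partial p_x} = -\frac{aM}{(a\,p_x + b\,p_y)^2}\cdot a = -\frac{a^2 M}{(a\,p_x + b\,p_y)^2}$$
Or $\partial x^*/\partial p_x < 0$ (since M is always positive and all the remaining terms
are squared).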
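The sign of this derivative can be cross-checked with a finite difference. The sketch below compares a numerical slope of x* = aM/(a·p_x + b·p_y) with the closed form −a²M/(a·p_x + b·p_y)²; the parameter values are hypothetical.

```python
def x_star(a, b, px, py, income):
    """Perfect-complements demand for x."""
    return a * income / (a * px + b * py)


def analytic_slope(a, b, px, py, income):
    """Closed-form derivative of x_star with respect to px."""
    return -(a ** 2) * income / (a * px + b * py) ** 2


def numeric_slope(a, b, px, py, income, h=1e-6):
    """Central finite difference of x_star with respect to px."""
    up = x_star(a, b, px + h, py, income)
    down = x_star(a, b, px - h, py, income)
    return (up - down) / (2 * h)


params = dict(a=1, b=2, px=3.0, py=1.0, income=20.0)
print(analytic_slope(**params), numeric_slope(**params))
```

Both numbers are negative and agree closely, in line with a downward sloping demand curve for x.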
3.4 Giffen goods

In the nineteenth century, it was observed that when the price of a good falls, less of
that good may be demanded. Such a good does not behave like a normal good and is known as
a Giffen good. This case is depicted in figure 13.

The demand curve for a Giffen good is positively sloped. Goods of this type are an
exception to the law of demand.
4. Engel curves
The Engel curve shows the relationship between the demand for a good and the income of the
consumer. For a normal good, there exists a positive relationship between the two. But
there are goods whose demand falls when the income of the consumer goes up. Such goods are
known as inferior goods.
4.1 Well behaved preferences: normal good case

When the income of a consumer rises, the budget line shifts outward parallel to itself, as
shown in panel (i) of figure 14. This changes the optimal bundle from E0 to E1. Joining all
the optimal bundles, one can construct the income offer curve. Plotting the quantity
demanded of good x against the varying levels of income in panel (ii), we get the Engel
curve.

4.2 Inferior goods

Let good x be inferior. When income rises, the optimal bundle changes from E0 to E1 in such
a fashion that the optimal quantity of x falls from x1 to x2. In panel (ii) of figure 15,
the Engel curve is constructed by joining (x1, M1) and (x2, M2). The Engel curve for an
inferior good is negatively sloped.
4.3 Perfect substitutes

Good x, which is a perfect substitute for good y, is demanded only when
$P_x/P_y < MRS_{xy}$. So when income increases, the entire addition to income is used to
consume good x. When income is M1, $x_1 = M_1/P_x$, and when income increases to M2,
$x_2 = M_2/P_x$.

x1 and x2 are boundary optima, shown as bundles E0 and E1 in panel (i) of figure 16. Panel
(ii) of figure 16 depicts the Engel curve. The slope of the Engel curve is calculated as
follows:
$$\frac{\Delta M}{\Delta x} = \frac{M_2 - M_1}{x_2 - x_1} = \frac{M_2 - M_1}{(M_2 - M_1)/P_x} = P_x$$
For x to change by exactly 1 unit, income must change by Px.
4.4 Perfect complements

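The demand pattern for good x, which is used in some fixed proportion with y, is depicted
in figure 17. The optimal quantity of x is given by
$$x^* = \frac{aM}{a\,p_x + b\,p_y}$$
Differentiating the above equation with respect to x*, we get:
$$1 = \frac{a}{a\,p_x + b\,p_y}\cdot\frac{\partial M}{\partial x^*}$$
Or
$$\frac{\partial M}{\partial x^*} = \frac{a\,p_x + b\,p_y}{a} \quad \text{(slope of Engel curve)}$$
Or $\partial M/\partial x^* > 0$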
Figure 17 Figure 18
4.5 Cobb Douglas preferences

The optimal quantity of good x depends linearly on the money income of the consumer, as
given by the following equation:
$$x^* = \left(\frac{c}{c+d}\right)\frac{M}{P_x}$$
Again, differentiating this equation with respect to x* and rearranging, we get:
$$P_x = \left(\frac{c}{c+d}\right)\frac{\partial M}{\partial x^*}$$
Assuming c+d=1, the reason for which was explained earlier,
$\partial M/\partial x^* = P_x/c$, which is the slope of the Engel curve.
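The Engel-curve slope P_x/c can be checked by asking how much extra income one more unit of x requires. The sketch below inverts the Cobb-Douglas demand x* = (c/(c+d))·M/P_x; the exponents and the price are hypothetical, with c + d = 1 as assumed in the text.

```python
def income_needed(x, c, d, px):
    """Income at which the Cobb-Douglas consumer demands exactly x units."""
    return x * px * (c + d) / c


c, d, px = 0.25, 0.75, 2.0

# Slope of the Engel curve (income on the vertical axis): extra income per unit of x.
engel_slope = income_needed(6.0, c, d, px) - income_needed(5.0, c, d, px)
print(engel_slope, px / c)
```

Because demand is linear in income, the slope is the same at every point of the Engel curve.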

4.6 Homothetic preferences

A consumer’s preferences are homothetic if the marginal rate of substitution depends on
the ratio of the quantities of the two goods and not on the total quantities. If one looks
at the MRS for Cobb-Douglas, perfect complements and perfect substitutes, the MRS depends
at most on the ratio y/x. The indifference curves for homothetic preferences look like
copies of one curve pasted at different levels. The slope of the curves depends only on
the ratio y/x, not on how far the curve is from the origin.
Figure 19
This means that the income offer curve is a straight line joining (x1, y1), (2x1, 2y1),
(3x1, 3y1) and so on, where (x1, y1) is the optimal bundle when income is M, (2x1, 2y1)
is the optimal bundle when income doubles, and likewise.
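The straight-line income offer curve can be illustrated with one homothetic case. The sketch below uses Cobb-Douglas demands as an assumed example and scales income by 1, 2 and 3; the exponents, prices and base income are hypothetical.

```python
def cobb_douglas_bundle(c, d, px, py, income):
    """Optimal bundle for U(x, y) = x**c * y**d (a homothetic example)."""
    return (c / (c + d)) * income / px, (d / (c + d)) * income / py


base_income = 60.0
bundles = [cobb_douglas_bundle(1, 2, 2.0, 4.0, k * base_income) for k in (1, 2, 3)]

for x, y in bundles:
    print(x, y, "ratio y/x =", y / x)
```

Every bundle is a multiple of the first one and the ratio y/x never changes, so the bundles lie on a ray through the origin.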
4.7 Quasi linear preferences

Quasi-linear preferences are not homothetic. Assume $U(x, y) = v(x) + y$, so that
$MRS = MU_x/MU_y = v'(x)$. So the MRS depends on how much x the consumer consumes and not
on the ratio (y/x). If the consumer’s income is M1 and the optimal bundle is (x1, y1), and
income then increases, his optimal bundle becomes (x1, y1+k) for some constant k.

Figure 20
An example of such a good is salt. Even when income rises there is no increase in the
quantity of salt demanded. You spend the addition to income on all goods but salt. Hence
there is a ‘zero income effect’.
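The zero income effect can be seen in a small example. The sketch below assumes the quasi-linear utility U(x, y) = ln(x) + y, which is a hypothetical choice and not the function used in the lesson; for it the tangency condition 1/x = P_x/P_y fixes x* = P_y/P_x whenever income is large enough.

```python
def quasi_linear_bundle(px, py, income):
    """Optimal bundle for the assumed utility U(x, y) = ln(x) + y."""
    x_interior = py / px                 # from MRS = 1/x = px/py
    if px * x_interior <= income:
        return x_interior, (income - px * x_interior) / py
    return income / px, 0.0              # too poor to afford x_interior: corner


low = quasi_linear_bundle(px=2.0, py=4.0, income=20.0)
high = quasi_linear_bundle(px=2.0, py=4.0, income=28.0)
print(low, high)
```

Raising income from 20 to 28 leaves x* unchanged and adds the whole increase to y, which is the (x1, y1+k) pattern described above.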
Summary
• The solution to the utility maximization problem requires that the indifference curve is
tangent to the budget line or, equivalently, that the slopes of the two are equal. When
indifference curves have a kink, the kink should touch the budget line for an optimal
solution.
• There are boundary optima when a consumer consumes a) perfect substitutes, b) a neutral
good or c) a bad, or else if he has concave preferences.
• For a normal good, the law of demand operates and the quantity demanded moves in the
opposite direction to the price change. Demand curves are negatively sloped in all cases
but Giffen goods.
• For a normal good, the change in quantity demanded is positively related to the change
in income of the consumer and hence the Engel curve is positively sloped in all cases
but inferior goods.

Exercises
Q1. a) If a consumer has a utility function $U(x, y) = x^{1} y^{4}$, what fraction of his
income will he spend on good y?
b) If prices are Px and Py and income is M, what will be the consumer’s optimal choice
bundle?
Q2. Suppose that a consumer always consumes 2 spoons of sugar with 1 cup of tea and
their respective prices are Ps and Pt and consumer has m rupees to spend on sugar and tea.
How much will he demand?
Q3. Suppose a consumer’s utility function is $U(x, y) = x^2 + y^2$.
a) Calculate his MRS.
b) Is his MRS diminishing?
c) Solve for optimal choice bundle if prices are Px and Py and consumer’s income is M.
Q4. Henry is currently consuming only Coke and Pizza. At his current consumption bundle
marginal utility of Coke is 10 and that of Pizza is 5. Each Coke costs Rs.2 and each Pizza
costs Rs.10. Is he maximizing his utility? Explain. If he is not, how can he increase his utility
while keeping his expenditure constant?
Q5. Assume good x is inferior. Draw the income offer curve. Is it possible that good y is
also inferior? Explain.
Q6. Madhu views Pepsi and Coca-cola as perfect substitutes. The price of a 750 ml bottle of
Pepsi is Rs. 10 and the price of a 750 ml bottle of Coca-cola is Rs. 12. What does Madhu’s
Engel curve for Pepsi look like? By how much should her budget increase so that she can
consume one more unit of Pepsi?
Glossary
• Optimal choice: A state of affairs is optimum when it is the best available, and the
choice which achieves it is called the optimal choice.
• Price offer curve: The locus of all consumer equilibria when price changes is known
as the price offer curve.
• Demand curve: The demand curve is the curve showing the negative (for a normal good)
relationship between price and quantity demanded by the consumer.
• Giffen good: If the law of demand is violated and a positive relationship between
price and quantity demanded by the consumer is observed for a good, then that good is
called a Giffen good.
• Income offer curve: The locus of all consumer equilibria when the income of the consumer
changes is known as the income offer curve.
• Engel’s curve: Engel’s curve is the curve showing the positive (for a normal good)
relationship between income and quantity demanded by the consumer.
• Inferior good: If a negative relationship between income and quantity demanded by the
consumer is observed for a good, then that good is called an inferior good.
• Homothetic preferences: A consumer’s preferences are homothetic if the marginal rate of
substitution depends on the ratio of the quantities of the two goods and not on the total
quantities of the goods.
• Quasi-linear preferences: Quasi-linear preferences are non-homothetic preferences and
have a zero income effect.
References:
Hal R. Varian, Intermediate Microeconomics: A Modern Approach, W.W. Norton and Company/Affiliated
East-West Press (India), 8th edition, 2010.
C. Snyder and W. Nicholson, Fundamentals of Microeconomics, Cengage Learning (India), 2010.
www.wikipedia.org

