
A Simple Note on Topics of Micro Theory: Part 1

Jianrong Tian∗

University of Hong Kong


September 18, 2021


E-mail address: jt2016@hku.hk.
Contents
1 Monotone Comparative Statics
  1.1 Scalar Case
    1.1.1 Incorporating Uncertainty
  1.2 Multi-Dimensional Cases
    1.2.1 Supermodularity
  1.3 Monotone Selection
  1.4 Log-supermodularity
    1.4.1 TP2 Order
  1.5 Convex Cone Argument
  1.6 Reference
  1.7 Exercises

2 Log-concavity
  2.1 Definition and Main Results
  2.2 64% Rule: an application (In Class)
  2.3 Reference
  2.4 Exercises

3 Comparison of Experiments
  3.1 Blackwell (1951)
    3.1.1 Experiments and Distribution of Posteriors
    3.1.2 Values of Experiments for Decision Problems
    3.1.3 Sufficiency and Mean-Preserving Spread
  3.2 Lehmann (2012)
  3.3 Persuasion and Costly Information Acquisition
    3.3.1 Persuasion Without Cost
    3.3.2 Incorporating costs of information
  3.4 Reference

4 Observational Learning
  4.1 An Example
  4.2 General Model
    4.2.1 Main notations and assumptions
  4.3 Bounded and unbounded signals
  4.4 Cascade sets
  4.5 Transition Law of pt
  4.6 Results
1 Monotone Comparative Statics
This part provides an overview of the state of the art in methods of monotone
comparative statics. There are basically three ways of doing monotone comparative statics.
The first is based on a revealed preference argument. This approach relies crucially
on the objective function being linear: by summing two inequalities you get the
desired result. Examples include the law of supply and the compensated law of demand, among
others. The second method is based on the first-order condition, the implicit function theorem, or the
envelope theorem. This method relies on many assumptions, such as concavity or smoothness
of the objective functions. The last one is based on complementarity, which is our focus.
Given a poset (X, ≥), for each x and y in X, define the join of x and y, denoted by
x ∨ y, as the least element in X that is larger than both x and y. That is, x ∨ y is a point
in X such that x ∨ y ≥ x and x ∨ y ≥ y, and for any z with z ≥ x and z ≥ y, we
have z ≥ x ∨ y. Similarly, we define the meet of x and y, denoted by x ∧ y, as the largest
element in X that is smaller than both x and y. That is, x ∧ y is a point in X such that
x ∧ y ≤ x and x ∧ y ≤ y, and for any z with z ≤ x and z ≤ y, we have z ≤ x ∧ y. We
say that (X, ≥) is a lattice if it is closed with respect to both the join and meet operations.
That is, for any two points x and x′ in X, both the meet and the join of x and x′ exist in X.
Given a lattice (X, ≥), we say that a subset A of X is a sublattice if for any x and x′ in A,
both the meet and the join of x and x′ in (X, ≥) are contained in A.

Remark 1. Given a lattice (X, ≥), a subspace is a subset of X endowed with the order
inherited from (X, ≥), with a generic one denoted by (A, ≥). It can happen that
(A, ≥) is a lattice but A is not a sublattice of (X, ≥). For instance, let X = R² with ≥
the canonical order, and A = {(2, 1), (1, 2), (0, 0), (3, 3)}. Then (A, ≥) is a lattice
(within A, the least upper bound of (2, 1) and (1, 2) is (3, 3)). But A
is not a sublattice of (R², ≥), since the join of (2, 1) and (1, 2) in R² is (2, 2) ∉ A.

Let (X, ≥) be a lattice, and A and B two subsets of X. We say that A is larger than
B in strong set order (SSO), denoted by A ≥SSO B, if for any x in A and y in B, we have
x ∨ y ∈ A and x ∧ y ∈ B.

Remark 2. Note that A ≥SSO A if and only if A is a sublattice.

Let f and g be two objective functions on X. We are interested in when, for each
subset A from a given class, we have

arg max_{x∈A} f(x) ≥SSO arg max_{x∈A} g(x).    (1)

A class of conditions that relate these two objective functions f and g is central to
the analysis.

Definition 1 (SC). Given two functions f and g on X, we say that f dominates g
in the single-crossing (SC) order if for each x′ > x in X, we have g(x′) − g(x) ≥ 0 ⟹
f(x′) − f(x) ≥ 0 and g(x′) − g(x) > 0 ⟹ f(x′) − f(x) > 0.

A subset A of X is an interval if for each x < x′ in A and each y in X with
x < y < x′, we have y ∈ A. For each x < x′ in X, we denote by [x, x′] the interval
{y ∈ X : x ≤ y ≤ x′}.

Definition 2 (IDO). Given two functions f and g on X, we say that f dominates g
in the interval dominance order (IDO) if for each x′ > x in X with g(x′) − g(y) ≥ 0 for
each y ∈ [x, x′], we have f(x′) − f(x) ≥ 0, and if additionally g(x′) − g(x) > 0, then
f(x′) − f(x) > 0.

Note that the SC order is strictly stronger than the IDO. That is, if g(x′) − g(x) ≥ 0 but
g(x′) − g(y) < 0 for some y ∈ [x, x′], the IDO does not require f(x′) − f(x) ≥ 0. As I
shall show later, the SC condition is needed for monotone comparative statics like (1) when
A in (1) can be any sub-lattice, while the IDO is needed for (1) when A is required to
be both a sub-lattice and an interval.

1.1 Scalar Case


We first discuss the case in which X is totally ordered, that is, for each x and x′ in X, either
x ≥ x′ or x′ ≥ x. The real line, or any subset of it, is the leading example
of such a case. Then for any two points x and x′ in X, we have x ∨ x′ = max{x, x′} and
x ∧ x′ = min{x, x′}. Also note that any subset of X is a sublattice when (X, ≥) is totally
ordered.

Proposition 1. For any subset A of X, we have arg max_{x∈A} f(x) ≥SSO arg max_{x∈A} g(x), if and
only if f dominates g in the SC order.

Now turn to the case of a parameterised objective function f(x, t) on X × [0, 1], in which,
for each t, f(x, t) is a function of x on X. We say that f(x, t) satisfies the single-crossing
property (SCP) if for each t′ > t, f(x, t′), as a function of x, dominates f(x, t), as a function
of x, in the SC order.

Proposition 2. For any subset A of X, arg max_{x∈A} f(x, t) increases in SSO with respect to
t on [0, 1], if and only if f(x, t) satisfies the SCP.

Now we can see why we call this condition single-crossing.

Definition 3. A function h(t) : [0, 1] → R is SC if t′ > t and h(t) ≥ 0 imply h(t′) ≥ 0,
and t′ > t and h(t) > 0 imply h(t′) > 0.

An SC function crosses the x-axis at most once, and only from below.

Observation 1. f(x, t) satisfies the SCP, if and only if for each x′ > x, the difference
f(x′, t) − f(x, t), as a function of t on [0, 1], is SC.
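To make this concrete, here is a minimal numerical sketch (mine, not from the notes), assuming the illustrative objective f(x, t) = xt − x²/2: the cross difference is SC in t, and the maximiser increases in t, as Proposition 2 predicts.

import numpy as np

# Grid check of Observation 1 and Proposition 2, assuming the illustrative
# objective f(x, t) = x*t - x**2/2 (whose maximiser is x*(t) = t).
xs = np.linspace(0.0, 2.0, 201)     # choice grid X
ts = np.linspace(0.0, 1.0, 11)      # parameter grid [0, 1]

def f(x, t):
    return x * t - x**2 / 2

# Observation 1: for x' > x, t -> f(x', t) - f(x, t) should be SC,
# i.e. it never crosses zero downward.
diff = f(1.5, ts) - f(0.5, ts)      # equals t - 1 here
assert np.sum((diff[:-1] >= 0) & (diff[1:] < 0)) == 0

# Proposition 2: the maximiser increases in t.
argmaxes = [xs[np.argmax(f(xs, t))] for t in ts]
assert all(a <= b for a, b in zip(argmaxes, argmaxes[1:]))
print(argmaxes)                     # increases from 0.0 to 1.0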

The SCP is tight only when we allow any subset of X as a possible choice set. In many
cases, perhaps most cases, the choice sets are naturally intervals. For such cases, the SCP
is unnecessarily strong.

Proposition 3. For any interval A of X, we have arg max_{x∈A} f(x) ≥SSO arg max_{x∈A} g(x), if
and only if f dominates g in IDO.

Now we turn to a sufficient condition for IDO when X is an interval of the reals of
the form [a, b) or [a, b].

Proposition 4. Let g(x) = c + ∫_a^x g′(t)dt and f(x) = c′ + ∫_a^x f′(t)dt for x ∈ X. Then f
dominates g in IDO, if there exists some h(x) > 0 on X, (weakly) increasing in x, such that
f′ ≥ h × g′.

The mathematics behind this result is the following lemma.

Lemma 1. Let g be a function on [a, b] such that ∫_x^b g(t)dt ≥ 0 for each x ≥ a. Then for
any increasing and nonnegative function h(x) on [a, b], we have ∫_a^b g(t)h(t)dt ≥ 0.

The proof of the above result is based on the fact that any nonnegative increasing function
lies in the closure of the convex cone generated by the indicator functions of upper intervals. The
single-crossing integral inequality below is actually a special case of Lemma 1.
Lemma 2. Let g(t) on [0, 1] be SC. Then ∫_0^1 g(t)dt ≥ 0 implies that ∫_0^1 g(t)h(t)dt ≥ 0 for
any nonnegative increasing function h(t) on [0, 1].

Note that if g is SC and ∫_0^1 g(t)dt ≥ 0, then ∫_x^1 g(t)dt ≥ 0 for each x, and so Lemma 2
is implied by Lemma 1. Yet the following proof of Lemma 2 is elegant:

Proof of Lemma 2: If g ≥ 0, the result holds trivially. Now consider the case that
g(x) < 0 for some x. Let x₀ ∈ [0, 1] be such that g(x) ≥ 0 for x > x₀ and g(x) ≤ 0 for
x < x₀. Such an x₀ exists because g is SC. Then

∫_0^1 g(t)h(t)dt = ∫_0^{x₀} g(t)h(t)dt + ∫_{x₀}^1 g(t)h(t)dt.

Note that

∫_0^{x₀} g(t)h(t)dt ≥ h(x₀) ∫_0^{x₀} g(t)dt,

since g ≤ 0 and h ≤ h(x₀) on [0, x₀], and

∫_{x₀}^1 g(t)h(t)dt ≥ h(x₀) ∫_{x₀}^1 g(t)dt,

since g ≥ 0 and h ≥ h(x₀) on [x₀, 1]. Thus,

∫_0^1 g(t)h(t)dt ≥ h(x₀) ∫_0^1 g(t)dt.

The desired result follows since h(x₀) ≥ 0 and ∫_0^1 g(t)dt ≥ 0.
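As a quick sanity check of Lemma 2, the following sketch verifies the inequality on a grid for an illustrative SC function g (my own choice, not from the notes):

import numpy as np

# Numerical check of Lemma 2, assuming the illustrative SC function
# g(t) = t - 0.4 (crosses zero once, from below; its integral is 0.1 >= 0).
ts = np.linspace(0.0, 1.0, 100001)
dt = ts[1] - ts[0]
g = ts - 0.4
assert g.sum() * dt >= 0

for h in (ts, ts**2, np.ones_like(ts), np.sqrt(ts)):  # nonnegative, increasing
    assert (g * h).sum() * dt >= 0                    # Riemann-sum integral
print("single-crossing integral inequality verified for the sampled h's")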

Finally, given f(x, t) on X × [0, 1], we say that f(x, t) satisfies the IDO condition if for
each t′ > t, f(x, t′) dominates f(x, t) in IDO. For each interval A, arg max_{x∈A} f(x, t) increases
in SSO with respect to t, if and only if f(x, t) satisfies the IDO condition.

1.1.1 Incorporating Uncertainty

Now let's take t to be a state. The conditions on f(x, t) discussed previously guarantee that
the optimal choice increases with the state when you know the state. Now suppose that you
are uncertain about the state and have a belief, a density function π(t) > 0 on the set of
possible states [0, 1]. Then the expected utility of each choice x in X under belief π is
given by ∫_0^1 f(x, t)π(t)dt. Let π′ and π be two different beliefs. We first ask under which
conditions, for each A ⊆ X, we have

arg max_{x∈A} ∫_0^1 f(x, t)π′(t)dt ≥SSO arg max_{x∈A} ∫_0^1 f(x, t)π(t)dt.    (2)

Let U(x, π) = ∫_0^1 f(x, t)π(t)dt. By Proposition 1, the necessary and sufficient condition
for the above monotone comparative statics is that U(x, π′) dominates U(x, π) in the SC order.

Proposition 5. Suppose that f(x, t) satisfies the SCP. Also, suppose that π′(t)/π(t) > 0
is increasing in t. Then U(x, π′) dominates U(x, π) in the SC order.

The above result says that SC dominance is inherited when the belief increases in the monotone
likelihood ratio (MLR) order. Alternatively, we may consider a family of beliefs π(t, s) > 0 parameterised
by s ∈ S ⊆ R. That is, for each s, π(t, s), as a function of t, is a density function on
the states [0, 1]. You can think of s as a signal realisation, and of π(t, s) as the posterior belief
conditional on s.

Proposition 6. Suppose that f(x, t) on X × [0, 1] satisfies the SCP. Also, assume that
π(t, s′)/π(t, s) is increasing in t for each s′ > s. Then U(x, π(·, s)) on X × S satisfies the SCP.

The condition that π(t, s′)/π(t, s) is increasing in t for s′ > s simply means that the
posterior belief associated with a larger signal realisation is an MLR upward shift of the
one associated with a smaller signal realisation.
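The following sketch illustrates Proposition 5 through (2), assuming my own toy objective f(x, t) = xt − x²/2 (which satisfies the SCP) and an MLR-ordered pair of beliefs; the optimal choice moves up under the MLR-higher belief.

import numpy as np

# Two beliefs on [0, 1] ordered in MLR: the uniform density vs. an
# exponentially tilted one (pi_hi / pi is increasing in t).
ts = np.linspace(0.0, 1.0, 2001)
dt = ts[1] - ts[0]
pi = np.ones_like(ts)
pi /= pi.sum() * dt
pi_hi = np.exp(2 * ts)
pi_hi /= pi_hi.sum() * dt

xs = np.linspace(0.0, 2.0, 2001)

def U(x, dens):
    # expected utility of choice x under the belief density `dens`
    return ((x * ts - x**2 / 2) * dens).sum() * dt

x_star = xs[np.argmax([U(x, pi) for x in xs])]
x_star_hi = xs[np.argmax([U(x, pi_hi) for x in xs])]
assert x_star_hi >= x_star
print(x_star, x_star_hi)   # about 0.50 vs 0.66: the optimal choice moves up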
Now we turn to the question of when (2) holds whenever A is an interval of X. By
Proposition 3, the necessary and sufficient condition is that U(x, π′) dominates U(x, π)
in IDO. Say that f(x, t) is regular if for each x < x′ in X and each t ∈ [0, 1], the set
arg max_{y∈[x,x′]} f(y, t) is non-empty.

Proposition 7. Suppose that f(x, t) on X × [0, 1] satisfies the IDO condition and is regular.
Also, assume that π(t, s′)/π(t, s) is increasing in t for each s′ > s. Then U(x, π(·, s)) on
X × S satisfies the IDO condition.

1.2 Multi-Dimensional Cases


Now we turn to the general case in which the lattice (X, ≥) might not be totally ordered,
such as a multi-dimensional Euclidean space. To guarantee monotone comparative statics in
such cases, besides the SC order or IDO relating the two objective functions, we also need
a new ingredient concerning how the value of each fixed objective function changes as the
choice variable changes. The latter is satisfied automatically when X is totally ordered.

Definition 4. A function f(x) over X is quasi-supermodular if for each x and x′, we have

f(x) ≥ (>) f(x ∧ x′) ⟹ f(x ∨ x′) ≥ (>) f(x′).

Proposition 8. Suppose that f(x) is quasi-supermodular. Then for any sub-lattice A,
arg max_{x∈A} f(x) is a sub-lattice.

Proposition 9. Let f(x, t) be defined over X × [0, 1]. For each A′ ≥SSO A and each t′ ≥ t,
we have arg max_{x∈A′} f(x, t′) ≥SSO arg max_{x∈A} f(x, t), if and only if (1) for each t, f(x, t) is quasi-supermodular
in x over X, and (2) f(x, t) satisfies the SCP.

Note that if we take t′ = t, we have arg max_{x∈A′} f(x, t) ≥SSO arg max_{x∈A} f(x, t) whenever
A′ ≥SSO A, by which the quasi-supermodularity condition is necessitated. As an easy
corollary, we have monotone comparative statics when the choice set is a sublattice.

Corollary 1. Suppose that (1) for each t, f(x, t) is quasi-supermodular in x over X, and
(2) f(x, t) satisfies the SCP. Then for each sub-lattice A and each t′ ≥ t, we have
arg max_{x∈A} f(x, t′) ≥SSO arg max_{x∈A} f(x, t).

Corollary 1 has a similar counterpart for the case where all possible choice sets are both
sub-lattices and intervals.

Corollary 2. Suppose that (1) for each t, f(x, t) is quasi-supermodular in x over X, and
(2) f(x, t) satisfies the IDO condition. Then for each A which is both a sub-lattice and an interval,
and each t′ ≥ t, we have arg max_{x∈A} f(x, t′) ≥SSO arg max_{x∈A} f(x, t).

1.2.1 Supermodularity

The property of quasi-supermodularity may not be easy to check. In some, perhaps many,
cases we turn to a stronger property: supermodularity.
Definition 5. Let (X, ≥) be a lattice. A function f(x) over X is supermodular if for each
x and x′, we have

f(x ∨ x′) + f(x ∧ x′) ≥ f(x) + f(x′).
Supermodularity implies quasi-supermodularity. When (X, ≥) is the product of some
totally ordered sets, supermodularity reduces to pairwise supermodularity:

Proposition 10. Let (X, ≥) be the product of N totally ordered spaces (X_1, ≥_1), · · · , (X_N, ≥_N).
That is, X = X_1 × · · · × X_N, and x′ ≥ x if and only if x′_n ≥_n x_n for each n = 1, · · · , N.
Then f(x) over (X, ≥) is supermodular if and only if for each i ≠ j and each x̄_{−ij} ∈ X_{−ij},
f(x_i, x_j, x̄_{−ij}) is supermodular in (x_i, x_j) over X_i × X_j.
Proposition 11. Let X = [a, b] × [c, d] ⊆ R² with a < b and c < d, and let ≥ be the canonical
order. Also, assume that f(x_1, x_2) over X is twice differentiable. Then f is supermodular if
and only if

∂²f/∂x_1∂x_2 ≥ 0.
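Pairwise supermodularity can also be tested numerically through cross differences, which is what the following sketch does for an illustrative function (my own choice, not from the notes):

import numpy as np

# Grid check of supermodularity via cross differences, assuming the
# illustrative function f(x1, x2) = x1 * x2 (cross partial = 1 >= 0).
def is_supermodular(f, grid1, grid2, tol=1e-12):
    for i in range(len(grid1) - 1):
        for j in range(len(grid2) - 1):
            lo1, hi1 = grid1[i], grid1[i + 1]
            lo2, hi2 = grid2[j], grid2[j + 1]
            # f(join) + f(meet) >= f(x) + f(x') for x=(hi1,lo2), x'=(lo1,hi2)
            if f(hi1, hi2) + f(lo1, lo2) < f(hi1, lo2) + f(lo1, hi2) - tol:
                return False
    return True

grid = np.linspace(0.0, 1.0, 21)
assert is_supermodular(lambda a, b: a * b, grid, grid)
assert not is_supermodular(lambda a, b: -a * b, grid, grid)
print("cross-difference test passed")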

Finally, note that supermodularity is preserved by integration. That is,
if f(x, t) is supermodular in x for each t, then ∫_0^1 f(x, t)π(t)dt is also supermodular in
x. Hence, if we can show that f(x, t) is supermodular in x for each t, then with the
SCP or the IDO condition on f(x, t), depending on the class of choice sets allowed, monotone
comparative statics hold when uncertainty is incorporated and beliefs are ordered in MLR.

1.3 Monotone Selection


Given (X, ≥), so far we have discussed conditions on f(x, t) and the choice set A such that
arg max_{x∈A} f(x, t) increases in SSO with respect to t. A monotone selection, denoted
by a*(t), satisfies a*(t) ∈ arg max_{x∈A} f(x, t) for each t, with a*(t) increasing in t. The
existence of a monotone selection is important for some analysis. Suppose that for each t,
arg max_{x∈A} f(x, t) is non-empty and has a greatest element, denoted by ā(t). Then ā(t) is a
monotone selection when arg max_{x∈A} f(x, t) increases in SSO w.r.t. t. The same applies
to the selection of least elements. So the problem now is when such extreme selections
exist.

Lemma 3. Let A ⊂ R^N be non-empty and compact. Let ≥ be the canonical order on R^N.
Suppose that (A, ≥) is a lattice. Then both the greatest element and the least element of A
exist.

Proof. Since A is compact, A ⊆ [a_1, b_1] × · · · × [a_N, b_N] with a_n ≤ b_n for each n. Let
a = (a_1, · · · , a_N) and b = (b_1, · · · , b_N). Consider the function d(x, b) = Σ_{n=1}^N |b_n − x_n|
on A. d is a continuous function on A. Since A is compact, arg min_{x∈A} d(x, b) is nonempty;
pick a minimiser x̄. For any point x in A, the join of x and x̄ exists in A, since A is a lattice.
We must have x ∨ x̄ = x̄, since otherwise x̄ could not achieve the minimum of d over
A; hence x̄ ≥ x. The same argument applies to the case of the least
element.

Proposition 12. Let X ⊆ R^N, and let ≥ be the canonical order. Assume that (X, ≥) is
a lattice. Assume that f(x) is continuous and quasi-supermodular on X. Then for each
non-empty compact sub-lattice A, the set arg max_{x∈A} f(x) has both a greatest element and
a least element.

Proof. Since f is continuous and A is non-empty and compact, the set arg max_{x∈A} f(x) is
also non-empty and compact. Since A is a sub-lattice and f(x) is quasi-supermodular,
arg max_{x∈A} f(x) is also a sub-lattice. The desired result then follows from Lemma 3.
x∈A

1.4 Log-supermodularity
Suppose you have n random variables jointly distributed in R^n according to a joint density
function f. It is quite common that these random variables are positively correlated, and we
need a formal notion of such positive correlation that is useful for analysis. We
say that the n random variables with joint probability density f are affiliated if for each
x and x′, we have

f(x ∨ x′)f(x ∧ x′) ≥ f(x)f(x′).

Remark 3. Consider the case that f(x_1, x_2) > 0. Then the above condition implies that
for each x′_1 > x_1, the marginal posterior density function on X_2 conditional on x′_1 is an
MLR upward shift of the one conditional on x_1. That is, a larger x_1 makes larger x_2 more
likely.

Definition 6. Let (X, ≥) be a lattice. A function f(x) over X is log-supermodular if for
each x and x′, we have

f(x ∨ x′)f(x ∧ x′) ≥ f(x)f(x′).

The most important property of log-supermodularity is that it is preserved under partial
integration.

Proposition 13. Let X × S be such that both X and S are products of some real intervals.
Let g(x, s) ≥ 0 be log-supermodular over X × S. Then the function G(x) = ∫_S g(x, s)ds is
log-supermodular on X.

The above proposition is a consequence of the following important integral inequality.

Proposition 14. Let X be the product of some real intervals. Let h_1, h_2, h_3, h_4 be four
non-negative functions defined over X. Suppose that for each x and x′ in X, we have

h_1(x)h_2(x′) ≤ h_3(x ∨ x′)h_4(x ∧ x′).

Then we have

∫_X h_1(x)dx ∫_X h_2(x)dx ≤ ∫_X h_3(x)dx ∫_X h_4(x)dx.

Proof. See Karlin and Rinott (1980).

Proof of Proposition 13: We need to show that for each x and x′, we have

∫_S g(x, s)ds ∫_S g(x′, s)ds ≤ ∫_S g(x ∨ x′, s)ds ∫_S g(x ∧ x′, s)ds.

Let h_1(s) = g(x, s), h_2(s) = g(x′, s), h_3(s) = g(x ∨ x′, s) and h_4(s) = g(x ∧ x′, s). The
desired result follows from Proposition 14.
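A small numerical sketch of Proposition 13, assuming the illustrative log-supermodular kernel g(x_1, x_2, s) = exp((x_1 + x_2)s)·w(s) (my own choice): the partial integral G inherits log-supermodularity in (x_1, x_2).

import numpy as np

# Check that partial integration preserves log-supermodularity, assuming
# the kernel g(x1, x2, s) = exp((x1 + x2) * s) * w(s), which is
# log-supermodular in (x1, x2, s).
ss = np.linspace(-3.0, 3.0, 2001)
ds = ss[1] - ss[0]
w = np.exp(-ss**2 / 2)                 # positive weight in s

def G(x1, x2):
    return (np.exp((x1 + x2) * ss) * w).sum() * ds

grid = np.linspace(0.0, 1.0, 5)
for a1 in grid:
    for a2 in grid:
        for b1 in grid:
            for b2 in grid:
                join = G(max(a1, b1), max(a2, b2))
                meet = G(min(a1, b1), min(a2, b2))
                # log-supermodularity: G(join) * G(meet) >= G(x) * G(x')
                assert join * meet >= G(a1, a2) * G(b1, b2) * (1 - 1e-9)
print("G is log-supermodular on the grid")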

Finally, we note how the above result can be applied to derive monotone comparative
statics, though it is widely used in other contexts as well. Let u(x, s) > 0 on X × S be a
state-dependent utility function, in which both the choice set X and the state space S
are products of some real intervals. Let f(s, t) on S × [0, 1] be a family of parameterised
density functions on S.

Proposition 15. Suppose that u(x, s) > 0 is log-supermodular on X × S and f(s, t) is
log-supermodular on S × [0, 1]. Then for each t′ ≥ t and each sub-lattice A, we have

arg max_{x∈A} ∫_S u(x, s)f(s, t′)ds ≥SSO arg max_{x∈A} ∫_S u(x, s)f(s, t)ds.

1.4.1 TP2 Order

Now we give a generalisation of the MLR order between two density functions on some
subinterval of the reals. For two density functions f and g over R^n, we say that f is higher
than g in the TP2 order if for each x and x′ in R^n, we have

f(x)g(x′) ≤ f(x ∨ x′)g(x ∧ x′).

Check that when n = 1 and f and g have the same support, the TP2 order reduces to
an MLR upward shift. That is, an MLR upward shift is a special case of the TP2 order.

Proposition 16. Let f and g be two probability density functions over R^n, and h an
increasing function over R^n. Assume that h is integrable under both f and g. If f is higher
than g in the TP2 order, then ∫_{R^n} f(x)h(x)dx ≥ ∫_{R^n} g(x)h(x)dx.

Note that for the case n = 1, this is something we already know, because on the reals
an MLR upward shift implies FOSD. Finally, we state a result on monotone
comparative statics with a multi-dimensional state space and supermodular primitives.

Proposition 17. Suppose that u(x, s) is supermodular on X × S and f(s, t) is log-supermodular
on S × [0, 1]. Then for each t′ ≥ t and each sub-lattice A, we have

arg max_{x∈A} ∫_S u(x, s)f(s, t′)ds ≥SSO arg max_{x∈A} ∫_S u(x, s)f(s, t)ds.

1.5 Convex Cone Argument


In the proof of the FOSD theorem, we first identify the condition on two distributions F and
G under which the integral under F is larger than that under G for a specific class of monotone
increasing functions, namely the indicator functions of upper intervals, denoted by I. Then
we show that for all increasing functions, the integral under F is larger than that under G.
The main idea behind this is a convex cone argument. Let

Convexcone(I) = {f : (a, ∞) → R : f = Σ_i α_i h_i for finitely many α_i ≥ 0 and h_i ∈ I}

be the convex cone generated by I. Apparently,

I ⊂ Convexcone(I).

Also, ∫ I(x)dF ≥ ∫ I(x)dG holds for each I ∈ I if and only if the same inequality
holds for each function in Convexcone(I). Now denote the set of all bounded nonnegative
increasing functions over (a, ∞) by U. We have

I ⊂ Convexcone(I) ⊂ U.

Note that any nonnegative bounded increasing function is a limit point of Convexcone(I)
w.r.t. pointwise convergence, or even the sup norm. Then, by the dominated convergence
theorem, ∫ u(x)dF ≥ ∫ u(x)dG holds for each u ∈ Convexcone(I) if and only if the same
inequality holds for each function in U. Now let U′ be the set of all bounded increasing
functions. We have

I ⊂ Convexcone(I) ⊂ U ⊂ U′.

Note that if we add a large enough positive scalar to a bounded increasing function,
it becomes a bounded nonnegative increasing function, and adding such a scalar does
not change the comparison of integrals. Thus, ∫ u(x)dF ≥ ∫ u(x)dG holds for each u ∈ U
if and only if the same inequality holds for each function in U′. Now let U″ be the set of
increasing functions over (a, ∞) that are integrable under both F and G. We have

I ⊂ Convexcone(I) ⊂ U ⊂ U′ ⊂ U″.

Using a truncation trick, we can show that if ∫ u(x)dF ≥ ∫ u(x)dG holds for each
u ∈ U′, then the same inequality holds for each function in U″. So ∫ u(x)dF ≥ ∫ u(x)dG
holds for each u ∈ I if and only if the same inequality holds for each increasing integrable
function. This special small class of increasing functions I is sometimes called the class of “test
functions” for the family of increasing functions, in the sense that the integral under F is
larger than that under G for any increasing function if and only if the same inequality
holds for each function in I.
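A quick numerical illustration of the test-function idea, assuming two illustrative exponential distributions ordered by FOSD (my own choice): checking the indicator test functions, i.e., 1 − F ≥ 1 − G pointwise, is enough for all increasing integrable functions.

import numpy as np

# F (exponential, mean 2) first-order stochastically dominates G (mean 1):
# the upper-interval test functions amount to 1 - F(x) >= 1 - G(x) for all x.
xs = np.linspace(0.0, 40.0, 200001)
dx = xs[1] - xs[0]
F = 1 - np.exp(-xs / 2)
G = 1 - np.exp(-xs)
assert np.all(F <= G + 1e-12)

fF = np.exp(-xs / 2) / 2              # densities
fG = np.exp(-xs)
for u in (xs, np.sqrt(xs), np.minimum(xs, 5.0)):   # increasing, integrable
    assert (u * fF).sum() * dx >= (u * fG).sum() * dx - 1e-6
print("integral comparison verified for the sampled increasing functions")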

1.6 Reference
Milgrom and Shannon (1994) is the paper that everyone should read. Quah and Strulovici
(2009) propose IDO. Athey (2002) discusses log-supermodularity, building on Karlin
and Rinott (1980). Tian (2021) contains a multi-dimensional application of IDO to
interval division problems.

1.7 Exercises
Exercise 1. Let φ(z) : R → R. Under what conditions on φ is the function
φ(x − y) over R² supermodular?

Exercise 2. Let X be a sublattice of (R^M, ≥), Y a sublattice of (R^N, ≥), and f(x, y) a
supermodular function defined over X × Y. Prove that V(x) = sup_{y∈Y} f(x, y) is supermodular
over X.

Exercise 3. Let A = [0, 1] be the set of actions, and Θ = [0, 1] the set of states. Let u(θ, a)
over [0, 1]² be the state-dependent utility of actions. Assume that u(θ, a) is single-crossing,
that is, for each a′ > a, u(θ, a′) − u(θ, a) is SC in θ. Let X² = {(x_1, x_2) ∈ [0, 1]² : x_1 ≤ x_2}.
Prove that v(x_1, x_2) = sup_{a∈[0,1]} ∫_{x_1}^{x_2} u(θ, a)dθ is supermodular over X².

Exercise 4. A consumer has utility function u(x_1, x_2) over R²₊ for goods 1 and 2.
Assume that u(x_1, x_2) is supermodular and strictly concave. Also, assume that u(x_1, x_2)
strictly increases in both arguments. Prove that both goods are normal goods. That is, for
i = 1, 2, let x_i(p_1, p_2, m) be the Marshallian demand function for good i, in which m ≥ 0 is
income and p_1 > 0, p_2 > 0 are prices. Prove that each x_i (weakly) increases in m.

Exercise 5. Use the single-crossing inequality to prove the following important inequality.
Let h ≥ 0 be increasing, g be decreasing, and f some probability density function
over the real interval I. Then

∫_I g(x)h(x)f(x)dx ≤ ∫_I g(x)f(x)dx ∫_I h(x)f(x)dx.

Exercise 6. Consider the following standard profit maximisation problem:

max_{k≥0, l≥0} f(k, l) − rk − wl.

Here w > 0 is the wage and r > 0 the interest rate; f(k, l) is the production function, in which k
is capital and l labor. Assume that f(k, l) is supermodular in (k, l) and strictly concave
in k for each l. Let (k, l) be optimal for (r, w) and (k′, l′) be optimal for (r, w′). Assume
that w′ > w. Prove that (k′, l′) ≤ (k, l). (Do not use more conditions than assumed here!!
No differentiability!)

Exercise 7. Consider the portfolio choice problem: you have 1 dollar and decide how to
split it between a riskless asset whose gross return is fixed at r_0, and a risky asset whose
return r is a random draw according to some positive density function
f supported on [a, b]. Assume that a < r_0 < b. Your Bernoulli utility over wealth is u(w).
Assume that u is twice differentiable with u′ > 0 and u″ < 0. The optimal fraction of your
wealth in the risky asset solves the following problem:

max_{x∈[0,1]} ∫_a^b u((1 − x)r_0 + xr)f(r)dr.

1. Now let f shift upwards to f′ in MLR. Prove that you will put more money in the risky
asset. (Hint: use the SC inequality on the FOC.)

2. Suppose that one guy is more risk averse than you: his Bernoulli utility function v is
a strictly concave transformation of yours, that is, v(w) = g(u(w)), in which g′ > 0,
g″ < 0. Prove that he will invest less in the risky asset than you. (Hint:
use the SC inequality on the FOC.)

Exercise 8. Prove Proposition 16. (Hint: use Proposition 14.)

Exercise 9. Let f(θ, s) > 0 be some joint density function over Θ × S, in which both Θ
and S are nontrivial subintervals of the reals. Let f(θ|s) be the posterior density
function over θ conditional on each s. Prove that if f(θ, s) is log-supermodular, then so is
f(θ|s).

Exercise 10. Consider a supermodular game. There are N players. The choice set for
each player n is denoted by A_n, which is a nonempty compact subset of the reals. For each
n, let U_n : A := ∏_{n=1}^N A_n → R be the utility function of player n. Assume that each U_n is
continuous and supermodular over A. We consider the set of rationalizable actions for
each player. Let A⁰_n = A_n for each n. Given that Aᵏ_n has been defined for each n, define
Aᵏ⁺¹_n for each n as below:

Aᵏ⁺¹_n = {a_n ∈ A_n : a_n is a best response of player n to some a_{−n} in ∏_{n′≠n} Aᵏ_{n′}}.

Note that by the maximum theorem, each Aᵏ_n is non-empty and compact. Furthermore,
Aᵏ⁺¹_n ⊆ Aᵏ_n for each n and k. This implies that the following set A*_n is nonempty and compact:

A*_n = ∩_{k=0}^∞ Aᵏ_n,

which is the set of rationalizable actions for player n. Let a be the least element in ∏_{n=1}^N A*_n,
and ā the largest element in ∏_{n=1}^N A*_n. Prove that both a and ā are Nash equilibria. (This
actually proves the existence of Nash equilibrium for supermodular games without resorting
to the well-known fixed point theorems.) (Hint: first show that a_n is in A*_n if and only if it
is a best response to some a_{−n} in ∏_{n′≠n} A*_{n′}, and then use monotone comparative statics.)
(A mathematical fact you will need: for any sequence {B_k} of compact subsets of R^n, if every
finitely many of them have nonempty intersection, then ∩_{k=1}^∞ B_k is non-empty.)

2 Log-concavity
2.1 Definition and Main Results
For analysis involving uncertainty, in many cases you need some conditions on distribution
functions or density functions to ensure some regularity. Log-concavity is a prevailing
assumption in such analysis.
Definition 7. A nonnegative function f over R^N is log-concave if for each 0 < λ < 1 and
each x and x′ in R^N, we have

f(λx + (1 − λ)x′) ≥ f(x)^λ f(x′)^{1−λ}.

For examples and applications of log-concavity, please refer to Bergstrom (1989).
You need to know the following two important results related to log-concavity. All
functions involved are assumed to be Lebesgue integrable.
Theorem 1. Let f(x, y) ≥ 0 over R^M × R^N be log-concave. Then G(x) = ∫_{R^N} f(x, y)dy
is log-concave in x.

Corollary 3. Let f ≥ 0 over R^N be log-concave. Then F(x) = ∫_{t≤x} f(t)dt, G(x) =
∫_{t<x} f(t)dt, H(x) = ∫_{t≥x} f(t)dt and E(x) = ∫_{t>x} f(t)dt are all log-concave.
Proof. I only prove the case of F. Let γ(x, t) = 1_{(−∞,x]}(t) over R^N × R^N be the indicator
function of the set {(x, t) : t ≤ x}. It is easy to check that γ(x, t) is log-concave in (x, t),
since that set is convex. Then f(t)γ(x, t) is log-concave in (x, t), and so F(x) = ∫_{R^N} f(t)γ(x, t)dt
is log-concave by Theorem 1.

Proof of Theorem 1

Theorem 1 is a direct consequence of the Prékopa-Leindler inequality:

Proposition 18 (Prékopa-Leindler inequality). Let h, e and g be three non-negative functions
defined over R^N. If for some 0 < λ < 1, we have

h(λx + (1 − λ)x′) ≥ e(x)^λ g(x′)^{1−λ}

for each x and x′ in R^N, then we have

∫_{R^N} h(x)dx ≥ (∫_{R^N} e(x)dx)^λ (∫_{R^N} g(x)dx)^{1−λ}.

Proof of Theorem 1. To prove that G(x) is log-concave, we need to show that for each x
and x′ in R^M and each λ ∈ (0, 1), we have

∫_{R^N} f(λx + (1 − λ)x′, y)dy ≥ (∫_{R^N} f(x, y)dy)^λ (∫_{R^N} f(x′, y)dy)^{1−λ}.

Let h(y) = f(λx + (1 − λ)x′, y), e(y) = f(x, y), and g(y) = f(x′, y). Since f(x, y) is
log-concave in (x, y), for each y and y′ in R^N, we have

h(λy + (1 − λ)y′) ≥ e(y)^λ g(y′)^{1−λ}.

The desired result then follows from the Prékopa-Leindler inequality.

The Prékopa-Leindler inequality (Proposition 18) follows from the Brunn-Minkowski
inequality and the layer-cake representation. We first prove it for N = 1 and then use
induction to extend it to the general case.

Proof of Proposition 18 for N = 1. Let µ be the Lebesgue measure on R. For each function
h(x) over R and each real number t, let D_h(t) = {x ∈ R : h(x) > t} be the strict upper
contour set. By the layer-cake representation, we have

∫_R h(x)dx = ∫_0^{+∞} µ(D_h(t))dt,
∫_R e(x)dx = ∫_0^{+∞} µ(D_e(t))dt,
∫_R g(x)dx = ∫_0^{+∞} µ(D_g(t))dt.

Next, by the hypothesis of the Prékopa-Leindler inequality, for each t, we have

λD_e(t) + (1 − λ)D_g(t) ⊆ D_h(t)

(here λA + (1 − λ)B := {z ∈ R : ∃x ∈ A, y ∈ B such that z = λx + (1 − λ)y}). By the
Brunn-Minkowski inequality, we have

µ(D_h(t)) ≥ λµ(D_e(t)) + (1 − λ)µ(D_g(t)),

when both µ(D_e(t)) > 0 and µ(D_g(t)) > 0, which are the only relevant cases if we use the
trick in the extension paper I uploaded. So we get

∫_R h(x)dx ≥ λ ∫_R e(x)dx + (1 − λ) ∫_R g(x)dx.

The desired result follows since

λ ∫_R e(x)dx + (1 − λ) ∫_R g(x)dx ≥ (∫_R e(x)dx)^λ (∫_R g(x)dx)^{1−λ},

that is, the (weighted) arithmetic mean is larger than the geometric mean.

We then use induction and Fubini's theorem to extend the scalar case to multidimensional
cases.

Theorem 2. Let F be a log-concave cumulative distribution function over R with mean x*.
Then

F(x*) ≥ 1/e.

If the complementary cumulative distribution function 1 − F(x) is log-concave with mean x*,
then

1 − F(x*) ≥ 1/e.

Proof. The idea is to first prove the result for log-linear cumulative distribution
functions, and then extend it to the general case using a linear approximation. Let F̄(x) = e^{αx+β}
with α > 0 be some log-linear CDF with (−∞, b] as its support, that is, F̄(b) = 1. Let x̄
be the mean of F̄. We have x̄ = b − 1/α, and thus

F̄(x̄) = 1/e.

(That is, the lower bound is achieved when F is log-linear.) Now consider the general
case. We can take a linear approximation of ln F at x*. That is, there exists some log-linear
cumulative distribution function F̄(x) = e^{αx+β} such that αx + β is tangent to ln F at x*.
By log-concavity of F, F̄ ≥ F. Thus we have x̄ ≤ x*, and so F(x*) = F̄(x*) ≥ F̄(x̄) = 1/e.
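A quick sanity check of Theorem 2, assuming the unit exponential distribution (my own example): its CDF is log-concave and 1 − F is log-linear, so the second bound binds exactly at the mean x* = 1. Note also that F(x*) = 1 − 1/e ≈ 0.64, the constant behind the 64% rule in the next subsection.

import numpy as np

# Theorem 2 check for the unit exponential distribution: F(x) = 1 - exp(-x)
# is log-concave, 1 - F(x) = exp(-x) is log-linear, and the mean is x* = 1.
x_star = 1.0
F = 1 - np.exp(-x_star)
assert F >= 1 / np.e                   # F(x*) ~ 0.632 >= 1/e ~ 0.368
assert (1 - F) >= 1 / np.e - 1e-12     # binds with equality: 1 - F(x*) = 1/e
print(F, 1 - F, 1 / np.e)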

2.2 64% Rule: an application (In Class)

2.3 Reference
Prékopa (1981) contains a discussion and applications of log-concavity that are not economics-oriented.
Caplin and Nalebuff (1991) discuss the 64% rule. Bagnoli and Bergstrom (2005)
contain a discussion of the applications of log-concavity in economics. Smith, Sørensen and
Tian (2021) develop a log-concavity condition on signal structures with a binary state that
is tight for monotone belief updating with multiple actions in the traditional herding model.
Di Tillio, Ottaviani and Sørensen (2021) use log-concavity and log-convexity.

2.4 Exercises
Exercise 11. Suppose that the state θ is drawn from R according to some positive density
function f(θ). You observe s with s = θ + e, where e is a random draw from R according
to some positive density function φ(x). Assume that θ is independent of e. Assume also
that φ is log-concave. Prove that for each s′ > s, the posterior density function over θ
conditional on s′ is an MLR upward shift of the one conditional on s.

Exercise 12. Consider the cheap-talk game. Let U^R(θ, a) be Receiver's utility function,
in which θ is supported on [0, 1] according to some positive density function f(θ), and a ∈ [0, 1]
is the action that Receiver chooses. Assume that U^R is twice continuously
differentiable with U^R_{12} > 0 and U^R_{22} < 0, and that for each θ there exists some a such
that U^R_2(θ, a) = 0. Conditional on each message sent for the interval [θ_1, θ_2] (θ_1 < θ_2) by
Sender, Receiver responds optimally by choosing the action a*(θ_1, θ_2) such that

a*(θ_1, θ_2) = arg max_{a∈[0,1]} ∫_{θ_1}^{θ_2} U^R(θ, a)f(θ)dθ.

Now let U^R(θ, a) = U(θ − a), in which U″ < 0, i.e., the location-form utility. Suppose that
f(θ) is log-concave. Prove that for any δ > 0,

a*(θ_1 + δ, θ_2 + δ) − a*(θ_1, θ_2) ≤ δ.

(Hint: use the FOC.)

Exercise 13. There are two possible states, θ ∈ {H, L}. Tom and Jerry share a common
prior belief p ∈ (0, 1), which is the chance of state H. Tom observes some private signal s ∈
R and updates his belief; denote by q(p, s) the chance of state H conditional
on signal realisation s. Denote the density function over signals conditional on state θ by
f^θ(s), in which f^H(s) = e^s f(s) and f^L(s) = f(s). Then

q(p, s) = p f^H(s) / (p f^H(s) + (1 − p) f^L(s)) = p e^s / (p e^s + (1 − p)).

Tom then optimally chooses an action from A = {a_1, · · · , a_{N+1}} based on his updated
belief. W.l.o.g., suppose that Tom chooses a_n if and only if his posterior belief falls
into (q_{n−1}, q_n], in which q_0 = 0 < q_1 < · · · < q_{N+1} = 1.
Jerry cannot observe the signal realisation, but he can observe Tom's choice. Assume
that Jerry knows the signal structure as well as Tom's utility function and thus Tom's
choice rule. Based on his observation of Tom's choice of each a_n, Jerry updates his
own belief in the following way. Let S(p, n) = {s : q(p, s) ∈ (q_{n−1}, q_n]} be the set of
private signals conditional on which Tom's posterior belief falls into (q_{n−1}, q_n]. For each
1 ≤ n ≤ N, let s_n(p) be the signal at which Tom's posterior belief equals q_n, that is,
q(p, s_n(p)) = q_n. Let s_0(p) = −∞ and s_{N+1}(p) = +∞. The chance of S(p, n) conditional
on state θ = H, L, denoted by φ^θ(p, n), is then given by φ^θ(p, n) = F^θ(s_n(p)) − F^θ(s_{n−1}(p)).
The unconditional chance of action a_n is then given by φ(p, n) = pφ^H(p, n) + (1 − p)φ^L(p, n).
Jerry's updated belief conditional on each action a_n chosen by Tom with positive chance,
denoted by π(p, n), is given by

π(p, n) = pφ^H(p, n) / φ(p, n).

Belief-updating monotonicity refers to the following: for two different prior beliefs p < p′ and some
action a_n, if both φ(p, n) > 0 and φ(p′, n) > 0, then π(p, n) ≤ π(p′, n). First, explain why belief-updating
monotonicity cannot hold in general. Now suppose that f is log-concave. Does
belief-updating monotonicity hold now? Prove or disprove it.

3 Comparison of Experiments
3.1 Blackwell (1951)
3.1.1 Experiments and Distribution of Posteriors

Let Θ = {1, · · · , N} be a set of finitely many states. A (Blackwell) experiment is a
conditional distribution tuple E = (µ_1, · · · , µ_N), where each µ_n is the probability measure
over the signal space S conditional on state n. Let p = (p_1, · · · , p_N) be a prior belief over
states, where each p_n > 0 is the chance of state n. (If p_n = 0 for some n, you can simply
delete that state from your analysis.) Let µ = Σ_{n=1}^N p_n µ_n be the unconditional probability
distribution over S. The posterior belief conditional on each s, denoted by q̃(s), is given
by

q̃(s) = (p_1 (dµ_1/dµ)(s), · · · , p_N (dµ_N/dµ)(s)),    (Bayes' rule)    (3)

in which dµ_n/dµ is the Radon-Nikodym derivative of µ_n w.r.t. µ, and each p_n (dµ_n/dµ)(s) is the
posterior chance of state n conditional on s.
Let ∆ = {(q_1, · · · , q_N) : Σ_n q_n = 1, q_n ≥ 0} be the set of all possible beliefs. The
distribution of S conditional on each state n, i.e., µ_n, induces a distribution over ∆ via the
posterior mapping q̃(s), which we denote by m_n. Similarly, denote by m = Σ_n p_n m_n the
unconditional distribution of the posteriors. Once you know the unconditional distribution
of the posteriors m, you know both the prior belief and each m_n:

Proposition 19. The following statements hold:

1. ∫_∆ q dm = p.

2. For each n, dm_n = (q_n/p_n) dm.

Proof. For the first statement, for each n, we have ∫_∆ q_n dm = ∫_S p_n (dµ_n/dµ)(s)dµ(s) =
p_n ∫_S dµ_n = p_n. For the second statement, we show that for each n, p_n (dm_n/dm)(q) = q_n. By
the definition of the Radon-Nikodym derivative, we need to show that for each measurable subset B of
∆, we have p_n × m_n(B) = ∫_B q_n dm. Let q̃⁻¹(B) = {s ∈ S : q̃(s) ∈ B} be the inverse
image of B under q̃(s). We have ∫_B q_n dm = ∫_{q̃⁻¹(B)} p_n (dµ_n/dµ)(s)dµ(s) = p_n ∫_{q̃⁻¹(B)} dµ_n =
p_n × µ_n(q̃⁻¹(B)) = p_n × m_n(B).

Now we have another experiment (m_1, · · · , m_N) with ∆ as the signal space, which is
referred to as the standard experiment in Blackwell (1951). Let q̂ : ∆ → ∆ be the corresponding
posterior mapping with p as the prior belief. The above proposition immediately
implies the following observation.

Corollary 4 (No Introspection). q̂(q) = q.

Now we ask what kind of distributions over posteriors can be induced by some experiment
with prior belief p. Proposition 19 says that a necessary condition is that the mean
of the distribution over posteriors equal the prior belief p. We now show that
this condition is also sufficient.

Proposition 20. Let m be some distribution over ∆. m is the distribution over posteriors
induced by some experiment with prior belief p if and only if the mean of m equals p.

Proof. (Sufficiency.) Let the mean of m be p. We now construct an experiment that
induces m as the distribution over posteriors when the prior belief is p. We use ∆ as the signal
space, and construct the distribution over ∆ conditional on each state n, denoted by m_n. For
each state n, let dm_n = (q_n/p_n) dm, that is, m_n is such that for each measurable B ⊆ ∆,

m_n(B) = ∫_B (q_n/p_n) m(dq).

m_n defined in this way is a positive measure. Furthermore, m_n(∆) = ∫_∆ (q_n/p_n) m(dq) =
(1/p_n) ∫_∆ q_n m(dq) = 1. Thus m_n is a probability measure over ∆. Also, note that by
Bayes' rule (3), the posterior mapping q̃ : ∆ → ∆ is the identity mapping: q̃(q) =
(p_1 (dm_1/dm), · · · , p_N (dm_N/dm)) = q. Finally, we show that the unconditional distribution over the
signal space ∆ induced by (m_1, · · · , m_N) with prior belief p is exactly m: for each measurable
subset B of ∆,

Σ_n p_n m_n(B) = Σ_n p_n ∫_B (q_n/p_n) m(dq)
             = ∫_B Σ_n q_n m(dq)
             = ∫_B 1 m(dq)
             = m(B).
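A small numerical sketch of Propositions 19 and 20, assuming an illustrative binary-state experiment with three signals (my own numbers): the induced distribution over posteriors averages back to the prior.

import numpy as np

# Binary state, three signals; rows: states, columns: signal probabilities.
p = np.array([0.4, 0.6])                 # prior over states {1, 2}
mu = np.array([[0.6, 0.3, 0.1],          # mu_1(s), conditional on state 1
               [0.1, 0.3, 0.6]])         # mu_2(s), conditional on state 2

marg = p @ mu                            # unconditional signal distribution
posteriors = (p[:, None] * mu) / marg    # q~(s), one column per signal (Bayes' rule)

# Proposition 19: the mean of the distribution over posteriors is the prior.
mean_posterior = posteriors @ marg
assert np.allclose(mean_posterior, p)
print(posteriors)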

3.1.2 Values of Experiments for Decision Problems

We can compare two experiments by asking which one is more valuable for decision
problems. A decision problem is a set of actions A with a state-dependent Bernoulli utility
function u(θ, a). With an experiment, a decision maker can improve his decision by making
his choice contingent on each signal realisation and thus on his updated belief. Let ū(q, a) =
Σ_{n=1}^N q_n u(n, a) be the expected utility of action a when your belief is q in ∆. So each
ū(q, a) is a linear function over ∆. Let

W(q) = max_{a∈A} ū(q, a)    (4)

be the maximum expected utility that you can get at each belief q. W(q) is a convex
function over ∆. The converse is almost true: for any convex function W(q) over ∆
whose subdifferential is nonempty at each point, we can use convex
duality to express it as a maximum of linear functions, and thus as the indirect value
of some decision problem as in (4).
Let m(dq|E, p) be the distribution over posteriors induced by some experiment E with
prior belief p. Then with the information provided by this experiment, you can achieve

V(p|E) = ∫ W(q)m(dq|E, p).

Note that the value when the decision maker chooses based on his prior belief, that
is, with no information available, is W(p). Since W(·) is convex and the mean of m(dq|E, p)
equals p, we have V(p|E) ≥ W(p) by Jensen's inequality. That is, experiments help
improve decisions. The difference V(p|E) − W(p) is the value of this experiment for the
above decision problem when your prior belief is p.

Definition 8. Let E and E′ be two experiments. We say that E is more valuable than E′
if for each prior belief p and each convex function W(q) on ∆, we have

∫_∆ W(q)m(dq|E, p) ≥ ∫_∆ W(q)m(dq|E′, p).

3.1.3 Sufficiency and Mean-Preserving Spread

Definition 9 (Sufficiency). Consider two experiments E = (µ_1, · · · , µ_N) with S as signal
space and E′ = (µ′_1, · · · , µ′_N) with S′ as signal space. We say that E is sufficient for E′, if
there is a kernel γ(ds′|s), a distribution on S′ conditional on each s, such that for
each n, µ′_n(ds′) = ∫_S γ(ds′|s)µ_n(ds).

Definition 10 (Mean-Preserving Spread). Let M and m be two distributions over ∆. We
say that M is a mean-preserving spread of m, if we can construct a joint distribution over
∆ × ∆, with (q¹, q²) as a generic element, such that the marginal distribution of q² is m,
the marginal distribution of q¹ is M, and the expectation of q¹ conditional on
each q² is q².

Proposition 21. Let E and E′ be two experiments. The following are equivalent:

(a) E is more valuable than E′;

(b) E is sufficient for E′;

(c) For each p in the interior ∆° of ∆, m(dq|E, p) is a mean-preserving spread of m(dq|E′, p).

Proof. The proof of (c) ⟹ (a) is left to you guys. The proof of (a) ⟹ (c) for the
general case is tricky and omitted. We prove the equivalence between (b) and (c) here.
First, suppose E is sufficient for E′; let M = m(dq|E, p) and m = m(dq|E′, p). Using the
notation in the definition of sufficiency, let qⁱ(sⁱ) be the posterior belief mapping conditional
on sⁱ, i = 1, 2. For each n,

E(q¹_n|q²) = E(E(1_{θ=n}|s¹)|q²)
          = E(E(1_{θ=n}|s¹, s²)|q²)
          = E(1_{θ=n}|q²)
          = E(E(1_{θ=n}|s²)|q²)
          = E(q²_n|q²)
          = q²_n.

The second equality holds because of Lemma 4: the distribution of s² conditional on (θ, s¹)
depends only on s¹. The third and fourth equalities hold
because the information provided by (s¹, s²), and by s², is finer than that provided by q².

Now we prove (c) ⟹ (b). We construct a joint distribution over Θ × ∆ × ∆, with
a generic element denoted by (n, q¹, q²). Let the joint distribution over (q¹, q²) be the one
given in the definition of mean-preserving spread. Conditional on each (q¹, q²), let the
distribution over Θ be q¹. First, the chance of each state n is given by E(E(1_{θ=n}|q¹)) =
E(q¹_n) = p_n. Next, because q̂(q¹) = q¹, the distribution of posteriors based on the observation
of q¹ is M. Also, for each n,

E(1_{θ=n}|q²) = E(E(1_{θ=n}|q¹, q²)|q²)
            = E(E(1_{θ=n}|q¹)|q²)
            = E(q¹_n|q²)
            = q²_n.

That is, the posterior belief conditional on each q² is exactly q². Thus the distribution of
posteriors based on the observation of q² is m. Finally, by Lemma 4, the distribution
over q² conditional on (n, q¹) depends only on q¹.

Lemma 4. Let a joint distribution over X × Y × Z be such that the distribution over Z
conditional on (X, Y) depends only on Y. Then the distribution over X conditional on
(Y, Z) depends only on Y.

3.2 Lehmann (2012)


Two experiments that are not comparable in the sense of Blackwell (1951) might be
comparable if we focus on a class of decision problems instead of all decision problems.
Lehmann (2012) focuses on monotone decision problems and on the class of experiments with
subsets of the reals as signal spaces and with the monotone likelihood ratio property. We focus
on regular experiments, with a generic one denoted by E = (f_1(s), · · · , f_N(s)), in which
the signal space S is a closed interval of the reals and each f_n(s) is a density function fully
supported on S, conditional on state n.
Definition 11. E satisfies the monotone likelihood ratio property (MLRP), if for each n′ > n,
f_{n′}(s)/f_n(s) increases in s.
Definition 12. Consider two regular experiments E = (f_1(x), · · · , f_N(x)) with X as signal
space and E′ = (g_1(y), · · · , g_N(y)) with Y as signal space. We say that E is more effective
than E′, if for each x, G_n^{-1}(F_n(x)) is decreasing in n, in which F_n (G_n) is the cumulative
distribution function on X (Y) conditional on state n in experiment E (E′).
Note that for each state n, G_n^{-1}(F_n(x)) is a mapping from X to Y, and that the distribution
of G_n^{-1}(F_n(x)) is G_n when x is distributed according to F_n. That is, compared
to the garbling of signals in Blackwell (1951), the condition in Lehmann (2012) can be
rephrased as follows: the experiment E′ is obtained by a state-dependent garbling
(a draw of y conditional on x and state n) which is reversely ordered w.r.t. the state; that is,
the higher the state, the smaller the y you will draw conditional on each x.
Example 1 (Location Families). Let x = θ + w and y = θ + v, in which w and v are
noises and x and y are observables. The distribution of w is F, while the distribution
of v is G; both are assumed to be independent of θ and supported on R. For each finite set of
states {θ_1, · · · , θ_N} with θ_n < θ_{n+1} for each n, the distribution of x conditional on
θ_n is given by F(x − θ_n), while the distribution of y conditional on θ_n is given by G(y − θ_n).
So we have two experiments. For each set of states {θ_1, · · · , θ_N}, the experiment induced
by x = θ + w is more effective than the one induced by y = θ + v if and only if F is more
dispersed than G, that is, for each t′ > t in [0, 1],

F^{-1}(t′) − F^{-1}(t) ≥ G^{-1}(t′) − G^{-1}(t).
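A sketch of the effectiveness condition in the location family, assuming Gaussian noises with standard deviations 1 (for x) and 2 (for y); this is my own example, so that F is less dispersed than G and x is the more effective signal.

import numpy as np
from scipy.stats import norm

# Location family with Gaussian noise: F_n has sd 1, G_n has sd 2,
# both centred at theta. Here G_n^{-1}(F_n(x)) works out to 2x - theta.
thetas = [0.0, 0.5, 1.0]                 # states

def T(x, theta):
    # y = G_n^{-1}(F_n(x))
    return norm.ppf(norm.cdf(x, loc=theta, scale=1.0), loc=theta, scale=2.0)

for x in np.linspace(-3.0, 3.0, 13):
    vals = [T(x, th) for th in thetas]
    # E more effective than E': T(x, theta) is decreasing in theta.
    assert all(b <= a + 1e-9 for a, b in zip(vals, vals[1:]))
print("x = theta + w is more effective than y = theta + v")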

Definition 13. Given a decision problem with state-dependent utility function u(n, a) on
{1, · · · , N } × A (A ⊆ R is the set of actions), we say that u(n, a) satisfies the dominated
decreasing decision rule (DDDR) condition, if for each state-contingent decision rule d :
{1, · · · , N } → A with d(n) ≥ d(n + 1) for each n, we can find an action ā in A such that
u(n, ā) ≥ u(n, d(n)) for each state n.

That is, a decision problem satisfies DDDR, if each decreasing state-contingent decision
rule is dominated by some constant action.

Proposition 22. Assume that u(n, a) satisfies SCP, or u(n, a) satisfies IDO and is regular.
Then u(n, a) satisfies DDDR.

Proposition 23. Let E = (f_1(x), · · · , f_N(x)) with X as signal space and E′ = (g_1(y), · · · , g_N(y))
with Y as signal space be regular. Suppose that E is more effective than E′. Then for any
decision problem u(n, a) that satisfies the DDDR condition, and any monotone decision
rule d : Y → A, there exists a decision rule d′ : X → A such that for each state n, the
distribution of u(n, d′(x)) when x is distributed according to F_n FOSD the distribution of
u(n, d(y)) when y is distributed according to G_n. As a result, with experiment E and
the decision rule d′, conditional on each state the expected payoff (∫_X u(n, d′(x))dF_n(x))
is (weakly) larger than the expected utility with experiment E′ and the decision rule d
(∫_Y u(n, d(y))dG_n(y)).

Proof. For each x, d(G_n^{-1}(F_n(x))) is decreasing in n, since G_n^{-1}(F_n(x)) is decreasing
in n and d(y) is monotone (increasing). Thus there exists some action ā in A such that
u(n, ā) ≥ u(n, d(G_n^{-1}(F_n(x)))) for each n. Let d′(x) = ā. So for each n and each x, we
have

u(n, d′(x)) ≥ u(n, d(G_n^{-1}(F_n(x)))).    (5)

For each n, the distribution of u(n, d(y)) when y is distributed according to G_n is just the
distribution of u(n, d(G_n^{-1}(F_n(x)))) when x is distributed according to F_n. The desired
result follows by (5).

Theorem 3. Let E = (f_1(x), · · · , f_N(x)) and E′ = (g_1(y), · · · , g_N(y)) be regular. Suppose
that E is more effective than E′, and assume that E′ satisfies MLRP. Then for any decision
problem u(n, a) that satisfies the SCP, or satisfies IDO and is regular, and any prior belief,
the value of E is larger than that of E′.

3.3 Persuasion and Costly Information Acquisition
3.3.1 Persuasion Without Cost

Suppose that a principal and an agent share a common prior belief p over Θ. The
principal can design any experiment and then reveal the signal realisation to the agent
truthfully. The agent then chooses an action a from A based on his updated belief. Let u_P(θ, a)
be the utility of the principal, and u_A(θ, a) the utility of the agent. Let

A*(q) = arg max_{a∈A} ū_A(q, a)

be the set of optimal choices of the agent when he has posterior belief q. In case the agent has multiple
optimal choices, we break the tie by assuming that the agent chooses the one most
preferred by the principal. Let

a*(q) ∈ arg max_{a∈A*(q)} ū_P(q, a).

Then the expected utility of the principal when both the agent and the principal have posterior
belief q is given by

W^P(q) = ū_P(q, a*(q)).

If an experiment induces m as the distribution over posteriors, then the principal gets

∫ W^P(q)dm.

Since the principal is allowed to design any experiment, by Proposition 20, when we focus
on the induced distribution over posteriors, the principal's problem is as below:

V(p) = max_{m: ∫_∆ q dm = p} ∫_∆ W^P(q)dm.

Note that since actions are chosen by the agent to maximise his own utility, W^P(q) is
commonly not convex in q, which makes the above optimisation problem non-trivial. The
following proposition says that V(p) is the smallest function among the concave
functions lying above W^P(p). This fact gives an immediate and useful characterisation
of the optimal distribution of posteriors in some cases.

Proposition 24. V(p) is the concavification of W^P(p) over ∆: 1) V(p) is concave;
and 2) for any concave function g(p) over ∆ with g(p) ≥ W^P(p) for each p, we have
V(p) ≤ g(p) for each p.

Proof. Easy and omitted.

Example 2. A suspect might be guilty or innocent. Both the Prosecutor and the Jury
initially believe that the suspect is guilty with chance 0.5, absent further evidence. The
Jury will send the suspect to jail if and only if they believe that he is guilty with
chance weakly larger than 0.75. The Prosecutor can design any experiment and reveal the
signal realisation truthfully to the Jury. The Prosecutor gets utility 1 if the suspect is
sent to jail and 0 otherwise, independent of whether he is guilty or not. Denote the belief
by the chance of being guilty, q ∈ [0, 1]. We have W^P(q) = 0 if q < 0.75, and W^P(q) = 1
if q ≥ 0.75. The concavification of W^P is V(p) = p/0.75 when p ≤ 0.75, and V(p) = 1
when p ≥ 0.75. Thus the maximum value for the Prosecutor when the prior belief is
0.5 is 0.5/0.75 = 2/3. This can be achieved by the distribution over posteriors with
m(0) = 1/3 and m(0.75) = 2/3.
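Here is a minimal numerical sketch of Example 2, computing the concavification on a grid by maximising over binary splits of the prior (with a binary state, two-point distributions of posteriors suffice); the numbers are those of the example.

import numpy as np

# Concavification of W^P(q) = 1{q >= 0.75} over q in [0, 1], on a grid.
qs = np.linspace(0.0, 1.0, 301)
W = (qs >= 0.75 - 1e-9).astype(float)

def V(p):
    best = W[np.searchsorted(qs, p)]          # value of no information
    for i, q_lo in enumerate(qs):             # binary splits q_lo <= p <= q_hi
        for j in range(i + 1, len(qs)):
            q_hi = qs[j]
            if q_lo <= p <= q_hi:
                w_lo = (q_hi - p) / (q_hi - q_lo)   # weight on posterior q_lo
                best = max(best, w_lo * W[i] + (1 - w_lo) * W[j])
    return best

print(V(0.5))   # 2/3, achieved by splitting 0.5 into posteriors 0 and 0.75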

3.3.2 Incorporating costs of information

In the above persuasion model, we can introduce a cost of information acquisition.
There are two ways to think about the cost of information. First, you can think of it as a
cost of Blackwell experiments, denoted by C̄(E); that is, the domain of the cost function
includes all conditional distribution tuples over signal spaces (or some of them, in case you
face constraints). On the other hand, you can think of the cost of information as a cost of digested
information, that is, the cost of obtaining a distribution of posteriors, denoted by C(m). In this
case the domain of the cost function is all (or some) distributions over posteriors. We adopt the
latter approach. Let M be the set of all distributions of posteriors with p as prior belief,
and C(m) a cost function defined over it. Typically we impose the following assumptions,
where δ_p denotes the degenerate distribution in M:

1. C(δ_p) = 0.

2. If m′ is sufficient for m, then C(m′) ≥ C(m).

3. C(m) is convex over M.

The first assumption means that if you keep your prior unchanged, there is no cost
involved. The second assumption says that the more informative, the more costly. The
last one is based on the fact that you can randomise between two experiments without extra cost,
obtaining a mixture of the distributions of posteriors.
The literature assumes the following more specific “posterior separable” cost functions.
Let H(p) be a concave function over ∆, and C̃ : R₊ → R₊ some strictly increasing function
with C̃(0) = 0. Let

C(m) = C̃(H(p) − ∫ H(q)m(dq)).

The concave function H(p) measures the uncertainty associated with the belief p; ∫_∆ H(q)m(dq)
is the average uncertainty under distribution m. Then H(p) − ∫_∆ H(q)m(dq) is the reduction
of uncertainty due to the information on θ contained in m. The above cost function
requires that the cost of a distribution of posteriors depend on the reduction of uncertainty
it induces, in a strictly increasing way. Note that this is very similar to an expected
utility representation of the order over M induced by C(m). Intuitively, if you impose some
kind of independence axiom on the cost function C(m) plus continuity, as I did in my
entropy-cost paper, you obtain the above cost form. In the rational inattention literature,
it is typically assumed that H(p) is the Shannon entropy and C̃(x) = cx with c > 0.
The Shannon entropy measure of uncertainty can be characterised by the distraction-free
condition in my paper. With this linear posterior-separable cost function, you can apply
the concavification trick with nothing changed:

Ṽ(p) = max_{m: ∫_∆ q dm = p} [∫_∆ W^P(q)dm − c(H(p) − ∫_∆ H(q)m(dq))].

You may drop the −cH(p) part in the above optimisation problem because it does not affect
the optimal solution:

V(p) = max_{m: ∫_∆ q dm = p} ∫_∆ [W^P(q) + cH(q)]dm.

Proposition 24 applies immediately to the above problem.

3.4 Reference
Blackwell (1951) is a paper that everyone should read. Lehmann (2012) discusses the comparison
of experiments for monotone decision problems. The treatment of Lehmann (2012)
in this note follows Quah and Strulovici (2009). Di Tillio, Ottaviani and Sørensen (2021)
use Lehmann (2012). Kim (2021) elaborates on Lehmann (2012) and Quah and Strulovici
(2009).

4 Observational Learning
4.1 An Example
There are two restaurants, A and B, in a town. A sequence of agents sequentially and
independently decide which restaurant to eat at, depending on their beliefs about which one
is better. We use a_t = A to denote that the agent in period t eats in restaurant A, and
a_t = B that he eats in restaurant B. The agent who moves in period t, for each
t = 1, 2, · · ·, can observe the choices made by the agents preceding him. Besides that, he also
observes a private signal about which restaurant is better. He updates his belief based on
both the observed choice history and his private signal, and then makes a choice based on
his updated belief. Let A denote the state that restaurant A is better, and B the state
that restaurant B is better. Assume that the true state is A. Let the belief p ∈ [0, 1] be
the chance of state A, that is, the chance that restaurant A is better. Suppose that if the
agent has belief p > 0.5, he chooses restaurant A; if his belief is strictly smaller than
0.5, he chooses restaurant B; if his belief is exactly 0.5, he chooses A and B with equal
chance. We suppose that the signals are identically and independently distributed conditional on each
state. To fix ideas, suppose that there are only two signal realisations {s_A, s_B}. Let the
conditional chances be π(s_A|A) = π(s_B|B) = 0.8 and π(s_B|A) = π(s_A|B) = 0.2. Let p_t be
the public belief of the agent in period t + 1 conditional on his observation of the action
history up to period t. Let p_0 = 0.5. We are interested in the following questions:

1. How does the public belief process pt evolve? Does it converge? Does it converge in
finite time? Does it converge to the right belief conditional on the true state?

2. How does the action process at evolve? Does herding happen? Is it possible that agents herd on the wrong action conditional on the true state?

In this example, the answers to the above two questions are not hard to see. First note that the public belief process {pt } is a finite-state Markov chain: the distribution of pt+1 conditional on (p0 , p1 , · · · , pt ) depends only on pt , and the possible states of pt are finite: {1/7, 0.2, 0.5, 0.8, 6/7}. Indeed, p0 = 0.5, p1 (A) = 0.8, p1 (B) = 0.2, p2 (A, A) = 6/7, p2 (A, B) = p2 (B, A) = 0.5 and p2 (B, B) = 1/7. Conditional on the action history (A, A), the agent in period 3 will choose A with chance 1, independent of his private signal. This implies p3 (A, A, A) = p2 (A, A) = 6/7 and Prob(pt+1 = 6/7|pt = 6/7) = 1. Similarly, Prob(pt+1 = 1/7|pt = 1/7) = 1. That is, 1/7 and 6/7 are two absorbing states, while all other states are transient. We can also obtain the (both conditional and unconditional) transition matrices by computing the conditional and unconditional chance of each edge in the tree, as discussed in class.
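A quick simulation can make the absorption structure concrete. The following sketch uses only the numbers of the example above (the function names are mine); it simulates the public belief chain, including the fair coin flip whenever a private posterior equals 0.5.

    import random

    PI = {"sA": (0.8, 0.2), "sB": (0.2, 0.8)}   # (pi(s|A), pi(s|B))
    TOL = 1e-9   # tolerance for detecting a posterior of exactly 0.5

    def bayes(p, lA, lB):
        # Posterior chance of state A from prior p and state-conditional likelihoods.
        return p * lA / (p * lA + (1 - p) * lB)

    def choice(q, rng):
        # Optimal action at private posterior q, flipping a fair coin at 0.5.
        if q > 0.5 + TOL:
            return "A"
        if q < 0.5 - TOL:
            return "B"
        return rng.choice("AB")

    def prob_action(a, theta, p):
        # Chance of action a in state theta at public belief p: sum over the two
        # signals, splitting posteriors of exactly 0.5 fifty-fifty.
        total = 0.0
        for lA, lB in PI.values():
            q = bayes(p, lA, lB)
            pi_s = lA if theta == "A" else lB
            if abs(q - 0.5) <= TOL:
                total += 0.5 * pi_s
            elif (q > 0.5) == (a == "A"):
                total += pi_s
        return total

    def run_path(rng, true_state="A", T=60):
        # One sample path of the public belief p_t, conditional on the true state.
        p = 0.5
        for _ in range(T):
            pi_sA = PI["sA"][0] if true_state == "A" else PI["sA"][1]
            s = "sA" if rng.random() < pi_sA else "sB"
            a = choice(bayes(p, *PI[s]), rng)
            # The public updates on the observed action, not on the private signal.
            p = bayes(p, prob_action(a, "A", p), prob_action(a, "B", p))
        return p

    rng = random.Random(0)
    print(sorted(round(run_path(rng), 6) for _ in range(10)))
    # with chance 1, each path is absorbed at 1/7 ≈ 0.142857 or 6/7 ≈ 0.857143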

Definition 14 (Learning). Learning stops in finite time along a sample path of public beliefs (p0 , p1 , · · · , pt , · · · ) ∈ [0, 1]∞ if there exists some t̄ such that pt = pt+1 for each t > t̄. Let P ⊂ [0, 1]∞ be the set of all sample paths of public beliefs along which learning stops in finite time. If the chance of P is 1, then we say that learning stops in finite time with chance 1. Let A be the true state. For any convergent belief path (p0 , p1 , · · · , pt , · · · ), if lim pt = 1, we say that learning is complete in the limit along the belief path; otherwise, learning is incomplete along the belief path. If lim pt = 0, we say that learning is completely wrong along the belief path.

In our example, learning stops in finite time with chance 1, because {pt } is a finite-state absorbing Markov chain, and thus the process will be absorbed in finite time with chance 1. (If learning stops in finite time with chance 1 unconditionally, then it also does so conditional on the true state, because the conditional distribution over the sample paths of the public beliefs is absolutely continuous w.r.t. the unconditional distribution. That is, if the chance of P is 1 unconditionally, it is also 1 conditionally.) Now turn to the limiting belief. In our example, learning is incomplete with chance 1 conditional on the true state, but learning is completely wrong almost nowhere conditional on the true state. Now turn to actions.

Definition 15 (Herding). Herding happens in finite time along a sample path of actions (a1 , · · · , at , · · · ) if there exists some t̄ such that at = at+1 for each t > t̄. Let the true state be A. For a sample path of actions with herding, we say that agents herd on the wrong action if lim at ≠ A.

In our example, the public belief will be trapped at 1/7 or 6/7 in finite time with chance 1. It is optimal to choose a = A at p = 6/7, and to choose a = B at p = 1/7. Thus the fact that social learning stops in finite time with chance 1 implies that herding happens in finite time with chance 1. Conditional on state A, the limiting belief 1/7 will make agents herd on the wrong action. We can use the results on distributions over posteriors to get the chance of herding on the wrong action. Let λ be the chance of the limiting belief 1/7, and (1 − λ) the chance of 6/7 in the limiting distribution of beliefs. Since the prior belief is 0.5, we have λ · 1/7 + (1 − λ) · 6/7 = 0.5, and thus λ = 1/2. This is the unconditional chance of the posterior belief 1/7. Using the relationship between the conditional and unconditional distributions over posteriors, i.e., (??), the chance of posterior 1/7 conditional on state A is

(1/2 × 1/7) / (1/2) = 1/7.

That is, the chance that agents herd on the wrong action conditional on state A is 1/7.
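As a numerical sanity check of this 1/7 figure, one can reuse run_path from the sketch above and tally the absorbing beliefs conditional on state A (a Monte Carlo illustration under the same assumptions as that sketch, including that paths are absorbed within the simulated horizon, which happens with overwhelming probability):

    rng = random.Random(1)
    limits = [run_path(rng, true_state="A") for _ in range(100_000)]
    wrong = sum(abs(p - 1/7) < 1e-6 for p in limits) / len(limits)
    print(wrong)   # close to 1/7 ≈ 0.1429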

4.2 General Model


4.2.1 Main notations and assumptions

1. Two states: θ ∈ {H, L}. Assume that the true state is H. Denote the belief by the chance of H.

2. Assume that the signals observed by agents are conditionally i.i.d., given by (F H (σ), F L (σ)) over [0, 1] with dF H = 2σdF and dF L = 2(1 − σ)dF , in which F = 1/2 F H + 1/2 F L . By Blackwell, we can always transform an experiment with two states into this form. (A concrete sampled instance, under an illustrative choice of F , follows this list.) For each signal realisation σ, the posterior belief associated with σ is exactly σ if your prior belief is fair. We assume that F H and F L are mutually absolutely continuous; this is equivalent to 0 and 1 not being atoms under F . Finally, we assume that the signal is informative, that is, F L ≠ F H . This is equivalent to F not being degenerate, or equivalently that under F , both (0.5, 1] and [0, 0.5) have strictly positive chance.

3. Actions: M = {1, · · · , M }. Assume, w.l.o.g. and generically, that the belief interval [0, 1] is partitioned into M subintervals, using cut-offs 0 = r0 < r1 < · · · < rM = 1, such that each action m is optimal when the belief falls into the subinterval Jm = [rm−1 , rm ], and strictly optimal if the belief is in the interior of Jm .¹ To break ties, assume that the agent will choose m if his belief falls into (rm−1 , rm ].

¹ The belief space under discussion is [0, 1], and so 1 is an interior point of JM , and 0 is an interior point of J1 .
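Here is the promised instance of the normalised signal structure, under the illustrative assumption that F is uniform on [0, 1] (so F H (σ) = σ² and F L (σ) = 1 − (1 − σ)²); the function names are mine.

    import random

    def draw_signal(theta, rng):
        # Inverse-CDF sampling with F uniform: F^H(s) = s^2, F^L(s) = 1-(1-s)^2.
        u = rng.random()
        return u ** 0.5 if theta == "H" else 1 - u ** 0.5

    def posterior(p, s):
        # Bayes update of a prior p on H: the densities are 2s (H) vs 2(1-s) (L).
        return p * s / (p * s + (1 - p) * (1 - s))

    rng = random.Random(0)
    sigmas = [draw_signal("H", rng) for _ in range(3)]
    print(sigmas)                      # skewed toward 1 under state H
    print(posterior(0.5, sigmas[0]))   # with a fair prior, the posterior equals sigma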

4.3 Bounded and unbounded signals
An important observation in LS is to differentiate between signals with bounded power and signals with unbounded power. Let [ā, b̄] be the convex hull of the support of F .

Definition 16. The signal is bounded if ā > 0 and b̄ < 1. The signal is unbounded if ā = 0 and b̄ = 1.

To understand the above definition, it is better to focus on the likelihood ratio associated with a belief p, denoted by

l(p) = (1 − p)/p.

Given the prior likelihood ratio (1 − p)/p (the likelihood ratio associated with the prior belief), the posterior likelihood ratio (the likelihood ratio associated with the posterior belief) when you observe signal σ is given by

((1 − p)/p) × ((1 − σ)/σ).
When the signal is bounded, as long as your prior belief p is low enough, i.e., close enough to 0, no signal realisation can bring your posterior belief close to 1. Also, as long as your prior belief is high enough, i.e., close enough to 1, no signal realisation can bring your posterior belief close to 0. This is not the case for unbounded signals: no matter how high your prior belief is, there are always signal realisations in the support which can bring your posterior belief as close to 0 as desired, and symmetrically for posterior beliefs close to 1.
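To put the bounded case in one line: since (1 − σ)/σ is decreasing in σ, with supp(F ) ⊆ [ā, b̄] the updating formula above sandwiches the posterior likelihood ratio as

l(p) × (1 − b̄)/b̄ ≤ l(p) × (1 − σ)/σ ≤ l(p) × (1 − ā)/ā, for all σ ∈ [ā, b̄].

With ā > 0 and b̄ < 1, both multipliers are finite and strictly positive, so a single observation can move the likelihood ratio only by a bounded factor; with ā = 0 (resp. b̄ = 1), the upper (resp. lower) factor is unbounded (resp. zero), and extreme posteriors become reachable.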

4.4 Cascade sets

The notion of cascade set is central to the analysis of observational learning. The cascade set for action m, denoted by Cm , is the set of prior beliefs such that action m is optimal no matter which signal realisation in the support of the signal is observed. For each prior belief p and signal realisation σ, let r(σ, p) be the posterior belief:

r(σ, p) = 2σp / (2σp + 2(1 − σ)(1 − p)).

Then

Cm := {p ∈ [0, 1] : r(σ, p) ∈ Jm , ∀σ ∈ supp(F )}.

Let

C = C1 ∪ · · · ∪ CM

be the cascade set. Some simple yet important observations (a numerical sketch follows them):

1. Each Cm is a (perhaps empty) closed subinterval. For any belief p in Cm , it is strictly optimal to play action m. That is, each Cm is contained in the interior of Jm . As a consequence, for each m ≠ m′ , Cm ∩ Cm′ = ∅.

2. If signals are unbounded, only the extreme actions have non-empty cascade sets, with C1 = {0} and CM = {1}.

3. When the signal is bounded, for non-extreme actions, i.e., m ≠ 1, M , Cm may be non-empty and is contained in the open interval (rm−1 , rm ). For extreme actions, we have C1 = [0, c1 ] with 0 < c1 < r1 and CM = [cM , 1] with rM −1 < cM < 1.
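Here is a minimal grid-based sketch of these observations, assuming a bounded support [ā, b̄] = [0.3, 0.7], three actions, and illustrative cutoffs (all numbers and names are my assumptions); since r(σ, p) is increasing in σ, checking the two extreme signals suffices.

    import numpy as np

    def r(sigma, p):
        # Posterior belief from prior belief p and signal sigma.
        return sigma * p / (sigma * p + (1 - sigma) * (1 - p))

    def cascade_set(m, cutoffs, a, b):
        # p is in C_m iff even the extreme signals keep the posterior in J_m.
        p = np.linspace(0.0, 1.0, 100001)
        mask = (r(a, p) >= cutoffs[m - 1]) & (r(b, p) <= cutoffs[m])
        return (p[mask].min(), p[mask].max()) if mask.any() else None

    cutoffs = [0.0, 1/3, 2/3, 1.0]
    for m in (1, 2, 3):
        print(m, cascade_set(m, cutoffs, a=0.3, b=0.7))
    # C_1 = [0, 3/17], C_2 is empty here, C_3 = [14/17, 1], matching observation 3.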

4.5 Transition Law of pt

For t = 0, 1, · · · , let pt be the public belief of the agent in period t + 1 conditional on his observation of the choices by the agents preceding him, that is, (a1 , · · · , at ), with p0 the exogenously given common prior belief. The chance at which he chooses each action m, conditional on each state θ, is given by

φθ (m, pt ) = F θ (σ(rm , pt )) − F θ (σ(rm−1 , pt )), (6)

in which σ(q, p) is uniquely determined by r(σ(q, p), p) = q. The unconditional chance of choosing action m by the agent in period t + 1 is then given by

φ(m, pt ) = pt φH (m, pt ) + (1 − pt )φL (m, pt ). (7)

The public belief pt+1 of the agent in period t + 2 conditional on his observation of (a1 , · · · , at , m), in which the agent in period t + 1 chooses m, if the chance of m is positive, is given by

pt+1 (m, pt ) = q(m, pt ) := pt φH (m, pt ) / φ(m, pt ). (8)

(7) and (8) together give the unconditional transition law of the public belief process, while (6) and (8) together give the conditional transition law of the public belief process.
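A small sketch of (6)–(8), again under the illustrative assumption that F is uniform on [0, 1] (so F H (s) = s² and F L (s) = 1 − (1 − s)², as in the sampling sketch above); the function names are mine.

    # Illustrative CDFs, assuming F is uniform on [0, 1].
    F = {"H": lambda s: s ** 2, "L": lambda s: 1 - (1 - s) ** 2}

    def sigma(q, p):
        # The signal moving public belief p to posterior q: solves r(sigma, p) = q.
        return q * (1 - p) / (q * (1 - p) + p * (1 - q))

    def phi(theta, m, p, cutoffs):
        # Eq. (6): chance that the next agent chooses action m in state theta.
        return F[theta](sigma(cutoffs[m], p)) - F[theta](sigma(cutoffs[m - 1], p))

    def q_next(m, p, cutoffs):
        # Eqs. (7)-(8): public belief after observing action m at public belief p.
        num = p * phi("H", m, p, cutoffs)
        return num / (num + (1 - p) * phi("L", m, p, cutoffs))

    cutoffs = [0.0, 0.5, 1.0]          # two actions split at belief 1/2
    print(q_next(2, 0.5, cutoffs))     # 0.75: seeing the high action lifts the belief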

4.6 Results
First, the process {pt } with the unconditional transition law is a martingale, that is, E(pt+1 |pt ) = pt , and thus it converges almost surely. Let p∞ = lim_{t→∞} pt . The following result says that the support of the limiting belief is contained in the cascade set.

Proposition 25. supp(p∞ ) ⊆ C.

The next result concerns the distribution of the limiting belief conditional on the true state. Let lt = l(pt ).

Lemma 5. Conditional on state H, the likelihood ratio process {lt } is a martingale, and thus {lt } converges almost surely conditional on state H.

Proof.

EH (lt+1 |lt ) = Σ_{m=1}^{M} φH (m, pt ) × ( (1 − pt ) φL (m, pt ) / ( pt φH (m, pt ) ) )
             = Σ_{m=1}^{M} φH (m, pt ) × lt × ( φL (m, pt ) / φH (m, pt ) )
             = lt × Σ_{m=1}^{M} φL (m, pt )
             = lt .
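Reusing phi, q_next, and cutoffs from the sketch in Subsection 4.5, this identity is easy to check numerically at any interior public belief (still a sanity check under the assumed uniform F ):

    def lr(p):
        # Likelihood ratio l(p) = (1 - p)/p.
        return (1 - p) / p

    p = 0.37
    lhs = sum(phi("H", m, p, cutoffs) * lr(q_next(m, p, cutoffs)) for m in (1, 2))
    print(lhs, lr(p))   # the two numbers agree up to floating-point error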

Learning is completely wrong if the belief converges to 0, or equivalently if the likelihood ratio explodes. Since a nonnegative martingale converges almost surely to a finite limit, the likelihood ratio explodes with chance 0, so the above lemma implies:

Proposition 26. Conditional on the true state, learning is completely wrong almost nowhere.

Proposition 25 and Proposition 26 together imply that complete learning is almost sure with unbounded signals.

Corollary 5. If the signal is unbounded, then learning is complete almost surely conditional on the true state.

Proof. With an unbounded signal, we have C = {0, 1}. Thus by Proposition 25, the limiting belief is either 0 or 1. By Proposition 26, it equals 1 almost surely conditional on state H.

The case of bounded signals is quite different, especially when we have strict updating monotonicity:

Definition 17. Strict updating monotonicity holds if for each action m and each p′ > p with φ(m, p) > 0 and φ(m, p′ ) > 0, we have q(m, p′ ) > q(m, p).

A necessary condition for strict updating monotonicity is that the signal has no atoms, that is, F is atomless, which we will assume for Lemma 6 and Corollary 6 below.

Lemma 6. With strict monotonicity of belief updating, if p0 ∉ C, then for each t, pt ∉ C almost surely.

Proof. It suffices to prove that for any finite path (p0 , p̄1 , · · · , p̄t−1 , p̄t ) with prob((p0 , p̄1 , · · · , p̄t−1 , p̄t )) > 0, we have p̄t ∉ C if p̄t−1 ∉ C. Suppose that p̄t = q(m, p̄t−1 ) with φ(m, p̄t−1 ) > 0. First consider the case in which p̄t−1 is strictly smaller than the left endpoint of Cm , say p. We have q(m, p) = p. Since p̄t−1 is strictly smaller than p, and φ(m, p̄t−1 ) > 0, we have

p = q(m, p) > q(m, p̄t−1 ) = p̄t

by strict updating monotonicity. That is, p̄t is also strictly smaller than the left endpoint of Cm . Similarly, if p̄t−1 is strictly larger than the right endpoint of Cm , so is p̄t . That is, p̄t ∉ Cm , because p̄t−1 ∉ C and thus p̄t−1 ∉ Cm . Then p̄t ∉ C, because p̄t ∈ Jm and each Cm′ with m′ ≠ m lies in the interior of Jm′ , which is disjoint from Jm . This shows that each p̄t is outside C, because p0 ∉ C.

Remark 4. Lemma 6 actually implies that, under strict updating monotonicity, learning never stops in finite time if the initial belief is not in the cascade set. This shows that the finite-time stopping of social learning in the example of Subsection 4.1 is not generally true.

An easy consequence of the above lemma is the following result.

Corollary 6. With strict monotonicity of belief updating, if the signal is bounded and p0 ≠ 1, then learning is incomplete almost surely conditional on the true state.

Proof. If p0 is in the cascade set, then it will remain unchanged forever, and thus learning is incomplete almost surely since p0 ≠ 1. Now consider the case p0 ∉ C.
(Step 1) We first show that the chance that p∞ falls into the interior of C is 0. Let P be the set of sample paths of beliefs whose limits fall into the interior of C. Assume to the contrary that prob(P ) > 0. Then we can find a sample path

(p̄0 , p̄1 , · · · , p̄t , · · · )

in P such that prob((p̄0 , p̄1 , · · · , p̄t )) > 0 for each t. (To see this, for each t, let P t = {p ∈ P : prob((p0 , . . . , pt )) > 0} ⊆ P . We have prob(P t ) = prob(P ). Note that {P t } is a decreasing sequence. Thus prob(∩_{t=1}^{∞} P t ) = prob(P ) > 0.) However, Lemma 6 implies that each p̄t in the above sample path is outside C. This is a contradiction (why?).
(Step 2) By Step 1 and Proposition 25, p∞ is finitely supported and its support is contained in the boundary of C. Since the signal is bounded, the boundary point cM of CM is strictly smaller than 1. Thus p∞ < 1 almost surely.

Now we turn to the action process.

Proposition 27. The following statements hold.

1. Herding in finite time happens almost surely.

2. With unbounded signals, herding on the right action happens almost surely.

3. With bounded signals, if the initial belief is not in the cascade set CM , then herding on the wrong action happens with positive chance.

Proof. (Proof of (1)). (Step 1) Let (a1 , · · · , at , · · · ) be any sample path of action history such that prob((a1 , · · · , at )) > 0 for each t. Suppose that we cannot find some t̄ such that at = at+1 for each t > t̄. Then we can find some m ≠ m′ such that for any t̄, there exist t > t̄ and t′ > t̄ with at = m and at′ = m′ . Then we have pt (a1 , · · · , at ) ∈ Jm and pt′ (a1 , · · · , at′ ) ∈ Jm′ . Thus the sample path of public beliefs associated with this sample path of actions cannot converge to a point of the cascade set.
(Step 2) Now let Ā be the set of sample paths of actions along which herding does not happen in finite time. Suppose to the contrary that prob(Ā) > 0. Let Ā′ ⊆ Ā be the set of sample paths of actions of which each finite truncation has positive chance (as in the proof of Corollary 6). We have prob(Ā′ ) = prob(Ā) > 0. Let P be the set of sample paths of public beliefs associated with the sample paths of actions in Ā′ ; we have prob(P ) > 0. But by Step 1, this contradicts Proposition 25. Thus (1) holds. You can mimic the proof of (1) to prove the remaining statements.

References
Athey, S. (2002). Monotone comparative statics under uncertainty. The Quarterly Journal of Economics, 117, 187–223.

Bagnoli, M. and Bergstrom, T. (2005). Log-concave probability and its applications. Economic Theory, 26 (2), 445–469.

Blackwell, D. (1951). Comparison of experiments. Proc. Second Berkeley Symp. on Math. Statist. and Prob., 63–82.

Caplin, A. and Nalebuff, B. (1991). Aggregation and social choice: a mean voter theorem. Econometrica, 59, 1–23.

Di Tillio, A., Ottaviani, M. and Sørensen, P. N. (2021). Strategic sample selection. Econometrica, 89 (2), 911–953.

Karlin, S. and Rinott, Y. (1980). Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. Journal of Multivariate Analysis, 10 (4), 467–498.

Kim, Y. (2021). Comparing Information in General Monotone Decision Problems. Working paper, Duke University.

Lehmann, E. L. (2012). Comparing location experiments. In Selected Works of E. L. Lehmann, Springer, pp. 779–791.

Milgrom, P. and Shannon, C. (1994). Monotone comparative statics. Econometrica, 62, 157–180.

Prékopa, A. (1981). Logarithmic concave measures and related topics. In Stochastic Programming, ed. by M. A. H. Dempster, Academic Press, 63–82.

Quah, J. K.-H. and Strulovici, B. (2009). Comparative statics, informativeness, and the interval dominance order. Econometrica, 77, 1949–1992.

Smith, L., Sørensen, P. and Tian, J. (2021). Informational herding, optimal experimentation and contrarianism. Review of Economic Studies, 88 (5), 2527–2554.

Tian, J. (2021). Optimal interval division. Economic Journal.
