Decision Theory Summary

Normative decision theory studies what rational decision
makers ought to do.

Descriptive decision theory tries to explain what decision
makers actually do.
Decisions under ignorance (you don't know anything about the probabilities) (Chapter 3)
Decisions under risk (Chapter 4)
• When we describe (or model) a decision situation, we distinguish three levels of abstraction:
1. The decision problem
2. A formalisation of the decision problem (the model which we can analyze)
The model of a decision problem contains at least the following elements:
1. States (rain or no rain)
2. Acts or strategies (going outside or staying inside)
In a one-shot decision model, a decision maker can choose an act from a set A = {a1, …, ap} of
acts. So, an act represents what the agent chooses/does. In a sequential decision model, a decision
maker has to choose a sequence of acts, where the set of acts to choose from at some moment
might depend on the chosen acts in the past. A sequence of acts is called a strategy.
3. Outcomes
An outcome is the result of an act (in a one-shot decision model) or strategy (in a sequential decision
model).
4. Preferences over outcomes
The outcomes can be ordered based on a preference relation: for any two outcomes, one is preferred
to the other, the decision maker is indifferent between them, or they are not comparable.
5. Information of the decision maker
Given the preferences over the outcomes, the decision maker chooses his/her act or strategy from
the set of acts. This also depends on the information of the decision maker: is it decision making
under certainty, risk or ignorance?
3. A visualisation of the formalization (present the results of the analysis of the model)
Under certain conditions, the preferences over the outcomes can be represented by numbers, which we
call utility payoffs. These numbers can represent different scales:
1. Ordinal scale: only the order matters. I like this one more than that one, but it does not
matter how much more.
2. Cardinal scales:
- Interval scale: differences have a meaning, ratios do not (e.g. temperature in Celsius or Fahrenheit).
U = a • v + b (a > 0)
Example with a = 2 and b = 3: v = 5 gives U = 2 • 5 + 3 = 13, and v = 7 gives U = 2 • 7 + 3 = 17.
The difference between two values is preserved up to the scaling factor a, but the numbers
themselves are shifted (they may lie, for example, 30 higher).
F = 1.8 • C + 32
By solving this equation for C, we get:
C = (F – 32) / 1.8
- Ratio scale: differences and ratios have a meaning.
U = a • v (a > 0)
v = U / a
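The difference between the two cardinal scales can be checked numerically. The sketch below uses the Celsius/Fahrenheit pair from above; the function names and the sample temperatures are illustrative assumptions, not part of the course material.

```python
import math

def to_fahrenheit(c):
    """Positive affine transformation F = 1.8 * C + 32 (interval scale)."""
    return 1.8 * c + 32

def to_celsius(f):
    """Inverse transformation C = (F - 32) / 1.8."""
    return (f - 32) / 1.8

celsius = [10.0, 20.0, 40.0]                       # assumed sample values
fahrenheit = [to_fahrenheit(c) for c in celsius]

# Ratios of differences are the same on both scales ...
diff_ratio_c = (celsius[2] - celsius[1]) / (celsius[1] - celsius[0])
diff_ratio_f = (fahrenheit[2] - fahrenheit[1]) / (fahrenheit[1] - fahrenheit[0])

# ... but plain ratios are not, so "twice as warm" has no meaning here.
ratio_c = celsius[2] / celsius[0]
ratio_f = fahrenheit[2] / fahrenheit[0]
```

On a ratio scale (U = a • v) the plain ratios would be preserved as well, because there is no additive constant b.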
In a decision node (usually represented by a square) the decision maker can choose from
different acts (the branches of the tree).
In a chance node (usually represented by a circle) the branches represent the possibilities and
it is not known which branch will be realized (under risk there is a probability distribution
over the branches of each chance node).
Rival formalizations of a decision problem arise if two or more formalizations are equally reasonable
and strictly better than all other formalizations.
In decision making under risk, the decision maker knows the probability of the possible outcomes. In
decision making under ignorance, the decision maker does not know the probability of the possible
outcomes (or these probabilities do not exist).
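a𝑖 is a strongly dominant act if a𝑖 ≻ a𝑗 for all a𝑗 ≠ a𝑖.

Weakly dominant: a𝑖 is at least as good as every other act (a𝑖 ≽ a𝑗 for all a𝑗).
Strongly dominated: if a𝑖 ≻ a𝑗, then a𝑗 is strongly dominated (by a𝑖).
Weakly dominated: if a𝑖 ≽ a𝑗, then a𝑗 is weakly dominated (by a𝑖).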
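A minimal sketch of a dominance check on a payoff table. It assumes the state-wise reading: weak dominance means at least as good in every state, and strong dominance additionally requires strictly better in at least one state. Some texts reserve "strong" for strictly better in every state, so treat this reading as an assumption. The payoff numbers are invented.

```python
def weakly_dominates(a, b):
    """a is at least as good as b in every state."""
    return all(x >= y for x, y in zip(a, b))

def strongly_dominates(a, b):
    """Weak dominance plus strictly better in at least one state (assumed reading)."""
    return weakly_dominates(a, b) and any(x > y for x, y in zip(a, b))

def strongly_dominated_acts(payoffs):
    """Acts that are strongly dominated by some other act."""
    return sorted(
        b for b in payoffs
        if any(a != b and strongly_dominates(payoffs[a], payoffs[b]) for a in payoffs)
    )

# Rows are acts, columns are states (hypothetical numbers).
payoffs = {
    "a1": [3, 2, 6],
    "a2": [3, 1, 4],
    "a3": [1, 5, 2],
}
dominated = strongly_dominated_acts(payoffs)
```

Here a2 is never better than a1 and sometimes worse, so it can be removed before applying any of the decision rules.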
Maximin: maximize the lowest payoff of each act (the worst outcome). If several acts tie on the
maximin value, use leximin: maximize the second-lowest payoff (and so on). This fits a pessimistic
decision maker.
Maximax: maximize the best outcome. This fits an optimistic decision maker.
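The three rules can be sketched in a few lines. The payoff table is invented for illustration, and leximin is implemented here by comparing the payoffs of each act sorted from worst to best, which is one way to express "break ties on the next-worst outcome".

```python
def maximin(payoffs):
    """Acts whose worst payoff is as high as possible."""
    best = max(min(row) for row in payoffs.values())
    return sorted(a for a, row in payoffs.items() if min(row) == best)

def leximin(payoffs):
    """Compare acts by their payoffs sorted from worst to best."""
    best = max(sorted(row) for row in payoffs.values())
    return sorted(a for a, row in payoffs.items() if sorted(row) == best)

def maximax(payoffs):
    """Acts whose best payoff is as high as possible."""
    best = max(max(row) for row in payoffs.values())
    return sorted(a for a, row in payoffs.items() if max(row) == best)

# Rows are acts, columns are states (hypothetical numbers).
payoffs = {
    "a1": [1, 6, 3],
    "a2": [1, 4, 9],
    "a3": [0, 10, 10],
}
```

With these numbers maximin cannot separate a1 and a2 (both have worst payoff 1), leximin picks a2 because its second-worst payoff is higher, and maximax picks a3.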
Optimism-pessimism rule: weigh the best and the worst outcome of each act a𝑖 with an optimism
index 𝛼 (0 ≤ 𝛼 ≤ 1) and choose the act with the highest value of
𝛼 • max(a𝑖) + (1 − 𝛼) • min(a𝑖)
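A small sketch of the optimism-pessimism index, using exact fractions to avoid rounding issues when comparing values. The payoff table and the chosen values of 𝛼 are assumptions for illustration.

```python
from fractions import Fraction

def optimism_pessimism(payoffs, alpha):
    """Acts maximizing alpha * max + (1 - alpha) * min, plus the index per act."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    index = {a: alpha * max(row) + (1 - alpha) * min(row) for a, row in payoffs.items()}
    best = max(index.values())
    return sorted(a for a, v in index.items() if v == best), index

# Rows are acts, columns are states (hypothetical numbers).
payoffs = {
    "a1": [1, 6, 3],
    "a2": [1, 4, 9],
    "a3": [0, 10, 10],
}
cautious, _ = optimism_pessimism(payoffs, Fraction(1, 4))
bold, _ = optimism_pessimism(payoffs, Fraction(3, 4))
```

Setting 𝛼 = 0 reproduces maximin and 𝛼 = 1 reproduces maximax, so the rule covers the two previous rules as limiting cases.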
Minimax regret: choose the act that minimizes the maximal regret (the loss compared with the best
act for the state that actually occurs). Take the maximum of every state (column) and subtract it
from every number in that column (the results are ≤ 0). Then choose the act (row) with the highest
minimum, i.e. the act whose worst regret is closest to zero.
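The procedure above can be sketched as follows. Regrets are kept as non-positive numbers (payoff minus column maximum), as in the description, and the payoff table is invented.

```python
def regret_table(payoffs):
    """Subtract the column maximum from every entry (all results are <= 0)."""
    n_states = len(next(iter(payoffs.values())))
    col_max = [max(row[s] for row in payoffs.values()) for s in range(n_states)]
    return {a: [row[s] - col_max[s] for s in range(n_states)] for a, row in payoffs.items()}

def minimax_regret(payoffs):
    """Acts whose worst (most negative) regret is closest to zero."""
    regrets = regret_table(payoffs)
    best = max(min(r) for r in regrets.values())
    return sorted(a for a, r in regrets.items() if min(r) == best)

# Rows are acts, columns are states (hypothetical numbers).
payoffs = {
    "a1": [12, 8, 20],
    "a2": [10, 15, 16],
    "a3": [30, 6, 25],
}
regrets = regret_table(payoffs)
choice = minimax_regret(payoffs)
```

The column maxima are 30, 15 and 25, so the worst regrets are -18 for a1, -20 for a2 and -9 for a3, and a3 is chosen.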
Principle of insufficient reason: compute the average value of each act (assume that every state
has the same probability, since there is insufficient reason to think otherwise) and choose the
act with the highest average. Example: boat = (3 + 2 + 6) / 3 = 11/3.
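A sketch of the principle of insufficient reason. The "boat" row comes from the example above; the "train" row is a made-up second act added only so that there is something to compare with.

```python
from fractions import Fraction

def insufficient_reason(payoffs):
    """Average each act over the states (equal probabilities) and pick the best."""
    average = {a: Fraction(sum(row), len(row)) for a, row in payoffs.items()}
    best = max(average.values())
    return sorted(a for a, v in average.items() if v == best), average

payoffs = {
    "boat": [3, 2, 6],    # from the notes
    "train": [4, 4, 2],   # hypothetical comparison act
}
winners, average = insufficient_reason(payoffs)
```

The boat averages 11/3 and the hypothetical train 10/3, so the boat is chosen.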
Randomised acts: choose an act at random. When playing only once this makes little sense, but
when playing more often you get an average result, e.g. ½ • €100.
Axiomatic analysis: The different criteria that we considered for decision making under uncertainty
can be evaluated by an axiomatic analysis. An axiomatization of a rule is a set of axioms that are only
satisfied by this rule.
We speak about decision making under risk if the probability distribution over the states is known by
the decision maker. In that case, the decision maker can make his/her decision based on the
expected value. But what are we maximizing? If the values are monetary payments then we
maximize expected monetary value. If the values are utility payoffs then we maximize expected
utility. Risk attitudes: risk seeking, -neutral (expected utility equal) and -averse.
EMV = ½ • 49 + ¼ • 25 + ¼ • 25 = 37 (probability • payoff, summed over the outcomes).
EU is computed the same way, with utilities in place of monetary values.
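The EMV calculation above, together with an expected-utility variant, can be sketched like this. The utility function u(x) = √x is an assumption chosen for illustration (the notes do not specify one); it is concave and therefore models a risk-averse decision maker.

```python
import math

def expected_value(lottery, utility=lambda x: x):
    """lottery is a list of (probability, monetary outcome) pairs."""
    if not math.isclose(sum(p for p, _ in lottery), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * utility(x) for p, x in lottery)

lottery = [(0.5, 49), (0.25, 25), (0.25, 25)]     # numbers from the notes

emv = expected_value(lottery)                     # expected monetary value
eu = expected_value(lottery, math.sqrt)           # expected utility, assumed u(x) = sqrt(x)
certainty_equivalent = eu ** 2                    # sure amount with the same utility
```

With this assumed utility function the certainty equivalent lies below the EMV: the decision maker would accept a sure amount smaller than 37 instead of the lottery, which is what risk aversion means. A risk-neutral decision maker values the lottery at exactly its EMV.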

(i) the strict preference relation
• 𝑥≻𝑦 if and only if [𝑥≽𝑦 and ¬y≽𝑥]
(ii) the indifference relation
• 𝑥~𝑦 if and only if [𝑥≽𝑦 and 𝑦≽𝑥]
A preference relation is complete (vNM1) if any two alternatives are comparable.
Asymmetry: if x ≻ y, then it is false that y ≻ x.
Transitivity (vNM2): if x ≻ y and y ≻ z, then x ≻ z.
Negative transitivity: if ¬𝑥≻𝑦 and ¬𝑦≻𝑧, then ¬𝑥≻𝑧.
Preference relation ≻ can be represented by a utility function u if and only if ≻ is complete,
asymmetric and negatively transitive in 𝑋.
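For a finite set of alternatives these properties can be checked by brute force. The sketch below represents ≻ as a set of ordered pairs and builds one possible utility function by counting how many alternatives each alternative is strictly preferred to. The counting construction and the example relation are assumptions for illustration, not taken from the course.

```python
from itertools import product

def is_asymmetric(X, strict):
    """No pair with both x > y and y > x."""
    return not any((x, y) in strict and (y, x) in strict for x, y in product(X, repeat=2))

def is_negatively_transitive(X, strict):
    """If not x > y and not y > z, then not x > z."""
    return all(
        (x, z) not in strict
        for x, y, z in product(X, repeat=3)
        if (x, y) not in strict and (y, z) not in strict
    )

def counting_utility(X, strict):
    """u(x) = number of alternatives that x is strictly preferred to."""
    return {x: sum((x, y) in strict for y in X) for x in X}

X = ["a", "b", "c"]
strict = {("a", "b"), ("a", "c")}     # a is best; b and c are indifferent
u = counting_utility(X, strict)

# u represents the relation: x > y exactly when u(x) > u(y).
represents = all(((x, y) in strict) == (u[x] > u[y]) for x, y in product(X, repeat=2))
```

If the relation is changed to only a ≻ b, negative transitivity fails (a is not preferred to c, c is not preferred to b, yet a ≻ b), and the counting utility no longer represents it.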
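A new axiom: preference relation ≻ satisfies
independence (vNM3) if and only if for all lotteries A, B and C it holds that
A ≻ 𝐵 if and only if 𝐴pC ≻ 𝐵pC,
where ApC denotes the lottery that gives A with probability p and C with probability 1 − p.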
In words, if you prefer lottery 𝐴 to lottery 𝐵 then you
prefer any lottery between 𝐴 and a third lottery 𝐶 to the
lottery between 𝐵 and 𝐶 with the same probabilities p.
Continuity (vNM 4) If A ≻ B ≻ C then there exist some p
and q such that
ApC ≻ B ≻ AqC
Example: you prefer the lottery that gives A (€10m) with p close to 1 and C (€0) with probability
1 – p over B (€9m for sure), but you prefer B over the lottery that gives A with q close to 0 and
C with probability 1 – q.
Theorem 5.2 Preference relation ≻ satisfies vNM 1–4 if and only if it can be represented by a utility
function u satisfying:
(i) 𝐴 ≻ 𝐵 if and only if 𝑢(𝐴) > 𝑢(𝐵)
(ii) 𝑢(𝐴𝑝𝐵) = 𝑝 · 𝑢(𝐴) + (1 − 𝑝) · 𝑢(𝐵)
(iii) For every other function 𝑢′ satisfying (i) and (ii), there are numbers 𝑐 > 0 and 𝑑 such that
𝑢′ = 𝑐 · 𝑢 + 𝑑 (a function that satisfies (i) and (ii) can be obtained from every other such
function by multiplying the latter by a constant and adding another constant).
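Parts (ii) and (iii) of the theorem can be illustrated numerically: expected utility is linear in the probabilities, and a positive affine transformation of u leaves the ranking of lotteries unchanged. The utility values and the constants c = 10, d = 7 below are invented for the example.

```python
from fractions import Fraction

def expected_utility(lottery, u):
    """lottery is a list of (probability, outcome) pairs; u maps outcomes to utilities."""
    return sum(p * u[x] for p, x in lottery)

def affine(u, c, d):
    """Positive affine transformation u' = c * u + d (requires c > 0)."""
    if c <= 0:
        raise ValueError("c must be positive")
    return {x: c * v + d for x, v in u.items()}

u = {"A": Fraction(1), "B": Fraction(3, 5), "C": Fraction(0)}   # assumed utilities
u2 = affine(u, 10, 7)

half = Fraction(1, 2)
A_half_C = [(half, "A"), (half, "C")]     # the lottery ApC with p = 1/2
sure_B = [(Fraction(1), "B")]

eu_mix, eu_B = expected_utility(A_half_C, u), expected_utility(sure_B, u)
eu2_mix, eu2_B = expected_utility(A_half_C, u2), expected_utility(sure_B, u2)
```

Under both u and u′ the sure outcome B is ranked above the 50/50 lottery between A and C, and each expected utility under u′ equals 10 times the value under u plus 7.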
Objections to vNM:
- The axioms are too strong and can never be satisfied.
- No action guidance. By definition, a rational decision maker who is about to choose
among a large number of very complex acts has to know already from the beginning
which risky act to prefer. This follows from the completeness axiom. But then it is not
possible to first calculate the expected utilities and thereafter choose an act that has
the highest expected utility.
Reply: someone who doesn't have a complete preference can first have some help from
the axioms (but: nonideal agent).
- Utility without chance. It seems rather odd from a linguistic point of view to say that the
meaning of utility has something to do with preferences over lotteries.
Reply: the theory is a claim about how to measure utility, not about what utility means.
Game theory
Outcomes: for each set of actions (or strategies).
Payoff: the utility a player receives.
- In zero-sum games, in every outcome of the game the sum of all payoffs is equal to zero.
These games reflect strong competition.
- In nonzero-sum games different outcomes of the game can have different total sum of
payoffs.
- Noncooperative games versus cooperative games (binding agreements during pre-play
negotiations; not applicable here)
- Simultaneous-move games versus sequential-move games
- Games with perfect information versus games with imperfect information
- Symmetric games versus nonsymmetric games
- Two-person games versus n-person games
- Iterated (repeated) games versus non-iterated games
Strategies: actions
Strategy profiles: outcomes of the game
𝑠 = (𝑠1, 𝑠2, … , 𝑠𝑛) is called a strategy profile. It consists of 𝑛 strategies, one for each player.
A strategy profile S is Pareto dominated if there exists a strategy profile S′ such that
U𝑖(S′) > U𝑖(S) for every player 𝑖. A strategy profile is Pareto optimal if it is not Pareto
dominated. The Prisoner's Dilemma shows that the strategy profile that results if all players play
a dominant strategy need not be Pareto optimal.
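A sketch of the Prisoner's Dilemma point. The payoff numbers are a common textbook-style choice and are assumed here, with C for cooperate and D for defect. Pareto dominance is implemented as "every player strictly better off"; the weaker reading (nobody worse off, somebody better off) gives the same result for this game.

```python
def pareto_dominates(u, v):
    """Payoff vector u Pareto dominates v: every player is strictly better off."""
    return all(x > y for x, y in zip(u, v))

def pareto_optimal_profiles(game):
    """Strategy profiles that are not Pareto dominated by any other profile."""
    return sorted(
        s for s in game
        if not any(pareto_dominates(game[t], game[s]) for t in game)
    )

# (row strategy, column strategy) -> (row payoff, column payoff); assumed numbers.
game = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}

# D is strictly dominant for both players ...
row_d_dominant = all(game[("D", c)][0] > game[("C", c)][0] for c in "CD")
col_d_dominant = all(game[(r, "D")][1] > game[(r, "C")][1] for r in "CD")

# ... yet the resulting profile (D, D) is not Pareto optimal.
optimal = pareto_optimal_profiles(game)
```

Both players do better in (C, C) than in (D, D), so (D, D) is Pareto dominated even though it is what dominant-strategy play produces.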
A strategy can also be strictly dominant or strictly dominated. A weakly dominant strategy gives
at least the same payoff as every other strategy whatever the other players do, and a strictly
higher payoff in at least one case.
Note: with iterative elimination of weakly dominated strategies the order of elimination matters,
and Nash equilibria can be eliminated along the way. Therefore, we usually do not do this.
A strategy profile is a (pure) Nash equilibrium if and only if, once every player has chosen
his/her strategy, none of the players could reach a better outcome by unilaterally switching to
another strategy (nobody can gain by deviating alone while the others keep their strategies).
Note that a Nash equilibrium is a strategy profile, while a dominant (or dominated) strategy is a
strategy for some player. A (pure) Nash equilibrium is always one of the pure strategy profiles
that are left after iterative elimination of (pure) strictly dominated strategies. Example of one
Nash equilibrium: (R1, C3).
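Pure Nash equilibria of a small two-player game can be found by checking every profile for profitable unilateral deviations. The sketch below uses the 2×2 payoff table that appears in the mixed-strategy example of these notes; the function name is an assumption.

```python
def pure_nash_equilibria(game, rows, cols):
    """Profiles where neither player gains by switching strategy alone."""
    equilibria = []
    for r in rows:
        for c in cols:
            u_row, u_col = game[(r, c)]
            row_ok = all(game[(r2, c)][0] <= u_row for r2 in rows)
            col_ok = all(game[(r, c2)][1] <= u_col for c2 in cols)
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria

rows, cols = ["R1", "R2"], ["C1", "C2"]
# (row strategy, column strategy) -> (Row's payoff, Col's payoff)
game = {
    ("R1", "C1"): (3, 5), ("R1", "C2"): (0, 4),
    ("R2", "C1"): (1, 0), ("R2", "C2"): (20, 30),
}
equilibria = pure_nash_equilibria(game, rows, cols)
```

This game has two pure equilibria, (R1, C1) and (R2, C2): in each of them a player who deviates alone ends up with a lower payoff.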
A mixed strategy for a player is a probability distribution over its pure strategies. Each player will
choose a mixed strategy in response to the mixed strategies of all other players that maximizes
his/her own expected payoff. The result is called a mixed Nash equilibrium.
𝑝 describes a mixed strategy for player Row (p is the probability that Row plays R1) and 𝑞
describes a mixed strategy for player Col (q is the probability that Col plays C1).

                        Player Col
                        C1 (q)     C2 (1-q)
Player Row   R1 (p)     (3,5)      (0,4)
             R2 (1-p)   (1,0)      (20,30)

Row: any p in [0,1] is a best response when Row is indifferent between R1 and R2:
EU(R1) = EU(R2) (Row's expected utility is expressed in q, because Row's payoff depends on the
action of Column)
3q + 0 • (1-q) = 1 • q + 20 • (1-q)
q = 10/11. When Column plays C1 with a probability of 10/11, any p is a best response for Row.
Column: any q in [0,1] is a best response when Column is indifferent between C1 and C2:
EU(C1) = EU(C2)
5p + 0 • (1-p) = 4p + 30 • (1-p)
p = 30/31. When Row plays R1 with a probability of 30/31, any q is a best response for Column.
The unique Nash equilibrium in properly mixed strategies is (p,q) = (30/31, 10/11). The pure
profiles (R1,C1) and (R2,C2) are Nash equilibria as well.
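The two indifference conditions can be solved in closed form for any 2×2 game. This is a minimal sketch with exact fractions; it assumes that the denominators are non-zero and that the resulting probabilities lie strictly between 0 and 1, which the code does not check.

```python
from fractions import Fraction

def mixed_equilibrium_2x2(game, rows, cols):
    """Return (p, q): p = P(Row plays rows[0]), q = P(Col plays cols[0])."""
    (r1, r2), (c1, c2) = rows, cols
    a = {k: v[0] for k, v in game.items()}   # Row's payoffs
    b = {k: v[1] for k, v in game.items()}   # Col's payoffs
    # Row indifferent: q*a11 + (1-q)*a12 = q*a21 + (1-q)*a22
    q = Fraction(a[r2, c2] - a[r1, c2], a[r1, c1] - a[r1, c2] - a[r2, c1] + a[r2, c2])
    # Col indifferent: p*b11 + (1-p)*b21 = p*b12 + (1-p)*b22
    p = Fraction(b[r2, c2] - b[r2, c1], b[r1, c1] - b[r2, c1] - b[r1, c2] + b[r2, c2])
    return p, q

rows, cols = ["R1", "R2"], ["C1", "C2"]
game = {
    ("R1", "C1"): (3, 5), ("R1", "C2"): (0, 4),
    ("R2", "C1"): (1, 0), ("R2", "C2"): (20, 30),
}
p, q = mixed_equilibrium_2x2(game, rows, cols)

# Check the indifference conditions with the computed probabilities.
eu_r1, eu_r2 = 3 * q + 0 * (1 - q), 1 * q + 20 * (1 - q)
eu_c1, eu_c2 = 5 * p + 0 * (1 - p), 4 * p + 30 * (1 - p)
```

For the game in the notes this reproduces p = 30/31 and q = 10/11, and both players are indifferent between their two pure strategies at these probabilities.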
A social choice problem consists of a set of individual decision makers, and for each decision maker a
preference relation. We distinguish two types of preference aggregation:
- A Social Choice Function assigns to every social choice problem one or more alternatives
which can be considered as the alternatives that are chosen by the society.
→ The best, the top (usually one alternative, but more than one in case of a tie)
- A Social Welfare Function assigns to every social choice problem one preference relation
which can be seen as the 'social preference relation'. → A ranking
A social choice problem is a triple (𝑁, 𝐴, 𝐺) where
• 𝑁 is a finite set of agents
• 𝐴 is a finite set of alternatives, and
• 𝐺 = (≽𝑖)𝑖∈𝑁 is a preference profile (with ≽𝑖 is a preference relation on 𝐴, for 𝑖 ∈ 𝑁).
Two main questions:

- How do/should the agents choose one alternative together for the whole society? (Social
choice function)
- Is it possible to derive a social preference relation reflecting the preferences of the
society as a whole? (Social welfare function)
Two viewpoints that have been taken in the literature are:
• Cooperative viewpoint where a benevolent dictator tries to do what is ‘best’ for society
• Strategic viewpoint where, by voting, agents can strategically manipulate the voting outcome.
A social choice function 𝐶 assigns to every preference profile 𝐺 a subset of the set of alternatives 𝐴:
𝐶(𝐺)⊆𝐴. The set 𝐶(𝐺) is called the social choice set associated to preference profile 𝐺.
A social welfare function 𝐹 assigns a preference relation to every social choice situation.
Two categories:
1. Scoring functions (Borda)
• The plurality score of alternative 𝑎 ∈ 𝐴 is the number of agents that have alternative 𝑎 as
(one of) their most preferred alternative(s). The plurality scf chooses the alternatives that are
best for the highest number of agents. The plurality choice set is the set of alternatives that
are best for the largest number of agents. The alternatives in this set are called the plurality
winners.
• The antiplurality scf chooses the alternatives that are worst for
the lowest number of agents. The antiplurality score of an
alternative is the number of agents that have this alternative as
(one of) their worst alternative(s). The antiplurality choice set is
the set of alternatives that are worst for the lowest number of
agents. The alternatives in this set are called the antiplurality
winners.
The main disadvantages of the plurality scf are:
1. It only takes account of the most preferred alternative of every agent and ignores the rest of
the preferences.
2. It is very sensitive to strategic manipulation.
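Plurality and antiplurality can be sketched as follows. The sketch assumes strict rankings (no indifferences), written from best to worst, and the seven-agent profile is invented.

```python
from collections import Counter

def plurality_winners(profile, alternatives):
    """Alternatives ranked first by the largest number of agents."""
    top = Counter(ranking[0] for ranking in profile)
    best = max(top[a] for a in alternatives)
    return sorted(a for a in alternatives if top[a] == best)

def antiplurality_winners(profile, alternatives):
    """Alternatives ranked last by the smallest number of agents."""
    bottom = Counter(ranking[-1] for ranking in profile)
    fewest = min(bottom[a] for a in alternatives)
    return sorted(a for a in alternatives if bottom[a] == fewest)

alternatives = ["a", "b", "c"]
# Hypothetical profile: each tuple is one agent's ranking, best first.
profile = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2 + [("c", "b", "a")] * 2

plurality = plurality_winners(profile, alternatives)
antiplurality = antiplurality_winners(profile, alternatives)
```

In this profile a wins under plurality (three first places) although four of the seven agents rank a last, while b wins under antiplurality because nobody ranks it last. This illustrates the first disadvantage listed above.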
• Borda scf: to overcome the disadvantages of considering only the
best or worst alternative in each preference relation, each agent can
assign points to all alternatives, and the ‘winner’ is the alternative
with the highest number of points when summing over all agents
(highest borda score). The Borda choice set is the set of alternatives
with the highest total Borda scores. The alternatives in this set are
called the Borda winners.
• Borda swf: The Borda social welfare function is obtained by ordering the alternatives according
to their total Borda score, that is, the higher the total Borda score, the higher the alternative
is ranked. Example: b ≻B a ≻B c ≻B d.
Different scoring scf's may lead to different choices.
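A sketch of the Borda count. It assumes one common point scheme (with m alternatives the top alternative gets m − 1 points and the bottom one 0); the course may use a different but equivalent scheme. The profile is invented and consists of strict rankings written from best to worst.

```python
def borda_scores(profile, alternatives):
    """Total Borda score per alternative (m-1 points for first place, 0 for last)."""
    m = len(alternatives)
    scores = {a: 0 for a in alternatives}
    for ranking in profile:
        for position, a in enumerate(ranking):
            scores[a] += m - 1 - position
    return scores

def borda_winners(profile, alternatives):
    """Borda choice set: alternatives with the highest total score."""
    scores = borda_scores(profile, alternatives)
    best = max(scores.values())
    return sorted(a for a, s in scores.items() if s == best)

alternatives = ["a", "b", "c"]
# Hypothetical profile: each tuple is one agent's ranking, best first.
profile = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2 + [("c", "b", "a")] * 2

scores = borda_scores(profile, alternatives)
winners = borda_winners(profile, alternatives)
```

Here b wins with 9 points while a and c tie on 6, so the Borda social ranking puts b first and a and c together below it. In the same profile a has the most first places, which shows that different scoring functions can lead to different choices.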
2. Majoritarian functions (Condorcet)

𝑎 ≽G b: a is preferred over b by at least as many agents as b is preferred over a.
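The majority relation can be computed by pairwise counting. The sketch below also looks for an alternative that beats every other alternative by a strict majority (usually called a Condorcet winner, a term not used in these notes) and shows on a three-agent profile that the majority relation can cycle. Both profiles are invented, with strict rankings written from best to worst.

```python
def prefers_count(profile, a, b):
    """Number of agents that rank a above b."""
    return sum(r.index(a) < r.index(b) for r in profile)

def majority_weakly_prefers(profile, a, b):
    """a >=_G b: at least as many agents prefer a to b as prefer b to a."""
    return prefers_count(profile, a, b) >= prefers_count(profile, b, a)

def condorcet_winner(profile, alternatives):
    """Alternative that beats every other one by a strict majority, or None."""
    for a in alternatives:
        if all(prefers_count(profile, a, b) > prefers_count(profile, b, a)
               for b in alternatives if b != a):
            return a
    return None

alternatives = ["a", "b", "c"]
profile = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2 + [("c", "b", "a")] * 2
cycle = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]

winner = condorcet_winner(profile, alternatives)
no_winner = condorcet_winner(cycle, alternatives)
```

In the second profile a beats b, b beats c and c beats a, each by two votes to one, so the majority relation is not transitive and no alternative beats all others.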
Which social choice function or social welfare function is the ‘best’? We try to find out which social
choice (welfare) function is desirable by finding properties of social choice (welfare) functions. An
axiomatization of a social choice (welfare) function is a set of properties that characterizes one
unique social choice (welfare) function.
Properties of social welfare functions:

A social welfare function satisfies non-dictatorship if and only if no single individual is decisive.
A social welfare function satisfies ordering if and only if for every possible combination of individual
preference relations, the social preference relation is complete (any two alternatives are ranked),
asymmetric (if x ≻ y, it is false that y ≻ x) and transitive (if x ≻ y and y ≻ z, then x ≻ z).
A social welfare function satisfies Pareto efficiency if and only if the group of all individuals in society
is decisive.
A social welfare function satisfies independence of irrelevant alternatives (IIA) if and only if the
following holds: if all individuals have the same preference between 𝑎 and 𝑏 in two different
preference profiles 𝐺 and 𝐺’, then society's preference between 𝑎 and 𝑏 must be the same in 𝐺 and
𝐺’ (the preferences between c and d are then irrelevant).
Theorem (Arrow's impossibility theorem): if there are at least three alternatives, then no Social
Welfare Function satisfies independence of irrelevant alternatives, Pareto efficiency, non-
dictatorship and the ordering condition.
Corollary: if there are at least three alternatives, then every Social Welfare Function that satisfies
Pareto efficiency, the ordering condition and IIA, must be dictatorial.
A social welfare function satisfies minimal liberalism if and only if there are at least two individuals in
society such that for each of them there is at least one pair of alternatives with respect to which she
is decisive, that is, there is a pair a and b, such that if she prefers a to b, then society prefers a to b
(and society prefers b to a if she prefers b to a).
Theorem: there is no Social Welfare Function that satisfies minimal liberalism, Pareto efficiency and
the ordering condition.
Properties of social choice functions:

Assumption: We assume the social choice function to be single-valued, that is, to every social choice
problem it assigns a unique choice.
Agent i has a successful manipulation if by 'misreporting' his/her preferences (i.e. stating a
preference relation that differs from his/her real preference relation) while the other agents do
not change their preference relation, the social choice is better for agent 𝑖.
A social choice function is strategy-proof if for every preference profile there is no agent who has a
successful manipulation. So, a social choice function is strategy-proof if misreporting is never
beneficial for any agent.
A social choice function 𝐶 is dictatorial if there is an individual agent whose unique best
element is always the social choice.
Theorem: if there are at least three alternatives, then there is no Social Choice Function that is
strategy-proof and nondictatorial.
Corollary: if there are at least three alternatives, then every strategy-proof social choice function is
dictatorial.
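For a small profile, successful manipulations can be found by brute force: let each agent try every possible misreport and see whether the outcome improves according to his/her true ranking. The sketch assumes plurality with alphabetical tie-breaking to make the social choice function single-valued; the tie-breaking rule and the five-agent profile are assumptions for illustration.

```python
from itertools import permutations

def plurality_choice(profile):
    """Single-valued plurality; ties are broken alphabetically (assumed rule)."""
    scores = {}
    for ranking in profile:
        scores[ranking[0]] = scores.get(ranking[0], 0) + 1
    best = max(scores.values())
    return min(a for a, s in scores.items() if s == best)

def dictator_choice(profile):
    """Dictatorial scf: the first agent's top alternative is always chosen."""
    return profile[0][0]

def successful_manipulations(profile, scf):
    """All (agent index, misreport, new choice) where lying helps that agent."""
    found = []
    honest = scf(profile)
    for i, truth in enumerate(profile):
        for lie in permutations(truth):
            if lie == truth:
                continue
            outcome = scf(profile[:i] + [lie] + profile[i + 1:])
            # Better for agent i means ranked higher in the TRUE ranking.
            if truth.index(outcome) < truth.index(honest):
                found.append((i, lie, outcome))
    return found

# Hypothetical profile: each tuple is one agent's true ranking, best first.
profile = [("a", "b", "c"), ("a", "b", "c"), ("b", "a", "c"),
           ("b", "c", "a"), ("c", "b", "a")]

lies_plurality = successful_manipulations(profile, plurality_choice)
lies_dictator = successful_manipulations(profile, dictator_choice)
```

With honest reports a wins the tie against b. The last agent, who ranks a at the bottom, can report b as the top alternative and thereby make b win, which he/she prefers to a. Under the dictatorial rule no agent can gain by misreporting, in line with the corollary above.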
