
Random Forest

• Designed to further enhance prediction accuracy of CART

• Works by building a large number of CART trees

• But makes model less interpretable

• Tradeoff: Interpretability or Accuracy

• To make a prediction for a new observation, each tree "votes" on the outcome, and we pick the outcome that receives the majority of the votes
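The voting step can be sketched as follows. The lecture uses R's randomForest package; this is an illustrative Python/scikit-learn analogue (note that scikit-learn's own `predict` averages class probabilities rather than counting hard votes, so this sketch shows the idea, not the exact library internals):

```python
# Illustrative sketch (the lecture uses R): each tree in the forest
# "votes" on a new observation's outcome, and the majority vote wins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

new_obs = X[:1]  # a "new" observation to classify
votes = [int(tree.predict(new_obs)[0]) for tree in forest.estimators_]
majority = np.bincount(votes).argmax()  # outcome with the most votes
```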

Building many trees
1. Each tree can split on only a random subset of the variables

2. CART is deterministic, while Random Forests introduce randomness into the tree-building process

3. Each tree is built from a "bagged"/"bootstrapped" sample of the data


1. Select observations randomly with replacement
2. Example – original data: 1 2 3 4 5
3. New "data" (3 different trees):
   1. 2 3 1 2 5
   2. 3 1 4 5 1
   3. 4 4 2 1 5
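The bootstrap step in the example above can be sketched in plain Python, for illustration:

```python
# Bagging/bootstrapping: each tree gets its own sample of the data,
# drawn randomly *with replacement*, so some observations repeat
# (e.g. "2 3 1 2 5") and others are left out.
import random

original = [1, 2, 3, 4, 5]
random.seed(0)  # for reproducibility

# One bagged sample per tree (three trees, as in the example)
bagged_samples = [[random.choice(original) for _ in original]
                  for _ in range(3)]
```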

Random Forest parameters
• Minimum number of observations in a subset (like minbucket in CART)
• In R, this is controlled by the parameter nodesize
• A smaller value of nodesize leads to bigger trees, which take longer to build in R
• Random Forests are much more computationally intensive than CART

• Number of trees
• In R, this is the parameter ntree
• Should not be too small, because the bagging procedure may miss observations
• More trees take longer to build

• Default parameter settings are typically okay
• The model is not very sensitive to these parameter values
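For readers working outside R, scikit-learn exposes rough analogues of these two parameters. The mapping below is an approximation, not an exact equivalence: nodesize is closest to min_samples_leaf, and ntree to n_estimators.

```python
# Rough scikit-learn analogues of R's randomForest parameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=1)

forest = RandomForestClassifier(
    n_estimators=200,     # like ntree: how many trees to build
    min_samples_leaf=25,  # like nodesize: min observations per leaf
    random_state=1,
).fit(X, y)
```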


Parameter selection (CART)

• In CART, the value of minbucket can affect the model's out-of-sample accuracy
• If minbucket is too small, over-fitting might occur
• If minbucket is too large, the model might be too simple

•How should we set this parameter?


K-fold cross-validation

•Given training set, split into k pieces ("folds")

•Use k−1 folds to estimate a model, and test model on remaining one fold
("validation set") for each candidate parameter value

•Repeat for each of the k folds



• For each candidate parameter value (e.g. minbucket), average the accuracy over the k folds (validation sets)
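The procedure above can be sketched with scikit-learn's cross_val_score (a Python analogue of what the lecture does in R):

```python
# K-fold cross-validation over candidate values of the minbucket-like
# parameter (min_samples_leaf in scikit-learn): for each candidate,
# average accuracy over the k validation folds.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
k = 5

avg_accuracy = {}
for minbucket in [1, 5, 10, 25, 50]:
    tree = DecisionTreeClassifier(min_samples_leaf=minbucket, random_state=0)
    fold_scores = cross_val_score(tree, X, y, cv=k)  # one score per fold
    avg_accuracy[minbucket] = fold_scores.mean()

best = max(avg_accuracy, key=avg_accuracy.get)  # best candidate value
```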
Cross-Validation in R
• Before, we limited our tree using minbucket

• When we use cross-validation in R, we'll use a parameter called cp instead
• cp stands for Complexity Parameter
• Like Adjusted R², it measures the trade-off between model complexity and accuracy on the training set

• A smaller cp leads to a bigger tree (which might overfit)
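scikit-learn has no cp parameter, but its cost-complexity pruning parameter ccp_alpha plays a similar role (this mapping is an approximation): a smaller penalty allows a bigger tree.

```python
# ccp_alpha in scikit-learn is a rough analogue of rpart's cp: both
# penalize complexity, and a smaller value allows a bigger tree.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

small_penalty = DecisionTreeClassifier(ccp_alpha=0.0001, random_state=0).fit(X, y)
large_penalty = DecisionTreeClassifier(ccp_alpha=0.05, random_state=0).fit(X, y)
# small_penalty grows at least as many leaves as large_penalty
```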


Martin’s Model

•Used 628 previous SCOTUS cases between 1994 and 2001

• Made predictions for the 68 cases that would be decided in October 2002, before the term started

• Two-stage approach based on CART:

  • First stage: one tree to predict a unanimous liberal decision, another tree to predict a unanimous conservative decision
  • If the predictions conflict or both predict no, move to the next stage
  • 50% of Supreme Court cases result in a unanimous decision
  • Second stage: predict the decision of each individual justice, and use the majority decision as the prediction
Tree for Justice O’Connor
Tree for Justice Souter

An unusual property of the CART trees that Martin and his colleagues developed: they use predictions from some trees as independent variables in other trees. (The first split is whether or not Justice Ginsburg's predicted decision is liberal.)

In summary: liberal – liberal – affirm (liberal); conservative – conservative – affirm (conservative).
The experts

•Martin and his colleagues recruited 83 legal experts


• 71 academics and 12 attorneys
• 38 previously clerked for a Supreme Court justice, 33 were chaired
professors and 5 were current or former law school deans

• Experts were only asked to predict within their area of expertise; more than one expert was assigned to each case

• Experts were allowed to consider any source of information, but not allowed to communicate with each other regarding predictions
The results

•68 cases in October 2002

•Overall case predictions:


• Model accuracy: 75%
• Experts accuracy: 59%

•Individual justice predictions:


• Model accuracy: 67%
• Experts accuracy: 68%
• For some justices the model performed better, and for others the experts performed better
Individual Justice Predictions
Takeaway messages

• Predicting Supreme Court decisions is very valuable to firms, politicians, and non-governmental organizations

• A model that predicts these decisions is both more accurate and faster than the experts
  • A CART model based on very high-level details of a case beats experts who can process much more detailed and complex information
