
Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.

Random forest is a supervised learning algorithm. As the name suggests, it builds a forest of decision trees and injects randomness into how they are grown. One big advantage of random forest is that it can be used for both classification and regression problems, which make up the majority of current machine learning tasks.
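To make this concrete, here is a minimal sketch of both uses with scikit-learn; the dataset loaders and parameter values below are illustrative choices, not something prescribed by the text:

    from sklearn.datasets import load_iris, load_diabetes
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    # Classification: predict a discrete class label
    X_c, y_c = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_c, y_c)
    print(clf.predict(X_c[:3]))   # class labels for the first three samples

    # Regression: predict a continuous value with the same family of models
    X_r, y_r = load_diabetes(return_X_y=True)
    reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_r, y_r)
    print(reg.predict(X_r[:3]))   # real-valued predictions for the first three samples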

Random forests train and predict quickly, and they are able to deal with unbalanced and missing data. Their weaknesses are that, when used for regression, they cannot predict beyond the range of the training data, and that they may over-fit data sets that are particularly noisy.

The training algorithm for random forests applies the general technique of bootstrap aggregating, or
bagging, to tree learners. Given a training set X = x1, ..., xn with responses Y = y1, ..., yn, bagging
repeatedly (B times) selects a random sample with replacement of the training set and fits trees to these
samples:
For b = 1, ..., B:
1. Sample, with replacement, n training examples from X, Y; call these Xb, Yb.
2. Train a classification or regression tree fb on Xb, Yb.
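A minimal from-scratch sketch of this loop, using scikit-learn's decision trees as the base learners (the function and variable names are illustrative, and X and Y are assumed to be NumPy arrays):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_bagged_trees(X, Y, B=100, random_state=0):
        """Fit B regression trees, each on its own bootstrap sample of (X, Y)."""
        rng = np.random.default_rng(random_state)
        n = len(X)
        trees = []
        for b in range(B):
            # 1. Sample, with replacement, n training examples: the bootstrap sample (Xb, Yb)
            idx = rng.integers(0, n, size=n)
            Xb, Yb = X[idx], Y[idx]
            # 2. Train a regression tree fb on (Xb, Yb)
            fb = DecisionTreeRegressor().fit(Xb, Yb)
            trees.append(fb)
        return trees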

After training, predictions for an unseen sample x' can be made by averaging the predictions from all the individual regression trees on x',

f̂ = (1/B) Σ_{b=1}^{B} fb(x'),

or by taking the majority vote in the case of classification trees.
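Continuing the sketch above, the two aggregation rules could look like this (assuming trees is a list of fitted trees, such as the one returned by fit_bagged_trees, or classification trees fitted the same way):

    import numpy as np
    from collections import Counter

    def predict_regression(trees, X_new):
        # Average the real-valued predictions of the individual regression trees
        return np.mean([fb.predict(X_new) for fb in trees], axis=0)

    def predict_classification(trees, X_new):
        # Majority vote: the most frequent class label across the individual trees
        votes = np.stack([fb.predict(X_new) for fb in trees])  # shape (B, n_samples)
        return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])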
