
ANALYTICS INDIA MAGAZINE

Cheatsheet
RANDOM FOREST EXPLAINED
& IMPLEMENTED IN PYTHON
WHAT IS THE RANDOM FOREST ALGORITHM?

• An ensemble, tree-based algorithm. It consists of a set of decision trees, each trained on a random sample of the training data (and, at each split, a random subset of the features).

• The final class of a test data point is decided by aggregating the votes of all the trees in the forest.

• A generally accurate algorithm; some implementations can even handle missing values directly.

• It can be used for both classification and regression tasks.

• A single deep decision tree tends to overfit, which hurts its performance on new data. A random forest averages many decorrelated trees, which reduces variance, so adding more trees does not make it overfit (although the forest can still overfit noisy data if its individual trees are very deep).

HOW DOES IT WORK?

• Draw random samples (with replacement) from the training dataset.

• Build a decision tree for every sample and get a prediction from every tree.

• Count the votes for each predicted class and pick the class with the most votes as the final prediction. For regression, average the trees' predictions instead.

#Implementation
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score

df = pd.read_csv('data.csv')
X = df.drop('class', axis=1)   # feature columns
y = df['class']                # target column as a 1-D Series

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

#Classification
rfcl = RandomForestClassifier()
rfcl.fit(X_train, y_train)
y_pred = rfcl.predict(X_test)
accuracy_score(y_test, y_pred)   # signature is (y_true, y_pred)

#Regression (assumes the target column is numeric)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
y_pred = rfr.predict(X_test)
r2_score(y_test, y_pred)   # accuracy_score only applies to classification
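The three "how does it work" steps can be sketched from scratch as follows. This is a minimal illustration, not scikit-learn's implementation: it assumes NumPy and scikit-learn are installed, uses the bundled Iris dataset as stand-in data, and the helper names `fit_forest` and `predict_forest` are hypothetical. It uses a hard majority vote, as described in the steps.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: train n_trees trees, each on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Step 1: draw n row indices with replacement (a bootstrap sample)
        idx = rng.integers(0, n, size=n)
        # Step 2: grow a tree on that sample; max_features="sqrt" makes each
        # split consider only a random subset of the features
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Hypothetical helper, step 3: each tree votes, the most common label wins."""
    votes = np.array([tree.predict(X) for tree in trees])  # shape (n_trees, n_rows)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])


X, y = load_iris(return_X_y=True)
trees = fit_forest(X, y)
pred = predict_forest(trees, X)
print("training accuracy:", (pred == y).mean())
```

Using `DecisionTreeClassifier` for the individual trees keeps the sketch short; the bagging and voting logic is the part that makes it a random forest.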

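To see how scikit-learn aggregates its trees, a fitted forest's predictions can be compared with those of its individual trees, which it exposes as `estimators_`. This sketch assumes scikit-learn and NumPy, and uses the bundled Iris and diabetes datasets as stand-ins for `data.csv`. Note that scikit-learn's classifier averages the trees' predicted class probabilities (a soft vote) rather than counting hard votes, while the regressor averages the trees' numeric predictions.

```python
import numpy as np
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: average the class probabilities of the individual trees
X_c, y_c = load_iris(return_X_y=True)
rfcl = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_c, y_c)
avg_proba = np.mean([t.predict_proba(X_c) for t in rfcl.estimators_], axis=0)
# The predicted class is the one with the highest averaged probability
manual_class = rfcl.classes_[np.argmax(avg_proba, axis=1)]
print(np.allclose(avg_proba, rfcl.predict_proba(X_c)))  # compare probabilities
print((manual_class == rfcl.predict(X_c)).mean())       # fraction of rows that agree

# Regression: average the numeric predictions of the individual trees
X_r, y_r = load_diabetes(return_X_y=True)
rfr = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_r, y_r)
avg_pred = np.mean([t.predict(X_r) for t in rfr.estimators_], axis=0)
print(np.allclose(avg_pred, rfr.predict(X_r)))  # compare predictions
```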
www.analyticsindiamag.com
