Cheatsheet: RANDOM FOREST EXPLAINED & IMPLEMENTED IN PYTHON

WHAT IS THE RANDOM FOREST ALGORITHM?
• An ensemble tree-based algorithm. It consists of a set of decision trees, each trained on a randomly selected subset of the training data.
• The final class of a test data point is selected on the basis of the aggregate votes of the individual decision trees.
• A highly accurate algorithm that can even work with missing values.
• It can be used for both classification as well as regression.
• Overfitting results in poor model performance, but a random forest tends not to overfit as more trees are added.

HOW DOES IT WORK?
• Choose random samples from the dataset.
• Generate a decision tree for every sample and check the prediction results from every decision tree.
• Calculate the votes for every decision tree and pick the prediction result with the maximum votes as the final class prediction.

#Implementation
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score

df = pd.read_csv('data.csv')
X = df.drop('class', axis=1)
y = df['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

#Classification
rfcl = RandomForestClassifier()
rfcl.fit(X_train, y_train)
y_pred = rfcl.predict(X_test)
accuracy_score(y_test, y_pred)

#Regression (use a regression metric such as R^2, not accuracy)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
y_pred = rfr.predict(X_test)
r2_score(y_test, y_pred)
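The three steps under HOW DOES IT WORK can also be sketched by hand. The snippet below is a minimal illustration, not the internals of scikit-learn's RandomForestClassifier: it uses a toy dataset from make_classification (an assumption, since no real data is bundled here) and illustrative names (n_trees, majority_vote) to show bootstrap sampling, one decision tree per sample, and a majority vote over the trees' predictions.

```python
# Sketch of the random-forest voting procedure:
# 1) draw random (bootstrap) samples, 2) fit one decision tree per sample,
# 3) take the majority vote across trees as the final prediction.
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

n_trees = 25
trees = []
for i in range(n_trees):
    # Step 1: choose random samples (rows drawn with replacement).
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: generate a decision tree for every sample; random feature
    # subsets at each split ('sqrt') decorrelate the trees.
    tree = DecisionTreeClassifier(max_features='sqrt', random_state=i)
    trees.append(tree.fit(X[idx], y[idx]))

def majority_vote(x):
    # Step 3: count the votes from every tree; the most common class wins.
    votes = [t.predict(x.reshape(1, -1))[0] for t in trees]
    return Counter(votes).most_common(1)[0][0]

preds = np.array([majority_vote(x) for x in X])
print((preds == y).mean())  # training accuracy of the hand-rolled forest
```

Because each tree only sees a bootstrap sample and a random feature subset, individual trees vary a lot, but their aggregated vote is far more stable, which is why adding trees does not make the ensemble more prone to overfitting.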