Decision Tree
Implementation:
General Terms: Let us first discuss a few statistical concepts used in this post.
Entropy: The entropy of a dataset is a measure of its impurity. Entropy can also be thought of as a measure of uncertainty, so we should try to minimize it. The goal of a machine learning model is to reduce this uncertainty, or entropy, as far as possible.
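Concretely, the entropy of a set of labels is H = -Σ p·log₂(p), summed over the class proportions p. A minimal sketch of this (the function name and the choice of base-2 logarithms are mine, not from the post):

```python
import numpy as np

def entropy(labels):
    """Entropy (in bits) of a 1-D collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()          # class proportions
    return -np.sum(probs * np.log2(probs))

# A 50/50 split is maximally impure (1 bit); a pure set has entropy 0.
print(entropy([0, 1, 0, 1]))  # 1.0
print(entropy([1, 1, 1, 1]))  # entropy of a pure set is zero
```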
Information Gain: Information gain is a measure of how much information a feature gives us about the classes. The decision tree algorithm always tries to maximize information gain: a feature that perfectly partitions the data gives the maximum information, so the feature with the highest information gain is used for the first split.
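Information gain is the parent node's entropy minus the weighted average entropy of the child nodes after splitting on a feature. A sketch under the same assumptions as before (the toy DataFrame and function names are hypothetical, not the post's diabetes data):

```python
import numpy as np
import pandas as pd

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(df, feature, target):
    """Parent entropy minus the size-weighted entropy of each split on `feature`."""
    total = entropy(df[target])
    weighted = sum(len(sub) / len(df) * entropy(sub[target])
                   for _, sub in df.groupby(feature))
    return total - weighted

# A feature that perfectly separates the classes yields the maximum gain.
toy = pd.DataFrame({'feature': ['a', 'a', 'b', 'b'], 'label': [0, 0, 1, 1]})
print(information_gain(toy, 'feature', 'label'))  # 1.0
```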
from google.colab import files  # needed for files.upload(); assumes a Colab runtime
uploaded = files.upload()       # upload the diabetes CSV from the local machine

import shutil
import os

os.listdir('.')  # list the working directory (call assumed from the output below)
['.config',
 'diabetes (1).csv',
 'diabetes.csv',
 'diabetes11.csv',
 'sample_data']
import pandas as pd
import numpy as np
# Stopping conditions for the recursive tree-building routine. Here `datum` is
# assumed to be the (values, counts) pair from np.unique(dataset[target],
# return_counts=True), with unique_data = datum[0].
if len(unique_data) <= 1:
    return unique_data[0]  # pure node: every row shares one class
elif len(dataset) == 0:
    # empty split: return the majority class (this should be computed from the
    # parent dataset, since an empty split has no classes of its own)
    return unique_data[np.argmax(datum[1])]
elif len(features) == 0:
    return parent  # no features left to split on: fall back to parent's majority class
else:
    parent = unique_data[np.argmax(datum[1])]  # remember this node's majority class
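Put together with the entropy and information-gain ideas above, the full recursion might look like the following sketch. This is a toy illustration, not the post's complete implementation: the helper names, the toy columns, and the greedy `max` over information gain are my assumptions, and the empty-split base case is omitted because `groupby` never yields empty groups here.

```python
import numpy as np
import pandas as pd

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(df, feature, target):
    total = entropy(df[target])
    weighted = sum(len(sub) / len(df) * entropy(sub[target])
                   for _, sub in df.groupby(feature))
    return total - weighted

def id3(df, features, target, parent=None):
    values, counts = np.unique(df[target], return_counts=True)
    if len(values) <= 1:          # pure node: leaf
        return values[0]
    if len(features) == 0:        # no features left: parent's majority class
        return parent
    parent = values[np.argmax(counts)]       # majority class at this node
    best = max(features, key=lambda f: information_gain(df, f, target))
    tree = {best: {}}
    rest = [f for f in features if f != best]
    for val, sub in df.groupby(best):
        tree[best][val] = id3(sub, rest, target, parent)
    return tree

# Hypothetical toy data: 'outlook' perfectly separates the classes,
# so it is chosen as the root split.
toy = pd.DataFrame({'outlook': ['sunny', 'sunny', 'rain', 'rain'],
                    'windy':   [0, 1, 0, 1],
                    'play':    [0, 0, 1, 1]})
print(id3(toy, ['outlook', 'windy'], 'play'))
```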