
Lab 2

Muhd Fakhrullah 5B

1) It involves cleaning the data by removing entries that are irrelevant, duplicated or incomplete,
and converting the data to numerical values if it is text-based. The consequence of not cleaning
the data is that our model will learn bad patterns and produce wrong results
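The cleaning steps described above could look roughly like this in pandas. This is only a sketch: the table, its column names and the 0/1 encoding are made up for illustration and do not come from the lab files.

```python
import pandas as pd

# Hypothetical raw data: one duplicated row, one missing value, one text column.
raw = pd.DataFrame({
    "age": [21, 21, 30, None, 25],
    "gender": ["male", "male", "female", "female", "male"],
    "genre": ["HipHop", "HipHop", "Jazz", "Dance", "Classical"],
})

cleaned = raw.drop_duplicates()   # remove duplicated rows
cleaned = cleaned.dropna()        # remove incomplete rows
# Convert the text-based column to numerical values (assumed encoding).
cleaned = cleaned.assign(gender=cleaned["gender"].map({"female": 0, "male": 1}))

print(cleaned)
```

Three rows remain here: the repeated row and the row with a missing age are dropped before the text column is encoded.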

2) Training data – a subset used to train a model (usually 80%)

Test data – a subset used to test the trained model (usually 20%)
The reason we need to separate these two is to make sure the accuracy of the model is
measured on data it has not seen during training
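One way to see why the two subsets are kept apart is to score a model on both. The sketch below uses random, made-up data (not the lab's CSV files), so there is no real pattern to learn. The fixed `random_state` values are only there to make the run repeatable.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up data: random features with random labels.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = rng.integers(0, 2, size=200)

# 80% for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Score on rows the model has already seen vs. rows it has never seen.
train_score = accuracy_score(y_train, model.predict(X_train))
test_score = accuracy_score(y_test, model.predict(X_test))
print(train_score, test_score)
```

The unrestricted tree memorizes the training rows, so the training score is perfect even though the labels are random. Only the test score gives a fair picture of how the model handles new data.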

3) Scikit-Learn – this library eases our work by providing ready-made machine learning algorithms, so we do not have to program them ourselves

4) Theano – acts as an optimizing compiler for evaluating and manipulating mathematical
expressions and matrix calculations

5) It provides utilities for saving and loading Python objects efficiently


1) import pandas as pd

df = pd.read_csv('vgsales.csv')

df

2) df.describe()

3) Input

import pandas as pd

music_data = pd.read_csv('music.csv')

X = music_data.drop(columns=['genre'])

Output

import pandas as pd

music_data = pd.read_csv('music.csv')

X = music_data.drop(columns=['genre'])

Y = music_data['genre']

4, 5, 6) import pandas as pd

from sklearn.tree import DecisionTreeClassifier

music_data = pd.read_csv('music.csv')

X = music_data.drop(columns=['genre'])

Y = music_data['genre']

model = DecisionTreeClassifier()

model.fit(X, Y)

predictions = model.predict([ [21, 1], [22, 0] ])

predictions
7) import pandas as pd

from sklearn.tree import DecisionTreeClassifier

import joblib

music_data = pd.read_csv('music.csv')

X = music_data.drop(columns=['genre'])

Y = music_data['genre']

model = DecisionTreeClassifier()

model.fit(X, Y)

joblib.dump(model, 'music-recommender.joblib')
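A saved model can be loaded back with `joblib.load` and used without training again. The sketch below is self-contained, so it trains on a few made-up rows instead of music.csv and writes to a temporary folder. The rows and labels are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Made-up training rows standing in for music.csv: two numeric features per row.
X = [[20, 1], [23, 1], [25, 0], [30, 0]]
y = ["HipHop", "HipHop", "Dance", "Dance"]

model = DecisionTreeClassifier()
model.fit(X, y)

# Save the trained model to a temporary folder, then load it back.
path = os.path.join(tempfile.mkdtemp(), "music-recommender.joblib")
joblib.dump(model, path)
loaded = joblib.load(path)

# The loaded model predicts without being fitted again.
predictions = loaded.predict([[21, 1], [22, 0]])
print(predictions)
```

The loaded object gives the same predictions as the model that was saved.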

8) import pandas as pd

from sklearn.tree import DecisionTreeClassifier

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score

music_data = pd.read_csv('music.csv')

X = music_data.drop(columns=['genre'])

Y = music_data['genre']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

model = DecisionTreeClassifier()

model.fit(X_train, Y_train)

predictions = model.predict(X_test)

score = accuracy_score(Y_test, predictions)

score
