Step 1: Install Required Libraries

The document instructs the reader to take the first 80% of a dataset to train a multinomial Naive Bayes model with TF-IDF features (50K vocabulary) and to report accuracy metrics on the remaining 20% test set. The steps are: 1) Split the dataset into 80% for training and 20% for testing. 2) Learn a Naive Bayes model on the training data using TF-IDF features. 3) Report precision, recall, and F1 scores on the test set for the classification problem.

Original Title

Q6 report

Uploaded by

Lakshmi Harshitha Yechuri

6. Use the entire dataset.

Take the first 80% of the dataset for training and the remaining 20% for testing. On
the train set, obtain TF-IDF features (with a 50K vocabulary) and learn a multinomial Naïve
Bayes model. Report the accuracy on the test set for this five-class classification problem.
Accuracy should be reported as class-wise precision, recall and F1. Submit q5.py. [10 marks]

Step 1: Install required libraries

- For the dataframe

i. Pandas

- For machine learning model

i. sklearn.feature_extraction.text -> TfidfVectorizer (creates the TF-IDF vectors)

ii. sklearn.naive_bayes -> MultinomialNB (for the Naïve Bayes model)

iii. sklearn.pipeline -> make_pipeline (to create a pipeline of the aforementioned)

iv. sklearn.model_selection -> train_test_split (to split the data)

v. sklearn -> metrics (to compute accuracy metrics such as precision and recall)

vi. sklearn.metrics -> confusion_matrix, accuracy_score, roc_auc_score, roc_curve, auc, f1_score

- For visual representations

i. seaborn

ii. matplotlib.pyplot

Step 2: Import the aforementioned libraries

- Once the libraries are installed, they must be imported before they can be used.
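The imports for the libraries listed in Step 1 might look like the following. The aliases pd, sns and plt are common conventions rather than anything stated in the report.

```python
# Data handling
import pandas as pd

# Machine learning model
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    roc_auc_score,
    roc_curve,
    auc,
    f1_score,
)

# Visual representations
import seaborn as sns
import matplotlib.pyplot as plt
```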

Step 3: Import the JSON file and split the data

- Place the input file in the source path location and read the data using the pandas read_json
function.

- Apply the train_test_split function to the dataset before building the ML model. This
step creates 4 variables:

i. x_train – the training set independent variable

ii. x_test – the testing set independent variable

iii. y_train – the training set dependent (target) variable

iv. y_test – the testing set dependent (target) variable
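A minimal sketch of this step follows. The report does not name the input file or its columns, so the in-memory JSON records and the column names "text" and "label" are hypothetical stand-ins. Note that train_test_split shuffles rows by default; passing shuffle=False keeps the original order, which is one way to take the first 80% for training as the question asks.

```python
import json
from io import StringIO

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the input file: ten records with made-up
# "text" and "label" columns (labels 1 to 5). With a real file this
# would be something like pd.read_json("<path to the json file>").
records = [
    {"text": f"sample document {i}", "label": i % 5 + 1} for i in range(10)
]
df = pd.read_json(StringIO(json.dumps(records)))

# shuffle=False preserves row order, so the first 80% of rows become
# the training set and the last 20% become the test set.
x_train, x_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, shuffle=False
)

print(len(x_train), len(x_test))
```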

Step 4: Create the model pipeline; train and test the model.

- Use the make_pipeline function to create a pipeline of the TfidfVectorizer and the
MultinomialNB classifier.

- Pass the max_features argument to TfidfVectorizer to limit the vocabulary to 50K.

- Call model.fit on the pipeline with the training dataset. This trains the model.

- Apply the model to the test dataset. The predictions are stored in the variable ‘label’.
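The pipeline described in this step can be sketched as below. The tiny training and test lists are invented purely so the snippet runs on its own; in the report these would be the x_train, y_train and x_test variables produced in Step 3.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data (hypothetical): one short document per class, labels 1 to 5.
x_train = [
    "terrible awful broken",
    "poor weak disappointing",
    "average okay fine",
    "good solid useful",
    "excellent perfect wonderful",
]
y_train = [1, 2, 3, 4, 5]
x_test = ["awful and broken", "perfect and wonderful"]

# max_features caps the TF-IDF vocabulary at the 50K most frequent terms.
model = make_pipeline(TfidfVectorizer(max_features=50000), MultinomialNB())

# Fit the vectorizer and the classifier on the training data only.
model.fit(x_train, y_train)

# Predict a class for each test document.
label = model.predict(x_test)
print(label)
```

Because the vectorizer sits inside the pipeline, its vocabulary is learned from the training set alone and then reused unchanged when predicting on the test set.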

Step 5: Create the confusion matrix

- The confusion matrix shows the predictions for the test set (label) against the true
values (y_test). It helps us visualise how accurately the model is predicting.

- A heatmap of label vs y_test gives the confusion matrix.
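One way to build and plot the confusion matrix is sketched below. The y_test and label values are made up for illustration, and the heatmap styling (annot, fmt, cbar) is an assumption rather than the report's exact choice.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions for a five-class problem.
classes = [1, 2, 3, 4, 5]
y_test = [1, 2, 3, 4, 5, 1, 2, 3]
label = [1, 2, 3, 4, 4, 1, 3, 3]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, label, labels=classes)

# Draw the matrix as an annotated heatmap.
ax = sns.heatmap(
    cm, annot=True, fmt="d", cbar=False,
    xticklabels=classes, yticklabels=classes,
)
ax.set_xlabel("predicted label")
ax.set_ylabel("true label")
# Call plt.show() or plt.savefig(...) to display or save the figure.
```

Entries on the diagonal count correct predictions; off-diagonal entries show which classes are confused with each other.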

Step 6: Computing the metrics

- The sklearn metrics package can be used to calculate the precision, recall and F1.

- The classification_report function in the metrics package gives the required numbers.
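A short sketch of the class-wise metrics follows, again on made-up y_test and label values. The zero_division=0 argument is an added assumption: it sets precision to 0 for a class that is never predicted instead of emitting a warning.

```python
from sklearn.metrics import (
    classification_report,
    precision_recall_fscore_support,
)

# Hypothetical true labels and predictions for a five-class problem.
classes = [1, 2, 3, 4, 5]
y_test = [1, 2, 3, 4, 5, 1, 2, 3]
label = [1, 2, 3, 4, 4, 1, 3, 3]

# Text table with precision, recall, F1 and support for every class.
print(classification_report(y_test, label, labels=classes, zero_division=0))

# The same numbers as arrays, one entry per class in `classes` order.
precision, recall, f1, support = precision_recall_fscore_support(
    y_test, label, labels=classes, zero_division=0
)
```

For each class, precision is the share of predictions of that class that were correct, recall is the share of true members of that class that were found, and F1 is the harmonic mean of the two.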

Output:

Confusion matrix

Metrics
