
Q1. Explain the working principle of a Naïve Bayes Text Classifier.

Ans: The Naive Bayes (NB) model is easy to build and is particularly useful for very large data sets. In text classification, our goal is to find the best class for a document. The best class in NB classification is the most likely, or maximum a posteriori (MAP), class c_map:
c_map = argmax_{c ∈ C} P(c|d) = argmax_{c ∈ C} [P(d|c) P(c) / P(d)] = argmax_{c ∈ C} P(d|c) P(c)

Here Bayes’s rule is applied, and we drop the denominator P(d) in the last step because P(d) is the same for all classes and does not affect the argmax.

The conditional distribution P(d|c) is

P(d|c) = P((t_1, …, t_k, …, t_nd) | c)

where (t_1, …, t_k, …, t_nd) is the sequence of terms as it occurs in d.
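
As a sketch of the MAP rule in code: the priors and per-term likelihoods below are the smoothed values from the China example in Q2 (treat them as assumed inputs), and log probabilities are used because multiplying many small probabilities underflows in floating point.

# A minimal sketch of the MAP decision rule, using the smoothed priors
# and per-term likelihoods from the China example in Q2 (assumed values).
import math

priors = {"China": 3 / 4, "not-China": 1 / 4}
likelihoods = {
    "China":     {"chinese": 3 / 7, "tokyo": 1 / 14, "japan": 1 / 14},
    "not-China": {"chinese": 2 / 9, "tokyo": 2 / 9,  "japan": 2 / 9},
}
doc = ["chinese", "chinese", "chinese", "tokyo", "japan"]

# c_map = argmax_c [ log P(c) + sum_k log P(t_k | c) ]; summing logs
# instead of multiplying probabilities avoids numerical underflow.
scores = {
    c: math.log(priors[c]) + sum(math.log(likelihoods[c][t]) for t in doc)
    for c in priors
}
print(max(scores, key=scores.get))  # -> China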

Q2. Consider the text classification example of China class. Why the
probabilities found in the pen-and-paper computation differ from the
probability generated by the scikit-learn MultinomialNB classifier.
Ans:
P(c|d5) ∝ 3/4 · (3/7)^3 · 1/14 · 1/14 ≈ 0.0003

P(c̄|d5) ∝ 1/4 · (2/9)^3 · 2/9 · 2/9 ≈ 0.0001

These hand-computed values do not match the probabilities printed by the MultinomialNB classifier, because the pen-and-paper values are only proportional to the posterior: we dropped the constant denominator P(d), so the two scores do not sum to 1.
By the law of total probability, the probabilities of an event happening and not happening must sum to 1, and scikit-learn's predict_proba enforces this by normalizing the class scores.
Normalizing the unrounded hand-computed scores the same way, 0.000301 / (0.000301 + 0.000136) ≈ 0.69 for China and ≈ 0.31 for not-China, reproduces the classifier's output.
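
The following is a minimal sketch that reproduces this in scikit-learn. The four training documents and their labels are inferred from the count matrix shown in Q4 and the priors used above (3 China documents, 1 not-China), so treat the exact strings as an assumption.

# A minimal sketch of the China example, assuming these four training
# documents (strings inferred from the count matrix shown in Q4).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = [
    "Chinese Macao",             # document 0, class China
    "Chinese Tokyo Japan",       # document 1, class not-China
    "Chinese Chinese Shanghai",  # document 2, class China
    "Chinese Beijing Chinese",   # document 3, class China
]
y_train = ["China", "not-China", "China", "China"]

count_vect = CountVectorizer()
X_train = count_vect.fit_transform(train_docs)

clf = MultinomialNB()  # Laplace smoothing (alpha=1.0) by default
clf.fit(X_train, y_train)

X_test = count_vect.transform(["Chinese Chinese Chinese Tokyo Japan"])
print(clf.predict_proba(X_test))
# roughly [[0.69, 0.31]] for classes ['China', 'not-China']: the
# unnormalized scores 0.000301 and 0.000136, rescaled to sum to 1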
Q3. What does the Python shape function provide in general? What is it giving in our text classification example? Explain with the statements containing X_train_counts.shape, X_train_tf.shape, and X_train_tfidf.shape. Why are all of them showing the same values?
Ans: The function "shape" returns the shape of an array. The shape is a tuple of
integers. These numbers denote the lengths of the corresponding array dimension.
Shape function giving – number of dimensions and number elements in each
dimension.
Here, X_train_counts.shape =(4, 6)
X_train_tf.shape = (4, 6)
X_train_tf.shape = (4, 6)
number of documents = 4 and vocabulary size = 6 of each document.
That’s why all of them giving the same values
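
A small sketch of these three statements, following the pipeline from the scikit-learn text tutorial (X_train_tf built with use_idf=False, X_train_tfidf with the defaults); the document strings are the assumed ones from the China example:

# Sketch of the three .shape checks, assuming the same four training
# documents as in the China example above.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

train_docs = [
    "Chinese Macao",
    "Chinese Tokyo Japan",
    "Chinese Chinese Shanghai",
    "Chinese Beijing Chinese",
]

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(train_docs)

tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

print(X_train_counts.shape)  # (4, 6): 4 documents, 6 vocabulary terms
print(X_train_tf.shape)      # (4, 6): same entries, re-weighted
print(X_train_tfidf.shape)   # (4, 6): same entries, re-weighted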

Q4. Read the scikit-learn manual for understanding. Explain how the
CountVectorizer() method works. Use the results of
count_vect.get_feature_names(), X_train_counts.toarray(), and X_train_counts in
your explanation.

Ans: Scikit-learn’s CountVectorizer is used to convert a collection of text documents to a matrix of term/token counts, one count vector per document. It also enables pre-processing of the text data (such as lowercasing and tokenization) prior to generating the vector representation. This functionality makes it a highly flexible feature representation module for text.
count_vect.get_feature_names() returns all the tokens in the vocabulary, in sorted order. They are:
'beijing'
'chinese'
'japan'
'macao'
'shanghai'
'tokyo'
X_train_counts.toarray() gives the dense vector representation, one row per document and one column per vocabulary term:
[[0 1 0 1 0 0]
 [0 1 1 0 0 1]
 [0 2 0 0 1 0]
 [1 2 0 0 0 0]]
Printing X_train_counts itself shows the underlying sparse matrix as (document, term) index pairs with their counts:
(0, 1) 1 - in document 0, term 1 (chinese) occurs 1 time.
(0, 3) 1 - in document 0, term 3 (macao) occurs 1 time.
(1, 1) 1 - in document 1, term 1 (chinese) occurs 1 time.
(1, 5) 1 - in document 1, term 5 (tokyo) occurs 1 time.
(1, 2) 1 - in document 1, term 2 (japan) occurs 1 time.
(2, 1) 2 - in document 2, term 1 (chinese) occurs 2 times.
(2, 4) 1 - in document 2, term 4 (shanghai) occurs 1 time.
(3, 1) 2 - in document 3, term 1 (chinese) occurs 2 times.
(3, 0) 1 - in document 3, term 0 (beijing) occurs 1 time.
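
A minimal sketch that reproduces these three outputs; note that get_feature_names() was renamed get_feature_names_out() in newer scikit-learn releases, and the document strings are the assumed ones from the China example:

# Sketch reproducing the three CountVectorizer outputs discussed above,
# assuming the four training documents of the China example.
from sklearn.feature_extraction.text import CountVectorizer

train_docs = [
    "Chinese Macao",
    "Chinese Tokyo Japan",
    "Chinese Chinese Shanghai",
    "Chinese Beijing Chinese",
]

count_vect = CountVectorizer()  # lowercases and tokenizes by default
X_train_counts = count_vect.fit_transform(train_docs)

print(count_vect.get_feature_names_out())
# ['beijing' 'chinese' 'japan' 'macao' 'shanghai' 'tokyo']

print(X_train_counts.toarray())  # dense document-term count matrix

print(X_train_counts)            # sparse (doc, term) -> count triples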

Q5. Explain how the TfidfTransformer() works. Show the computation of how the TF and TF-IDF results come up in the output of X_train_tf.toarray() and X_train_tfidf.toarray().

Ans:
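With scikit-learn's default settings (norm='l2', use_idf=True, smooth_idf=True), TfidfTransformer re-weights the count matrix in two steps: each column is multiplied by a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t; then each row is normalized to unit L2 length. X_train_tf (built with use_idf=False) is therefore just the count matrix with each row rescaled to unit length. Below is a minimal sketch of this computation, using the assumed four documents from the China example and checking the hand computation against scikit-learn's own output:

# A sketch of TfidfTransformer's default computation (norm='l2',
# use_idf=True, smooth_idf=True), checked against scikit-learn itself;
# the four documents are the assumed ones from the China example.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

train_docs = [
    "Chinese Macao",
    "Chinese Tokyo Japan",
    "Chinese Chinese Shanghai",
    "Chinese Beijing Chinese",
]
X_counts = CountVectorizer().fit_transform(train_docs)
counts = X_counts.toarray().astype(float)
n_docs = counts.shape[0]

# TF (use_idf=False): each row of raw counts is scaled to unit L2 norm.
tf = counts / np.linalg.norm(counts, axis=1, keepdims=True)
sk_tf = TfidfTransformer(use_idf=False).fit_transform(X_counts).toarray()
print(np.allclose(tf, sk_tf))  # True: matches X_train_tf.toarray()

# TF-IDF: smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1, where df(t)
# counts the documents containing term t; multiply, then L2-normalize.
df = (counts > 0).sum(axis=0)
idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0
tfidf = counts * idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)
sk_tfidf = TfidfTransformer().fit_transform(X_counts).toarray()
print(np.allclose(tfidf, sk_tfidf))  # True: matches X_train_tfidf.toarray()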
