
WRITE-UP FOR ASSIGNMENT

The first step was to ensure that the text was in English. I used the langdetect
library to detect the language of each text instance; if the detected language was
not English, the instance was discarded.
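
A minimal sketch of how that filter might look with langdetect is shown below; the
keep_english helper name and the decision to silently drop undetectable strings are
assumptions rather than the exact code used.

    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException

    def keep_english(texts):
        """Keep only the instances that langdetect identifies as English."""
        kept = []
        for text in texts:
            try:
                if detect(text) == "en":
                    kept.append(text)
            except LangDetectException:
                # Very short or empty strings cannot be detected reliably; discard them.
                pass
        return kept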

I implemented patterns to identify and remove code snippets and common boilerplate
text often found in web content.
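
The write-up does not list the exact patterns, so the sketch below is only
illustrative: the regular expressions, the boilerplate phrases, and the
strip_code_and_boilerplate name are all assumptions.

    import re

    # Illustrative patterns only; the patterns used in the assignment may differ.
    CODE_BLOCK_RE = re.compile(r"```.*?```", re.DOTALL)   # fenced code blocks
    INLINE_CODE_RE = re.compile(r"`[^`]+`")               # inline code
    URL_RE = re.compile(r"https?://\S+")                  # bare URLs
    BOILERPLATE_RE = re.compile(
        r"(all rights reserved|cookie policy|subscribe to our newsletter)",
        re.IGNORECASE,
    )

    def strip_code_and_boilerplate(text):
        """Remove code snippets, URLs, and common boilerplate phrases from web text."""
        for pattern in (CODE_BLOCK_RE, INLINE_CODE_RE, URL_RE, BOILERPLATE_RE):
            text = pattern.sub(" ", text)
        return text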

I performed various cleaning operations such as removing extra whitespace,
normalizing punctuation marks, handling contractions, and standardizing date and
time formats.
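
A rough sketch of these cleaning operations is given below; the contraction map and
the specific punctuation rules are illustrative placeholders rather than the exact
ones used, and date and time standardization is sketched separately after the
experiment list.

    import re

    # Illustrative contraction map; the full mapping used in the assignment is not shown.
    CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is", "I'm": "I am"}

    def clean_text(text):
        """Collapse extra whitespace, normalize punctuation, and expand contractions."""
        for contraction, expansion in CONTRACTIONS.items():
            text = text.replace(contraction, expansion)
        text = re.sub(r"[“”]", '"', text)         # normalize curly double quotes
        text = re.sub(r"[‘’]", "'", text)         # normalize curly single quotes
        text = re.sub(r"\.{3,}", "...", text)     # collapse long runs of dots
        text = re.sub(r"\s+", " ", text).strip()  # remove extra whitespace
        return text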

The cleaned text was tokenized using the NLTK word_tokenize function. This step
split the text into individual words or tokens.
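
Tokenization itself only needs the punkt models that word_tokenize relies on, for
example:

    import nltk
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)  # models required by word_tokenize

    tokens = word_tokenize("The cleaned text is split into individual tokens.")
    print(tokens)
    # ['The', 'cleaned', 'text', 'is', 'split', 'into', 'individual', 'tokens', '.']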

Stopwords, which are common words that do not carry significant meaning, were
removed using NLTK's stopwords corpus.
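
For example, filtering against NLTK's English stopword list (lower-casing before the
lookup is an assumption):

    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)

    stop_words = set(stopwords.words("english"))
    tokens = ["The", "cleaned", "text", "is", "split", "into", "individual", "tokens"]
    filtered = [t for t in tokens if t.lower() not in stop_words]
    print(filtered)  # ['cleaned', 'text', 'split', 'individual', 'tokens']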

I applied both stemming and lemmatization to further normalize the tokens. Stemming
reduces words to their root form, while lemmatization converts words to their base
or dictionary form.
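
The two behave differently, which the small example below illustrates; it uses
NLTK's PorterStemmer and WordNetLemmatizer, which is an assumption since the
write-up does not name the exact classes.

    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download("wordnet", quiet=True)  # lexical database used by the lemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    tokens = ["studies", "running", "better"]
    print([stemmer.stem(t) for t in tokens])          # ['studi', 'run', 'better']
    print([lemmatizer.lemmatize(t) for t in tokens])  # ['study', 'running', 'better']
    # The lemmatizer defaults to treating words as nouns; passing a POS tag changes the result.
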
After preprocessing, I reconstructed the sentences from the normalized tokens for
further analysis.
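
Reconstruction here is assumed to be a simple whitespace join of the remaining
tokens:

    normalized_tokens = ["clean", "text", "split", "individual", "token"]
    reconstructed = " ".join(normalized_tokens)
    print(reconstructed)  # 'clean text split individual token'

NLTK's TreebankWordDetokenizer is an alternative that restores punctuation spacing,
but a plain join is enough once punctuation tokens have been dropped.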

During the incremental development of the code, I conducted several experiments to
refine the preprocessing steps and improve the quality of the tokenized data. Some
of the experiments included:

I experimented with different parameters for language detection, cleaning
operations, and tokenization to achieve better results.

I iteratively updated the code to better identify and remove code snippets,
boilerplate text, and URLs.

I tested various date and time formats and handled exceptions to ensure accurate
parsing and normalization; a brief sketch of this step follows the experiment list.

I explored the option of customizing the list of stopwords based on the specific
domain or context of the text data; a second sketch after the list illustrates this.

I evaluated the performance of the tokenizer and normalizer by analyzing the
quality of the output tokens and their impact on downstream tasks such as text
classification or clustering.
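
The date and time experiment mentioned above might look roughly like the following;
the candidate formats and the decision to fall back to the original string are
assumptions.

    from datetime import datetime

    # Candidate formats are illustrative; the assignment tested its own set.
    DATE_FORMATS = ("%d %B %Y", "%B %d, %Y", "%d/%m/%Y", "%Y-%m-%d")

    def normalize_date_string(value):
        """Try each known format and return an ISO date, or the original string if none match."""
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue
        return value  # leave unparseable values unchanged rather than raising

    print(normalize_date_string("March 5, 2021"))  # '2021-03-05'
    print(normalize_date_string("not a date"))     # 'not a date'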
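
And the custom stopword experiment could be as simple as extending NLTK's list with
domain terms; the added words below are placeholders.

    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)

    # Domain-specific additions are purely illustrative; the real list depends on the corpus.
    DOMAIN_STOPWORDS = {"click", "share", "comment", "page"}
    custom_stop_words = set(stopwords.words("english")) | DOMAIN_STOPWORDS

    tokens = ["click", "here", "to", "read", "the", "full", "article"]
    print([t for t in tokens if t not in custom_stop_words])  # ['read', 'full', 'article']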
