NLP Assignment 1

Ques 1 – Explain the significance of NLP in today’s technology-driven world.
Ans - In today's technologically advanced world, natural language processing (NLP) is extremely
important for a number of reasons.
1. Human-Computer Interaction : NLP enables more intuitive and seamless interactions between humans
and computers. Technologies such as speech recognition, voice assistants, and chatbots let computers
understand and respond to human language, which has made technology more accessible and
user-friendly.
2. Data Analysis and Insights : Given the vast volume of unstructured text available on the internet,
NLP helps organisations extract valuable information from sources such as social media, customer
reviews, and news articles. By analysing this data, organisations can identify trends, understand
customer feedback, and make informed decisions.
3. Personalization : By analysing user preferences and behaviour, NLP drives personalised content
delivery and recommendation systems. This improves user experiences across platforms such as social
media, streaming services, and e-commerce websites, increasing user satisfaction and engagement.
4. Automation and Efficiency : NLP reduces the need for manual intervention and increases operational
efficiency across sectors by automating tasks such as sentiment analysis, translation, and document
summarisation. Automated customer support systems backed by NLP can handle routine requests, freeing
human agents to address more complex issues.
5. Sentiment Analysis and Brand Monitoring : Using NLP techniques such as sentiment analysis,
organisations can track public opinion about their brand, products, or services in real time. They
can use this information to gauge customer satisfaction, spot potential problems, and adjust their
strategy.
Ques 2 – Describe the process of building an NLP model, highlighting each of the eight steps.
Ans - Developing an NLP model usually involves several key steps, each of which affects the model's
quality and performance. A summary of the procedure follows :-
1. Data Collection :
• Collect diverse datasets that are representative of the problem you want to solve.
• For supervised learning tasks, make sure the dataset is labelled correctly.
• Data may come from a number of sources, including APIs, existing databases, web scraping, or
human annotation.
2. Data Preprocessing :
• Clean the text by removing noise, irrelevant information, and inconsistencies.
• Normalise capitalisation, punctuation, and special characters, and tokenize the text into words or
subword units.
• Apply stop word removal, lemmatization, or stemming to normalise the text and reduce the size of the
vocabulary (and hence the dimensionality of the features).
3. Model Selection :
• Select a suitable model architecture based on the type of NLP task, the size of the dataset, and the
available computing power.
• Typical models include logistic regression, Naive Bayes, Support Vector Machines (SVM),
convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer-based models.
• When labelled data is scarce, consider pre-trained models and transfer learning.
4. Model Training :
• Split the dataset into training, validation, and test sets so that the model can be assessed on data
it has not seen during training.
• Train the selected model on the training set, using an optimiser such as stochastic gradient descent
(SGD), Adam, or RMSprop to minimise the chosen objective function (for example, cross-entropy loss).
• To avoid overfitting, monitor the model's performance on the validation set and adjust the
hyperparameters as needed.
5. Model Evaluation :
• Assess the trained model's performance on the held-out test set using metrics suited to the task,
such as accuracy, precision, recall, F1-score, or perplexity (for language models).
• Examine the model's potential biases, its ability to generalise, and its robustness.
• If required, repeat the training and evaluation cycle and fine-tune the model.
6. Deployment and Monitoring :
• Deploy the trained model to production and integrate it into the systems or applications that will
use it.
• Set up monitoring to track the model's performance and detect data drift or degradation over time.
• Retrain and update the model regularly as new data becomes available or the underlying problem
changes.
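As a rough illustration of the preprocessing step described above, the sketch below lowercases text, strips punctuation and special characters, tokenizes on whitespace, and removes stop words using only the Python standard library. The `preprocess` function and the tiny `STOP_WORDS` set are hypothetical, chosen for illustration. Stemming and lemmatization are left out; in practice they usually come from an NLP library (for example NLTK's `PorterStemmer` or `WordNetLemmatizer`).

```python
import re

# Hypothetical, deliberately tiny stop word list; libraries such as NLTK or spaCy
# provide much larger ones.
STOP_WORDS = {"a", "an", "and", "are", "is", "it", "of", "on", "or", "the", "this", "to", "was"}


def preprocess(text):
    """Lowercase, strip punctuation and special characters, tokenize, and drop stop words."""
    text = text.lower()  # normalise capitalisation
    # Replace anything that is not an ASCII letter, digit or whitespace with a space.
    # This assumes English, ASCII-only text; accented letters would be removed too.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()  # simple whitespace tokenization
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The movie was GREAT, and the acting is superb!!"))
    # ['movie', 'great', 'acting', 'superb']
```

Whitespace tokenization is the simplest possible choice; transformer-based models instead rely on their own subword tokenizers.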
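The dataset split from the training step and the metrics from the evaluation step can be sketched without any libraries, assuming binary labels. The helper names `train_val_test_split`, `accuracy` and `precision_recall_f1`, and the 80/10/10 split proportions, are hypothetical choices for illustration; in practice scikit-learn's `train_test_split` and `precision_recall_fscore_support` cover the same ground.

```python
import random


def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle a copy of the data and split it into training, validation and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # a fixed seed makes the split reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1-score with respect to the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    train, val, test = train_val_test_split(range(100))
    print(len(train), len(val), len(test))  # 80 10 10
    y_true = [1, 0, 1, 1, 0]
    y_pred = [1, 0, 0, 1, 1]
    print(accuracy(y_true, y_pred))  # 0.6
    print(precision_recall_f1(y_true, y_pred))
```

Returning 0.0 when a denominator is zero avoids a division error when, for example, the model never predicts the positive class.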
Ques 3 - Conduct a simple web scraping task to collect data from a website of your choice and
present your findings.
Ans –
1. Install Required Libraries: First, install the necessary libraries with pip, Python's package
installer. You'll need BeautifulSoup to parse HTML and a library like requests to fetch the HTML
content from the website.
pip install beautifulsoup4
pip install requests
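2. Fetch the HTML Content: Use the requests library to fetch the HTML content of the webpage
you want to scrape.
import requests
url = 'https://example.com'
response = requests.get(url, timeout=10)  # stop waiting if the server does not respond within 10 s
response.raise_for_status()  # raise an exception for HTTP errors such as 404 or 500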
3. Parse the HTML: Use BeautifulSoup to parse the HTML content fetched from the website.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
4. Extract Data: Once the HTML is parsed, use BeautifulSoup's methods to find and extract the
specific data you're interested in.
# Example: extract the text of every <p> tag
paragraphs = soup.find_all('p')
for paragraph in paragraphs:
    print(paragraph.get_text(strip=True))  # strip=True trims surrounding whitespace
5. Clean and Organize Data: Depending on your needs, clean and organize the extracted data, for
example by collapsing extra whitespace and removing empty or duplicate entries.
6. Store or Analyze Data: Store the scraped data in a file or database, or analyze it further as
your requirements dictate.
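Steps 5 and 6 can be sketched as follows, assuming the paragraph texts have already been extracted (for example with `[p.get_text() for p in paragraphs]`). The helper names `clean_paragraphs`, `save_to_csv` and `top_words`, the sample strings, and the file name `paragraphs.csv` are hypothetical. The cleaning rules shown (collapse whitespace, drop empty and duplicate entries) are one reasonable choice, not the only one, and the word count is just one simple form of analysis.

```python
import csv
import re
from collections import Counter


def clean_paragraphs(texts):
    """Collapse whitespace and drop empty or duplicate paragraphs, preserving order."""
    seen = set()
    cleaned = []
    for text in texts:
        text = re.sub(r"\s+", " ", text).strip()  # newlines, tabs, repeated spaces -> one space
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def save_to_csv(paragraphs, path):
    """Write one paragraph per row (after a header row) to a UTF-8 CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["paragraph"])
        for paragraph in paragraphs:
            writer.writerow([paragraph])


def top_words(paragraphs, n=10):
    """A simple analysis: the n most frequent lowercase words across all paragraphs."""
    counts = Counter()
    for paragraph in paragraphs:
        counts.update(re.findall(r"[a-z']+", paragraph.lower()))
    return counts.most_common(n)


if __name__ == "__main__":
    raw = ["  Web scraping\n collects data. ", "", "Web scraping collects data.", "Data, data everywhere."]
    cleaned = clean_paragraphs(raw)
    save_to_csv(cleaned, "paragraphs.csv")
    print(top_words(cleaned, 3))
```

Using the `csv` module rather than joining strings by hand means paragraphs that contain commas or quotes are escaped correctly.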
Ques 4 – Discuss the challenges faced in NLP model building and how they can be addressed.
Ans - NLP model building involves several hurdles, from computational complexity to problems with
data quality :-
1. Domain Specificity :
• Challenge: NLP models trained on generic datasets may perform poorly on specialised, domain-specific
text (for example, legal or medical documents) because they lack the relevant vocabulary and domain
knowledge.
• Solution: Fine-tune pre-trained models on domain-specific data, or use domain-specific embeddings,
to adapt the model to the target domain. Domain adaptation strategies such as multi-task learning
and adversarial training can further improve performance on domain-specific tasks.
2. Data Bias and Fairness :
• Challenge: NLP models trained on biased data risk reproducing or even amplifying existing biases,
producing unfair or discriminatory results, especially with respect to sensitive attributes such as
gender, ethnicity, or socioeconomic status.
• Solution: Curate training data carefully so that different demographic groups are adequately and
evenly represented. Bias can also be reduced during training with fairness-aware learning
objectives, or after training with debiasing and post-processing methods.
3. Interpretability and Explainability :
• Challenge: Deep learning-based NLP models are often viewed as "black boxes", which makes it hard to
understand the reasoning behind their predictions, even though such understanding is essential for
accountability and trust.
• Solution: Use interpretability techniques such as gradient-based saliency methods, attention
visualisation, or model-agnostic methods like LIME (Local Interpretable Model-agnostic Explanations)
to shed light on model behaviour and support human understanding.
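A simple first step on the data-curation side of the bias problem is to measure how each demographic group is represented in the training data and how often each group receives the positive label. The sketch below assumes every example carries a group annotation and a binary label; the function names, the `group_a`/`group_b`/`group_c` labels and the 0.15 threshold are hypothetical. It only detects imbalance; correcting it (re-sampling, collecting more data, fairness-aware training) is a separate step.

```python
from collections import Counter, defaultdict


def group_shares(groups):
    """Fraction of the dataset that belongs to each group."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(groups, min_share):
    """Groups whose share of the data is below min_share, sorted alphabetically."""
    return sorted(g for g, share in group_shares(groups).items() if share < min_share)


def positive_rate_by_group(groups, labels, positive=1):
    """Fraction of examples labelled `positive` within each group."""
    totals = Counter(groups)
    positives = defaultdict(int)
    for group, label in zip(groups, labels):
        if label == positive:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


if __name__ == "__main__":
    groups = ["group_a"] * 20 + ["group_b"] * 70 + ["group_c"] * 10
    print(group_shares(groups))
    print(underrepresented(groups, min_share=0.15))
```

Large differences in positive rates between groups suggest the labels themselves may encode a bias that a model would learn.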
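LIME itself needs a dedicated library, but the underlying idea of a model-agnostic, perturbation-based explanation can be sketched with occlusion (leave-one-word-out) scoring: remove each word in turn and measure how much the model's predicted probability drops. This is a simpler technique than LIME, which fits a local surrogate model over many random perturbations. The `toy_positive_proba` scorer and its word weights are hypothetical stand-ins for a real classifier; any function that maps a token list to a probability could be passed in.

```python
def occlusion_importance(predict_proba, tokens):
    """Rank tokens by how much removing each one lowers the model's predicted probability."""
    base = predict_proba(tokens)
    scores = []
    for i, token in enumerate(tokens):
        perturbed = tokens[:i] + tokens[i + 1:]  # the input with token i removed
        scores.append((token, base - predict_proba(perturbed)))
    # Largest drop first; ties keep their original order because sorted() is stable.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical lexicon-based scorer standing in for a trained sentiment classifier.
POSITIVE_WEIGHTS = {"great": 0.4, "superb": 0.3}


def toy_positive_proba(tokens):
    """Probability that the text is positive: a 0.2 baseline plus the weight of each known word."""
    return min(1.0, 0.2 + sum(POSITIVE_WEIGHTS.get(t, 0.0) for t in tokens))


if __name__ == "__main__":
    for token, score in occlusion_importance(toy_positive_proba, ["movie", "great", "acting", "superb"]):
        print(f"{token}: {score:.2f}")
```

Because the explainer only calls `predict_proba`, it treats the model as a black box, which is what makes the approach model-agnostic.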
