Text Summarization Using NLP: Bachelor of Technology Computer Science and Engineering

Text Summarization using NLP
A Full Semester Internship Report Submitted in Partial Fulfilment for the Award of the Degree
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted
by
Kesari Venkatesh Reddy (18341A0558)
Under the Esteemed Guidance of

Dr. K. Lakshmana Rao
Associate Professor
June 2022
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Internship carried out at
COGNIZANT TECHNOLOGY SOLUTIONS, BANGALORE

GMR INSTITUTE OF TECHNOLOGY
(An Autonomous institute, affiliated to J.N.T.University Kakinada)
NAAC “A” Graded, NBA Accredited, ISO 9001:2008 Certified Institution
G.M.R. Nagar, Rajam-532127, A.P.
BONAFIDE CERTIFICATE
Signature of Faculty Supervisor
Dr. K. Lakshmana Rao
Associate Professor
Department of CSE
GMRIT, Rajam.

Signature of Industry Supervisor
Ms. D. Anushree
Human Resources - GenC
Cognizant Technology Solutions
This is to certify that the report entitled “Text Summarization using NLP”, submitted by Kesari
Venkatesh Reddy (18341A0558), who completed the internship program under our guidance
and supervision at COGNIZANT TECHNOLOGY SOLUTIONS, has been carried out in
partial fulfilment of the requirements for the award of the B.Tech degree in the discipline of Computer Science and
Engineering of JNTUK, and is a record of bonafide work. The results embodied in this report have
not been submitted to any other university or institution for the award of any degree or diploma.
Signature of the H.O.D
Dr. A. V. Ramana
Professor and HOD
Department of CSE
GMRIT, Rajam
Internship Certificate
ACKNOWLEDGEMENT
I would like to take this opportunity to thank Ms. D. Anushree, Human Resources - GenC,
CTS Bangalore, for providing all the necessary facilities that led to the successful completion
of my internship.
I would like to sincerely thank my internal supervisor, Dr. K. Lakshmana Rao, Associate
Professor, Department of Computer Science and Engineering, for wholehearted and valuable
guidance throughout the program.
I would like to thank our beloved Principal, Dr. C.L.V.R.S.V. Prasad, and the Head of the Department,
Dr. A. Venkata Ramana, Professor, Computer Science and Engineering, for their great
support in completing the full semester internship.
It gives me immense pleasure to express my deep sense of gratitude to the central internship
team, including Dr. Surya Narayan Dash, Internship Head. I would also like to sincerely thank our department
coordinators, Dr. K. Sri Vidya, Associate Professor, and Dr. V. Prasad, Associate Professor,
Department of Computer Science and Engineering, for their great support.

ABSTRACT
Text summarization is needed to obtain the most precise and useful information from a large
document while eliminating irrelevant or less important content. Text summarization can be carried
out in two main ways: abstractive text summarization and extractive text summarization.
Automatic text summarization can be applied to a single document or to multiple documents. Web
pages can also be summarized by scraping their content and then summarizing it. This reduces
redundancy across files and saves time when understanding large amounts of information. Because
text summarization has such a wide range of uses, the task is challenging: a poorly produced summary
is of little use. NLP helps here by modelling the language and extracting the sentences or information
that are critical to understanding the topic. In this work, the Text Rank algorithm and cosine
similarity are used to summarize the text. The input is given either as text or as the URL of a web
page to be summarized.
Keywords:
Text Rank Algorithm, Natural Language Processing, Text Summarization, Extractive Text Summarization
TABLE OF CONTENTS
CHAPTER TITLE
Certificate
Acknowledgement
Abstract
Index
List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Benefits of Internship
1.2.1 Benefits to the Students
1.2.2 Benefits to the Industry
1.2.3 Benefits to the Institution
1.3 Ethics
1.4 Values
2 Profile of the Company
2.1 About the Company
2.2 Services
3 Tasks Taken Up and Problem Definition
3.1 Introduction
3.2 Problem Statement
3.3 Existing System and its Disadvantages
3.4 Proposed System and its Advantages
3.5 Technologies Used
3.6 Literature Survey
4 Methodology and Learning
4.1 Data Pre-Processing
4.1.1 Data Cleaning
4.1.2 Tokenization
4.2 Implementing Text Rank Algorithm
4.3 Displaying Output through Django
4.4 Design
4.5 Requirements
5 Results
5.1 Results
6 Conclusion and Suggestions
6.1 Conclusions
6.2 Future Scope
Appendix
Bibliography
List of Figures
Fig. No Name of the Figure

3.1 Flow chart of Base paper
3.2 Flow chart of Reference Paper
4.1 Django Framework Flow chart
4.2 Flow chart of Proposed System
5.1 The main page of our design
5.2 Input is given as URL of Sachin Tendulkar Wikipedia page as example
5.3 Output is shown like this with time taken
5.4 Input is given as text as example
5.5 Output
List of Tables
Tab. No Name of the Table

4.1 Word Tokenization of Sentence
4.2 Sentence Tokenization
5.1 Comparison of Results
Text Summarization using NLP 2022
1. INTRODUCTION
1.1 Introduction
An internship is a trained and supervised experience in a professional setting in which a
student learns and gains essential experience and expertise. Internships introduce
candidates, either full-time or part-time, to real-world experience related to their career
goals and interests. An internship is an excellent way to build the all-important
connections that are invaluable in developing and maintaining a strong professional
network for the future. Internships provide real-world experience to those looking to
explore a career field or to gain the knowledge and skills required to enter it. An internship
is relatively short-term, with the primary focus on getting on-the-job training and
applying what is learned in the classroom to the real world.
1.2 Benefits of Internship
Students learn how their course of study applies to the real world and build valuable
experience that makes them stronger candidates for jobs after graduation.
● An internship at a start-up helps build team spirit and teaches adaptation to flexible
working hours and client service.
● You can gain serious work experience, build a portfolio and establish a network of
professional contacts that can help you after you graduate.
● The main advantage is practical knowledge. College mainly provides theoretical
knowledge, which on its own does not help much; working on a project gives
practical experience.
● Confidence increases when we are involved in solving problems and succeed in
solving them.
● Working on a project also improves communication and interpersonal skills, since
we need to discuss the project with senior colleagues and supervisors.
● Having several internships while in college can be very impressive to potential
employers.
● Working in a team on a project teaches us how to interact with our colleagues and how
to handle disagreements without hurting anyone's feelings.
1.2.1 Benefits to the Students

● Learning by doing.
● All round development.
● Aid in career planning.
● Experience of professional working conditions.
● Smooth transition from campus to company.
1.2.2 Benefits to the Industry

● Steady stream of skilled manpower provides value addition and increased
productivity.
● Human Resource Development benefits.
● Conduit for Industrial Partnership.
● Employer Branding.
1.2.3 Benefits to the Institution

● Inputs to quickly adapt curriculum to match the needs of industry.
● Opportunities for research and consultancy.
● Access to industrial expertise and infrastructure.
1.3 Ethics
● Help develop an organizational environment favourable to acting ethically
● Improve their understanding of the software and related documents on which they
work and of the environment in which they will be used.
● Accept full responsibility for their own work.
● Improve their ability to produce accurate, informative, and well-written
documentation.
● Not promote their own interest at the expense of the profession, client or employer.
● Assist colleagues in professional development.
● Strive to fully understand the specifications for software on which they work.
● Improve their knowledge of the Code, its interpretation, and its application to their
work.
1.4 Values
● Communicate professionally.
● Be proactive, and introduce yourself to people when invited to work functions.
● Take constructive criticism well.
● Be able to work independently with little guidance; this is very important in the
working world.
● Always work hard, even if the task is small and seems unimportant.
● Make an effort during the internship to build relationships with people
around the office.
2. PROFILE OF THE COMPANY
2.1 About the Company
Cognizant Technology Solutions Corporation is a global multinational company and one of
the world's major business outsourcing companies. Its headquarters are in Teaneck, New
Jersey. It was originally founded as a technology unit of Dun & Bradstreet, with its initial
headquarters in Chennai, India. In 1996, Cognizant began expanding its work with
international clients, and the next year the company moved its headquarters from Chennai
to Teaneck. Cognizant was later listed on NASDAQ and is a constituent of the NASDAQ-100
index. Having started with application maintenance work, it moved into application
development. The 2000s were a golden era of growth for the company: it joined the Fortune
500 in 2011 and has been named among Fortune's World's Most Admired Companies.
Cognizant's parent, Cognizant Corporation, was later split into two companies, Nielsen Media
Research and IMS Health, after which Cognizant Technology Solutions became a public
subsidiary of IMS Health. In 2003, IMS Health sold all its shares in Cognizant, and the CEO
also stepped down. The company expanded its work from IT services to outsourcing and
business consulting as well, and grew rapidly. Its services include application
development, business intelligence, supply chain management, CRM, etc. The company
has 318,400 employees globally, of which over 150,000 are in India across 10 locations,
with a plurality in Chennai. On 20 January 1994, Cognizant registered its branch in Chennai,
Tamil Nadu, India, under the legal name Cognizant Technology Solutions India Private
Limited. The company has local, regional and global delivery centres in the UK, Australia,
Hungary, Netherlands, Spain, China, Philippines, Canada, Brazil, Argentina, Mexico, etc.
Business Units:
Cognizant is organized into several vertical and horizontal units. The vertical units focus
on specific industries such as Banking & Financial Services, Insurance, Healthcare,
Manufacturing and Retail. The horizontals focus on specific technologies or process areas
such as Analytics, mobile computing, BPO and Testing. Both horizontal and vertical units
have business consultants, who together form the organization-wide Cognizant Consulting
team. Cognizant is among the largest recruiters of MBAs in the industry; these MBAs are involved
in business analysis for IT services projects.
2.2 Services Offered by Cognizant:
Cognizant provides digital solutions that help businesses advance:
• Application Services & Modernization
• Artificial Intelligence
• Business Process Services
• Cloud Enablement
• Core Modernization
• Digital Experience
• Digital strategy
• Enterprise Services
• Industry & platform Solutions
• Infrastructure Services
• Quality Engineering & Assurance
• Security
• Software Product Engineering
• Sustainability Services
Quality engineering at speed and scale is a main principle of Cognizant. Cognizant
developers and architects employ agile practices to combine full-stack software
development with user-driven design. Cognizant's main focus is on software development
in platform-as-a-service (PaaS) environments to ensure quality, cloud portability and security
for every product. The end goal is to build the right software that meets customers' needs
straight away, with products that work smarter and faster.
Products:
• Cognizant Big Decisions
• Cognizant Data Insights
• Cognizant Insurance Intake Automation
• Cognizant Document Accelerator
• Risk Profile Gateway
• Digital Retirement Operations
• Shared Investigator Platform
Segments Cognizant Serves:
High-tech: Cognizant helps high-tech companies rethink their business models and plan and
implement transformational processes across the product lifecycle. Partnering with Cognizant
helps these companies get ahead of the demand curve, become nimbler and drive operational
efficiency for more profitable growth.
Platform: The race to the next billion users, monetizing content and supporting aggressive
growth in new channels has upended the industry. Cognizant partners with companies to
accelerate digital at scale, operate more efficiently and power growth.
Software: In the subscription economy, adapting to customer needs is the top priority.
Cognizant partners with software clients and keeps the emphasis on speed to market, helping
them launch new revenue models and accelerate product development and release
cycles.
3. TASKS TAKEN UP AND PROBLEM DEFINITION
3.1 Introduction
Automatic text summarization is useful in many fields such as education, research and
news article summaries, … Extractive text summarization can be used to gain insights
about a document or a long text. This project therefore performs extractive text summarization
using NLP techniques, namely the Text Rank algorithm and cosine similarity. Existing
approaches use Recurrent Neural Networks, Long Short-Term Memory, graph-based
frameworks, sentence ranking, supervised approaches … Neural network models like these
can significantly increase execution time, and neural networks and supervised models
require a corpus of the language in order to learn it. The proposed approach uses the Text
Rank algorithm, a graph-based ranking algorithm for natural language text. Text Rank works
well because it does not rely only on the local context of a text unit (vertex); rather, it
considers information recursively drawn from the entire text (graph). Text Rank has been
reported to perform competitively with many supervised learning approaches. Text
summarization can be done in two different ways, extractive and abstractive; here the
extractive method is used. Selecting the right sentences can still be challenging, so the
cosine similarity between sentences is used to guide the selection. The application is
designed for ease of use: the user provides either a URL or the text directly. Text from a
website is scraped using the BeautifulSoup module available in Python. The number of
sentences needed in the summary can be given; it is optional and defaults to 5. The Text
Rank algorithm ranks sentences and words according to their importance and usage in the
given input. The result is displayed on the left side, in a box headed “Text Summary”. The
application is built with Django, which follows the MVT (Model-View-Template) pattern:
every request is first routed through urls.py, which selects a view based on the URL, and the
view uses the models as needed. This text summarizer application is used to summarize the
given text.
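In the project, the scraping step relies on BeautifulSoup: url_summarizer in the appendix joins the text of every <p> element on the page. The following is a dependency-free sketch of the same idea that uses only Python's standard html.parser module. The ParagraphCollector class and the extract_paragraphs function are hypothetical helpers that are not part of the project. Unlike BeautifulSoup, they do not repair malformed HTML such as unclosed <p> tags.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False   # True while the parser is inside a <p> element
        self._parts = []     # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._parts).strip())
            self._parts = []
            self._in_p = False

    def handle_data(self, data):
        # Text of nested inline tags such as <b> also arrives here.
        if self._in_p:
            self._parts.append(data)


def extract_paragraphs(html):
    """Return the text of all <p> elements joined by spaces."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.paragraphs)


page = "<html><body><p>First.</p><div>Menu</div><p>Second <b>bold</b>.</p></body></html>"
print(extract_paragraphs(page))  # First. Second bold.
```

Text outside paragraphs, such as the "Menu" div, is skipped. This matches the intent of scraping only the body paragraphs of a page.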
3.2 Problem Statement
Text summarization can be carried out in two main ways: abstractive text summarization and
extractive text summarization. Automatic text summarization can be applied to a single document
or to multiple documents. Web pages can also be summarized by scraping their content and then
summarizing it. This reduces redundancy across files and saves time when understanding large
amounts of information. Because text summarization has such a wide range of uses, the task is
challenging: a poorly produced summary is of little use. NLP helps by modelling the language and
extracting the sentences or information that are critical to understanding the topic. Here, the Text
Rank algorithm and cosine similarity are used to summarize the text. The input is given either as
text or as the URL of a web page to be summarized.
3.3 Existing System and its Disadvantages:
Existing approaches use Recurrent Neural Networks, Long Short-Term Memory, graph-based
frameworks, sentence ranking, supervised approaches … Neural network models like these can
significantly increase execution time. In addition, neural networks and supervised models require
a corpus of the language in order to understand it.
3.4 Proposed System and its Advantages:
The proposed approach uses the Text Rank algorithm, a graph-based ranking algorithm for
natural language text. Text Rank works well because it does not rely only on the local context of
a text unit (vertex); rather, it considers information recursively drawn from the entire text
(graph). Text Rank has been reported to perform competitively with many supervised learning
approaches, without needing training data. The user interface is designed using Django and
takes as input either a URL or the text to be summarized.
3.5 Technologies Used

Google Colab, TensorFlow, NLTK, Django
3.6 Literature Survey
The authors in [1] proposed abstractive text summarization using an LSTM-CNN model. The
datasets are taken from daily news coverage on the CNN and DailyMail websites. The CNN dataset
has more than 92,000 texts and corresponding summaries, and the DailyMail dataset has 219,000
texts and corresponding summaries. Preprocessing is done in three steps: word segmentation,
morphological reduction and coreference resolution. The figure below shows the design.
3.1 Flowchart of Base Paper
Tian Shi et al. [2] proposed abstractive text summarization using an RNN-based Seq2Seq
model. The datasets are the CNN/DailyMail dataset (300k news articles), the Newsroom dataset
(1.3 million news articles) and the Bytecup dataset (1.3 million news articles). The paper also
discusses evaluation parameters such as ROUGE, BERTScore, fluency, factual correctness and
relevance.
3.2 Flow chart of the reference paper (Seq2Seq model)
J. N. Madhuri and R. Ganesh Kumar [3] proposed extractive text summarization using sentence
ranking. The work addresses single-document text summarization, with the input given as a Word
document containing text. The main task is identifying the important paragraphs and assigning
weights to sentences. The generated summary is then compared with a human-generated summary
using the ROUGE evaluation metric.
The authors in [4] proposed extractive text summarization based on sentence content relevance,
sentence novelty and sentence position relevance, where content relevance is computed using
deep auto-encoders. By combining these three metrics, the authors performed extractive text
summarization. The datasets used are the CNN and DailyMail dataset, the DUC dataset, the Tor
Illegal Documents summarization dataset and the Blog summarization dataset. This approach
performed better on some of the ROUGE evaluation metrics than traditional ML models.
The authors in [5] proposed a text summarization method based on clustering and optimization
techniques, called COSUM. The first step clusters the sentences using k-means; the second step
selects important sentences from the clusters based on different features. The datasets used
are DUC2001 (309 articles) and DUC2002 (567 documents). The pre-processing steps are splitting
into sentences, removing stop words and noisy words, case folding (converting upper case to
lower case) and stemming. The evaluation metric used is ROUGE, and this model performs better
on ROUGE-1 and ROUGE-2.
The authors in [6] follow graph-based text summarization techniques for single and multiple
documents. The dataset used is the publicly available DUC2002. Preprocessing is done in three
steps: word segmentation, morphological reduction and coreference resolution. On the
performance metrics ROUGE-1, ROUGE-2 and ROUGE-SU, the method performed better than previous
methods. The following figure shows the architecture of the graph-based text summarization system.
The authors in [7] proposed automatic text summarization using fuzzy rules over different
features; the important text is extracted using these fuzzy rules. The dataset used is a Brazilian
Portuguese dataset of texts produced by students in a virtual learning environment. ROUGE is
used as the metric for comparison.
The authors in [8] proposed Automatic Text Summarization (ATS) using a graph-based
framework, “EdgeSumm”. The datasets used are DUC2001 (308 English news documents and 616
model summaries) and DUC2002 (567 news report documents). The performance metric is ROUGE,
and the average ROUGE score is better than that of other standard and state-of-the-art systems.
The authors in [9] focused on text summarization to extract useful sentences that depict
customer sentiment about the services provided by hotels. The dataset is taken from the online
hotel booking platform TripAdvisor. The pre-processing includes spell checking, word
segmentation, stemming and part-of-speech tagging. After summarization, the sentiments are
analysed to find which services the hotels can improve.
The authors in [10] proposed extractive text summarization in two steps. The first step is
feature generation using LDA, one-hot encoding, TF-IDF and Doc2Vec. The second step clusters
similar sentences, measuring proximity with cosine similarity and the silhouette index.
Important sentences are then selected from the clusters to generate the summary. On the
performance metrics ROUGE-1, ROUGE-2 and ROUGE-SU, the method performed better than previous
methods that use only LDA or TF-IDF. The datasets used are DUC2002 and TAC2011.
Figure: Proposed approach of [10]
The authors in [11] proposed text summarization using a Sequence-to-Sequence model in a deep
learning approach. Baseline models are also applied, and the ROUGE score is used as the
performance metric for comparison. The dataset comprises 300,000 articles and their
headlines. The flowchart of the proposed methodology is shown below.
The authors in [12] describe various text summarization techniques based on deep learning.
The datasets used are the CNN and DailyMail dataset, the New York Times dataset, the DUC2004
dataset and the Amazon review dataset. The paper also focuses on the pre-processing steps to be
followed. The proposed architecture is shown below.
The authors in [13] proposed text summarization using latent semantic analysis (LSA), with both
single-document and multi-document approaches. The dataset is taken from legal judgements
issued by the Indian judiciary. The ROUGE-1 score is 0.58. The proposed approach is shown
below.
The authors in [14] performed text summarization using LSTM (Long Short-Term Memory). The
datasets are DUC2001 and SIGIR2018. The ROUGE-1, ROUGE-2 and ROUGE-L scores are 0.607, 0.501
and 0.569 respectively.
The authors in [15] performed extractive text summarization using fuzzy inference systems. The
dataset used is DUC2002. ROUGE-1, ROUGE-2 and ROUGE-L scores of 0.66, 0.59 and 0.66
respectively were achieved. This method achieved better ROUGE-1 performance than neural
networks. The paper also discusses detailed pre-processing steps for extractive text
summarization.
4. METHODOLOGY AND LEARNING
The main aim of this project is to summarize text: it takes text as input and displays the
summary through a user interface.
The project involves three steps:
1. Preprocessing
2. Implementing Text rank algorithm
3. Displaying the result using Django framework
4.1 Preprocessing
The text is taken from the textbox in the user interface. Alternatively, the user can provide
a URL from which the text is to be extracted; the paragraphs present in that web page are
scraped and taken as input. This input text must be preprocessed before applying the Text
Rank algorithm. During cleaning, bracketed markers such as citation numbers are removed with
a regular expression.
Tokenizing the text: The text is tokenized into words and into sentences using the NLTK library.
e.g., This is a sample. This is a sentence.
After word tokenizing the above sentences:
This | is | a | sample | . | This | is | a | sentence | .
Table 4.1 Word Tokenization of sentence
After sentence tokenizing the above sentences:
This is a sample. | This is a sentence.
Table 4.2 Sentence Tokenization of sentence
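Tables 4.1 and 4.2 show the output of NLTK's word_tokenize and sent_tokenize. The following is a dependency-free sketch of the cleaning and tokenization steps. It assumes simple regular-expression rules rather than NLTK's trained Punkt sentence model. The helpers clean, simple_sentence_tokenize and simple_word_tokenize are hypothetical. The citation pattern removes bracketed markers, which is the job of the appendix's re.sub call.

```python
import re

# Bracketed markers such as "[12]" or "[citation needed]". [^\[\]]* stops at the
# first closing bracket, so two separate markers are removed individually instead
# of deleting all the text between them.
CITATION = re.compile(r"\[[^\[\]]*\]")
# Hypothetical rule-based stand-ins for NLTK's tokenizers: split sentences after
# ".", "!" or "?" followed by whitespace; a word token is a run of word characters
# or a single punctuation mark.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"\w+|[^\w\s]")


def clean(text):
    """Remove citation markers and collapse repeated whitespace."""
    return " ".join(CITATION.sub(" ", text).split())


def simple_sentence_tokenize(text):
    return [s for s in SENTENCE_END.split(text) if s]


def simple_word_tokenize(text):
    return TOKEN.findall(text)


text = clean("This is a sample.[1] This is a sentence.[2]")
print(simple_sentence_tokenize(text))  # ['This is a sample.', 'This is a sentence.']
print(simple_word_tokenize(text))
# ['This', 'is', 'a', 'sample', '.', 'This', 'is', 'a', 'sentence', '.']
```

A pattern such as \[[^()]*\] is greedy and also matches "]". It would therefore treat "A [1] B [2] C" as one long marker and drop "B". The narrower character class above avoids that problem.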
4.2 Implementing the Text Rank algorithm:
The preprocessed text is the input to this step. Extractive summarization requires the most
important sentences in the whole input, so this step identifies the sentences to be displayed
in the summary. For this, the importance of each sentence is identified using the cosine
similarity method. Cosine similarity measures the cosine of the angle between two non-zero
vectors. Python libraries that provide cosine similarity include scikit-learn and TensorFlow.
The similarity increases as the angle between the two vectors decreases, and vice versa.
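For two vectors A and B, the cosine similarity is (A · B) / (|A| |B|). The following is a minimal sketch under two assumptions: sentences are split into lower-cased whitespace tokens, and each sentence is represented by a term-frequency vector. The appendix's cosine_similarity uses binary present/absent vectors instead, which is the same formula with every count equal to 1. The name cosine_sim is hypothetical.

```python
import math
from collections import Counter


def cosine_sim(sentence_a, sentence_b):
    """Cosine of the angle between the term-frequency vectors of two sentences."""
    a = Counter(sentence_a.lower().split())
    b = Counter(sentence_b.lower().split())
    # Dot product: only words present in both sentences contribute.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_sq_a = sum(c * c for c in a.values())
    norm_sq_b = sum(c * c for c in b.values())
    if norm_sq_a == 0 or norm_sq_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / math.sqrt(norm_sq_a * norm_sq_b)


print(cosine_sim("the cat sat", "the cat sat"))  # 1.0
print(cosine_sim("a b", "a c"))                  # 0.5
print(cosine_sim("cats purr", "dogs bark"))      # 0.0
```

Because word counts are never negative, the result lies between 0 (no shared words) and 1 (identical word distributions).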
The first step is to extract all the sentences from the text. This might be as simple
as splitting the text at "." or newlines, or it can be more complicated if we want to fine-tune
the definition of a sentence. Once all the sentences have been extracted, we create a graph in
which each sentence is a node, and edges weighted by similarity link each sentence either to all
the other sentences or only to its k most similar sentences.
This approach allows Text Rank to be programmed without doing any matrix arithmetic by hand;
all that is needed is the graph and a function that computes sentence similarity.
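The following is a minimal sketch of the graph-based ranking described above. It assumes whitespace tokenization and a binary-vector cosine similarity, in the spirit of the appendix's cosine_similarity. Scores are updated with the weighted PageRank rule from the original TextRank paper, using damping factor d = 0.85 and a fixed number of iterations rather than a convergence test. The names textrank and similarity are hypothetical. The appendix's summarizer itself scores sentences by word frequency rather than by running this iteration.

```python
import math


def similarity(s1, s2):
    """Hypothetical helper: cosine similarity of binary word-presence vectors,
    |A & B| / sqrt(|A| * |B|) for the lower-cased word sets A and B."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def textrank(sentences, k=None, d=0.85, iterations=50):
    """Score sentences with weighted PageRank over a sentence-similarity graph."""
    n = len(sentences)
    # Adjacency dict: node -> {neighbour: edge weight}.
    graph = {i: {} for i in range(n)}
    for i in range(n):
        candidates = sorted(
            ((similarity(sentences[i], sentences[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        candidates = [(w, j) for w, j in candidates if w > 0]
        # Link to all similar sentences, or only to the k most similar ones.
        # Edges are undirected, so a node may still end up with more than k neighbours.
        for w, j in (candidates[:k] if k else candidates):
            graph[i][j] = w
            graph[j][i] = w
    out_weight = {i: sum(graph[i].values()) for i in range(n)}

    scores = [1.0] * n
    for _ in range(iterations):
        # WS(i) = (1 - d) + d * sum over neighbours j of w_ji / out_weight(j) * WS(j)
        scores = [
            (1 - d) + d * sum(scores[j] * graph[j][i] / out_weight[j] for j in graph[i])
            for i in range(n)
        ]
    return scores


sentences = [
    "Text summarization shortens a long document.",
    "Extractive summarization selects important sentences from the document.",
    "The weather was pleasant yesterday.",
    "TextRank ranks sentences of a document by similarity.",
]
scores = textrank(sentences)
best = max(range(len(sentences)), key=scores.__getitem__)
print(sentences[best])
# Extractive summarization selects important sentences from the document.
```

In this example the off-topic weather sentence shares only the word "the" with one other sentence, so it receives the lowest score. Taking the top few sentences by score gives an extractive summary.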
Text Rank then determines how similar each sentence is to the rest of the text. The similarity
function should capture the meaning of the sentences, and the cosine similarity approach can work
well for this.
If we extract words instead of sentences and follow the same algorithm, using a similarity
function between words, then Text Rank can extract keywords from the text. The idea is that the
word most similar to all the other words is the most important one. Filtering stop-words is very
important here.
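The original TextRank paper (Mihalcea and Tarau, 2004) builds the word graph from co-occurrence rather than from a word-similarity function. The following sketch uses that co-occurrence variant: after stop-word filtering, it links words that appear within a small window of each other and ranks them with the unweighted PageRank update. The paper used a part-of-speech filter; this simplified sketch uses a stop-word filter instead. The tiny stop-word list, the window size and the name textrank_keywords are assumptions for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (assumption; the project itself uses NLTK's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "for", "by", "from"}


def textrank_keywords(text, window=2, d=0.85, iterations=50, top=3):
    """Hypothetical helper: rank words with TextRank over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    graph = defaultdict(set)
    # Connect each word to the words that follow it within `window` positions.
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # Unweighted PageRank update on the undirected word graph.
        scores = {
            w: (1 - d) + d * sum(scores[v] / len(graph[v]) for v in graph[w])
            for w in graph
        }
    return sorted(scores, key=scores.get, reverse=True)[:top]


text = ("Text summarization creates a summary. Extractive summarization selects "
        "sentences. Abstractive summarization writes new sentences.")
print(textrank_keywords(text))
```

In this text "summarization" co-occurs with six different words, more than any other word, so it is ranked first.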
4.3 Displaying result using Django Framework:
• Django is a high-level Python web framework that helps in writing software that is
complete, versatile, secure, scalable, maintainable and portable.
• Its most important feature is that it follows the MVT (Model-View-Template) pattern.
4.1 Django Framework Flowchart
4.4 Design
4.2 Flowchart of Proposed System
4.5 Requirements
asgiref==3.2.10
beautifulsoup4==4.9.1
certifi==2020.6.20
chardet==3.0.4
click==7.1.2
Django==3.1
gunicorn==20.0.4
idna==2.10
joblib==0.16.0
lxml==4.5.2
nltk==3.5
numpy==1.19.1
pandas==1.1.0
python-dateutil==2.8.1
pytz==2020.1
regex==2020.7.14
requests==2.24.0
six==1.15.0
soupsieve==2.0.1
sqlparse==0.3.1
tqdm==4.48.2
urllib3==1.25.10
5. RESULTS
GUI:
Figures 5.1 to 5.5 are screenshots of the built GUI.
5.1 The main page of design
5.2 Input is given as URL of Sachin Tendulkar Wikipedia page as example
5.3 Output is shown like this with time taken
5.4 Input is given as text (3 paragraphs on global warming) as example
5.5 Output is shown as below
Method            ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-W
[1]               0.349     0.178     -         -
[2]               0.3936    0.2786    0.3635    -
[3]               0.519     0.366     0.47      0.377
[10]              0.498     -         -         -
[13]              0.583     0.15      0.35      -
Proposed method   0.524     0.19      -         -
Table 5.1 Comparison of Results
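Table 5.1 compares systems by their ROUGE scores. ROUGE-N counts the n-grams shared between a system summary and a human reference summary. Recall divides the shared count by the number of n-grams in the reference, and precision divides it by the number in the system summary. The following is a minimal sketch that assumes lower-cased whitespace tokens and no stemming; ngram_counts and rouge_n are hypothetical helpers. This sketch did not produce the numbers in the table, and official ROUGE implementations differ in their tokenization details.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N (recall, precision, F1) for a candidate summary."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped counts: minimum per shared n-gram
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if overlap == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


reference = "the cat sat on the mat"
candidate = "the cat is on the mat"
print(rouge_n(candidate, reference, n=1))  # recall = precision = 5/6
print(rouge_n(candidate, reference, n=2))  # recall = precision = 3/5
```

ROUGE-1 rewards choosing the right words, while ROUGE-2 also rewards keeping them in the right order. This explains why the ROUGE-2 values in Table 5.1 are consistently lower than the ROUGE-1 values.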
6. CONCLUSION AND SUGGESTIONS
6.1 Conclusion
With the help of this application, text is summarized using the Natural Language Processing
technique called the Text Rank algorithm. A short summary is generated that keeps the important
ideas of the original text intact. The similarity between sentences is calculated using cosine
similarity. Thus, text summarization is done using an extractive approach, which, unlike
abstractive summarization, does not require deep learning models.
6.2 Future Scope
Abstractive text summarization is useful because it generates summaries that can contain new
words and phrases rather than only sentences copied from the source. It is usually implemented
using deep learning techniques. Building on the work done in this report, the future scope is to
develop deep learning models that can generate summaries in their own words.
APPENDIX
Summarizer.py
from bs4 import BeautifulSoup
import requests
import nltk
nltk.download('punkt')
nltk.download("stopwords")
from nltk.corpus import stopwords

import re
import numpy as np
import pandas as pd

# Build the English stop-word set once instead of re-reading the list on every lookup.
STOP_WORDS = set(stopwords.words('english'))


def cosine_similarity(X, Y_set):
    """Cosine similarity between the words of text X and the word set Y_set,
    using binary (present/absent) word vectors."""
    X_set = {w for w in nltk.word_tokenize(X) if w not in STOP_WORDS}
    rvector = X_set.union(Y_set)
    l1 = [1 if w in X_set else 0 for w in rvector]
    l2 = [1 if w in Y_set else 0 for w in rvector]
    c = sum(a * b for a, b in zip(l1, l2))
    denominator = (sum(l1) * sum(l2)) ** 0.5
    # Avoid division by zero when either side has no non-stop-words.
    return c / denominator if denominator else 0.0


def summarizer(content, num_lines=5):
    # Remove bracketed markers such as "[12]"; [^\[\]]* stops at the first "]",
    # so the text between two separate markers is not deleted.
    content = re.sub(r"\[[^\[\]]*\]", ' ', content)
    words = nltk.word_tokenize(content)
    sentences = nltk.sent_tokenize(content)
    if not sentences:
        return []
    Y_set = {w for w in words if w not in STOP_WORDS}

    # Frequency of every non-stop-word in the whole text.
    word_count = {}
    for word in words:
        if word not in STOP_WORDS:
            word_count[word] = word_count.get(word, 0) + 1

    # Score each sentence by the summed frequencies of its words.
    scores = []
    for sentence in sentences:
        score = 0
        for word in nltk.word_tokenize(sentence):
            if word in word_count:
                score += word_count[word]
        scores.append(score)
    scores = np.asarray(scores, dtype=float)
    if scores.max() > 0:
        scores = scores / scores.max()  # normalise to the range [0, 1]

    df = pd.DataFrame({'Sentences': sentences, 'Scores': scores})
    sorted_df = df.sort_values(by="Scores", ascending=False).reset_index(drop=True)

    # Slide a window of num_lines over the ranked sentences and keep the window
    # whose words are most similar (cosine) to the words of the whole text.
    paras = []
    similarity = []
    for i in range(len(sorted_df)):
        window = list(sorted_df['Sentences'].iloc[i:i + num_lines])
        paras.append(window)
        similarity.append(cosine_similarity(' '.join(window), Y_set))
    # Return the best window as a list of sentences.
    return paras[similarity.index(max(similarity))]


def url_summarizer(link, num_lines=5):
    # Download the page and join the text of all <p> paragraphs.
    source = requests.get(link, timeout=10).text
    soup = BeautifulSoup(source, 'lxml')
    content = ' '.join(para.text for para in soup.find_all("p"))
    return content, summarizer(content, num_lines)
urls.py
from django.urls import path

from . import views

urlpatterns = [
    path('', views.index, name='index'),
]
views.py
```python
from django.shortcuts import render, redirect
from django.contrib import messages

from .Summarizer import summarizer, url_summarizer

import re
import time

# Create your views here.
def index(request):
    context = {'flag': False, 'url_error': False, 'summarize_div': True}
    if request.method == 'POST':
        # Only one input source is allowed per request
        if len(request.POST['textarea']) > 0 and len(request.POST['url_link']) > 0:
            messages.error(request, "Enter either URL or Text, not Both.")
            return redirect('index')
        if len(request.POST['textarea']) > 0:
            # Fall back to 5 summary lines when the field is left empty
            if request.POST['num_lines']:
                num_lines = request.POST['num_lines']
            else:
                num_lines = 5
            content = request.POST['textarea']
            start = time.time()
            summary = summarizer(content, int(num_lines))
            end = time.time()
            context['time_taken'] = round(end - start, 2)
            context['flag'] = True
            context["content"] = content
            context["summary"] = summary
            context['summarize_div'] = False
            return render(request, 'text_summarizer/index.html', context)
        elif len(request.POST['url_link']) > 0:
            # Basic check that the input contains an ftp/http/https URL
            if re.search(r"(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?", request.POST['url_link']) is None:
                context['url_error'] = True
            else:
                if request.POST['num_lines']:
                    num_lines = request.POST['num_lines']
                else:
                    num_lines = 5
                try:
                    start = time.time()
                    content, summary = url_summarizer(request.POST['url_link'], int(num_lines))
                    end = time.time()
                    context['time_taken'] = round(end - start, 2)
                    context['flag'] = True
                    context["content"] = content
                    context["summary"] = summary
                    context['summarize_div'] = False
                except Exception:
                    messages.error(request, "Entered URL doesn’t contain any Data.")
        else:
            messages.error(request, "Enter URL or Text to summarize the content.")
    # Render the page for GET requests, URL results and error messages
    return render(request, 'text_summarizer/index.html', context)
```
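The input handling in `index` can be pulled out into plain functions and checked without a running Django server. The sketch below is illustrative only: the helper names `resolve_num_lines`, `looks_like_url` and `choose_input` are hypothetical and do not exist in the project. The URL pattern is the one used in the view, compiled as a raw string, and the default of 5 lines, the branch order and the error messages follow the view.

```python
import re

# Same pattern as in the view, written as a raw string so that the
# backslashes reach the regex engine unchanged
URL_PATTERN = re.compile(
    r"(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?"
)

DEFAULT_NUM_LINES = 5


def resolve_num_lines(raw):
    """Hypothetical helper: use the submitted value, or 5 when the field is empty."""
    return int(raw) if raw else DEFAULT_NUM_LINES


def looks_like_url(text):
    """Hypothetical helper: True if an ftp/http/https URL appears anywhere in text."""
    return URL_PATTERN.search(text) is not None


def choose_input(text, url):
    """Hypothetical helper mirroring the branch order in index().

    Returns ("error", message), ("text", text) or ("url", url).
    """
    if text and url:
        return ("error", "Enter either URL or Text, not Both.")
    if text:
        return ("text", text)
    if url:
        return ("url", url)
    return ("error", "Enter URL or Text to summarize the content.")


print(choose_input("Some article text.", ""))         # ('text', 'Some article text.')
print(resolve_num_lines(""), resolve_num_lines("3"))  # 5 3
print(looks_like_url("https://en.wikipedia.org/wiki/Sachin_Tendulkar"))  # True
print(looks_like_url("not a link"))                   # False
```

Because `re.search` scans the whole string, the pattern also accepts input that merely contains a URL, such as `see http://example.com`; the view then passes that whole string to `requests.get`, which is one way the "Entered URL doesn’t contain any Data." message can be reached.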
urls.py (TextSummarizer project)
```python
"""TextSummarizer URL Configuration

The `urlpatterns` list routes URLs to views. For more information please see:
    https://docs.djangoproject.com/en/3.0/topics/http/urls/
Examples:
Function views
    1. Add an import: from my_app import views
    2. Add a URL to urlpatterns: path('', views.home, name='home')
Class-based views
    1. Add an import: from other_app.views import Home
    2. Add a URL to urlpatterns: path('', Home.as_view(), name='home')
Including another URLconf
    1. Import the include() function: from django.urls import include, path
    2. Add a URL to urlpatterns: path('blog/', include('blog.urls'))
"""
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('text_summarizer.urls')),
]
```
BIBLIOGRAPHY
[1] Song, S., Huang, H., & Ruan, T. (2019). Abstractive text summarization using LSTM-CNN based deep learning. Multimedia Tools and Applications, 78, 857–875, Springer.
[2] Shi, T., Keneshloo, Y., Ramakrishnan, N., & Reddy, C. K. (2020). Neural Abstractive Text Summarization with Sequence-to-Sequence Models. ACM/IMS Transactions on Data Science, 2(1), Article 1.
[3] Madhuri, J. N., & Ganesh Kumar, R. (2019). Extractive Text Summarization Using Sentence Ranking. 2019 International Conference on Data Science and Communication, pp. 1–3, IEEE.
[4] Joshi, A., Fidalgo, E., Alegre, E., & Fernández-Robles, L. (2019). SummCoder: An Unsupervised Framework for Extractive Text Summarization Based on Deep Auto-encoders. Expert Systems with Applications, Elsevier.
[5] Tsai, C.-F., Chen, K., Hu, Y.-H., & Chen, W.-K. (2020). Improving text summarization of online hotel reviews with review helpfulness and sentiment. Elsevier.
[6] Mohamed, M., & Oussalah, M. (2019). SRL-ESA-TextSum: A text summarization approach based on semantic role labelling and explicit semantic analysis. Information Processing & Management, 56(4), 1356–1372, Elsevier.
[7] Goularte, F. B., Nassar, S. M., Fileto, R., & Saggion, H. (2019). A text summarization method based on fuzzy rules and applicable to automated assessment. Expert Systems with Applications, 115, 264–275, Elsevier.
[8] El-Kassas, W. S., Salama, C. R., Rafea, A. A., & Mohamed, H. K. (2020). EdgeSumm: Graph-based framework for automatic text summarization. Information Processing & Management, 57(6), 102264, Elsevier.
[9] Tsai, C.-F., Chen, K., Hu, Y.-H., & Chen, W.-K. (2020). Improving text summarization of online hotel reviews with review helpfulness and sentiment. Tourism Management, 80, 104122, Elsevier.
[10] Hernandez-Castaneda, A., Garcia-Hernandez, R. A., Ledeneva, Y., & Millan-Hernandez, C. E. (2020). Extractive Automatic Text Summarization based on Lexical-semantic Keywords. IEEE Access, 1–1.
[11] Al-Maleh, M., & Desouki, S. (2020). Arabic text summarization using deep learning approach. Journal of Big Data, 7, 109, Springer.
[12] Shini, R. S., & Kumar, V. D. A. (2021). Recurrent Neural Network based Text Summarization Techniques by Word Sequence Generation. 2021 6th International Conference on Inventive Computation Technologies (ICICT), pp. 1224–1229, IEEE.
[13] Merchant, K., & Pande, Y. (2018). NLP Based Latent Semantic Analysis for Legal Text Summarization. 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1803–1807, IEEE.
[14] Mutlu, B., Sezer, E. A., & Akcayol, M. A. (2020). Candidate sentence selection for extractive text summarization. Information Processing & Management, Elsevier.
[15] Mutlu, B., Sezer, E. A., & Akcayol, M. A. (2019). Multi-document extractive text summarization: A comparative assessment on features. Knowledge-Based Systems, Elsevier.
[16] https://www.analyticsvidhya.com/blog/2022/02/text-summarisation/
[17] https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/
[18] https://medium.com/luisfredgs/automatic-text-summarization-with-machine-learning-an-overview-68ded5717a25
