Dr. Annapurna P Patil, Shivam Dalmia, Syed Abu Ayub Ansari, Tanay Aul, Varun Bhatnagar
annapurnap2@msrit.edu shivam.dalmia@gmail.com ayub1993@gmail.com
Department of Computer Science
M. S. Ramaiah Institute of Technology
Bangalore
978-1-4799-3080-7/14/$31.00 © 2014 IEEE
III. RELATED WORK
The extractive summary is generated entirely using the principles of the TextRank extraction technique described in [1].

References [2] and [3] describe graph-based approaches to replacing words with synonyms. In [2], the concept of using the context and possible synonyms of a word to form a graph is explored. We use a similar method; however, the similarity scores are generated using the WordNet [4] similarity function, and the scores of the vertices of the graph are then calculated using Equation (2). This results in a procedure similar to that of TextRank [1] but operating on words, and hence informally referred to as WordRank.

IV. DESIGN

A. Architectural Design
Figure 1
(2)

B. Design Rationale
In this architectural design, the Extractive Summarizer merely copies the information scored highest by the system to the summary. This process is implemented using [1]. However, the process of abstraction of the summary based on the semantics is the main focus of this architecture for the algorithm. The next step involves altering the text using the lexical database, which in this case is the WordNet thesaurus.

V. IMPLEMENTATION
The preprocessing is done in the following steps:
1. The given text is extracted from the file and split into sentences based on punctuation such as periods (.), question marks (?), and exclamation marks (!).
2. The sentences are each split into words on whitespace.
3. Stop-words such as ‘a’, ‘an’, ‘the’, ‘of’, ‘I’ are removed from the document, as are any trailing punctuation marks.
4. The remaining words are stemmed into their base forms. E.g. ‘replacement’, ‘replacing’, ‘replaced’, ‘replace’ are all stemmed to ‘replace’.
NOTE: These steps are done only to facilitate the processing. The results contain all the words in their original forms.
5. The resulting words are stored as a list according to their sentences.
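The preprocessing steps of Section V can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the stop-word list contains only the examples named in the text, `toy_stem` is a hypothetical suffix stripper standing in for a real stemmer (it reduces the four "replace" variants to one shared stem, though not to the exact form ‘replace’), and all function names are ours.

```python
import re

# Only the stop-words named in the text; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "i"}

# Suffixes removed by the toy stemmer (assumption, not the paper's stemmer).
SUFFIXES = ("ment", "ing", "ed", "s")


def toy_stem(word):
    """Crude stand-in for a stemmer: strip one suffix, then a trailing 'e'."""
    stem = word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            break
    if stem.endswith("e") and len(stem) > 3:
        stem = stem[:-1]
    return stem


def split_sentences(text):
    """Step 1: split after '.', '?' or '!' when followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def preprocess(text):
    """Steps 2-5: return one list of stemmed content words per sentence."""
    processed = []
    for sentence in split_sentences(text):
        words = sentence.split()                            # step 2
        # Punctuation is stripped before the stop-word check so that
        # a token such as "the." is still recognised as a stop-word.
        words = [w.strip(".,;:?!\"'()").lower() for w in words]
        words = [w for w in words if w and w not in STOP_WORDS]  # step 3
        processed.append([toy_stem(w) for w in words])      # step 4
    return processed                                        # step 5
```

Because `split_sentences` returns the sentences unchanged, the original word forms remain available for the final output, in line with the NOTE in the list of steps.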
Each text is reported with its Number of Words, Number of Sentences, and Number of Characters.

Original text:

“Thomas Alva Edison was one of the greatest inventors of the 19th century. He is most famous for inventing the light bulb in 1879.

He also developed the world's first electric light-power station in 1882. Edison was born in the village of Milan, Ohio, on February 11, 1847.

His family later moved to Port Huron, Michigan. He went to school for only three months, when he was seven. After that, his mother taught him at home.

Thomas loved to read. At twelve years old, he became a train-boy, selling magazines and candy on the Grand Trunk Railroad.

He spent all his money on books and equipment for his experiments. At the age of fifteen, Edison became manager of a telegraph office.

His first inventions helped improve the telegraph, an early method for sending messages over electric wires.

At twenty-one, Edison produced his first major invention, a stock ticker for printing stock-exchange quotes. He was paid $40,000 for this invention.

He took this money and opened a manufacturing shop and small laboratory in Newark, New Jersey.

Later he gave up manufacturing, and moved his laboratory to Menlo Park, New Jersey. At this laboratory, he directed other inventors.

Extractive summary:

“Thomas Alva Edison was one of the greatest inventors of the 19th century.

He is most famous for inventing the light bulb in 1879.

His first inventions helped improve the telegraph, an early method for sending messages over electric wires.

At twenty-one, Edison produced his first major invention, a stock ticker for printing stock-exchange quotes.

He was paid $40,000 for this invention.

He took this money and opened a manufacturing shop and small laboratory in Newark, New Jersey.

Later he gave up manufacturing, and moved his laboratory to Menlo Park, New Jersey.

During the rest of his life he and his laboratory invented the phonograph, film for the movie industry, and the alkaline battery.

By the time he died at West Orange, New Jersey on October 18, 1931, he had created over 1,000 inventions.”
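The extractive step keeps the highest-scoring sentences of the input in their original order. A minimal sketch of TextRank-style sentence ranking [1] is given below. It assumes the word-overlap similarity and the weighted score update with damping factor 0.85 from the TextRank paper. The fixed iteration count, the `keep` parameter, and all function names are our own simplifications and not taken from the paper's implementation.

```python
import math


def similarity(words_a, words_b):
    """TextRank sentence similarity: shared words over summed log lengths."""
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom <= 0:  # both sentences contain a single word
        return 0.0
    return len(set(words_a) & set(words_b)) / denom


def rank(weights, damping=0.85, iterations=50):
    """Weighted TextRank scores for a graph given as a square weight matrix."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]  # total edge weight per vertex
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if j != i and out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, tokenized, keep):
    """Return the `keep` best-ranked sentences in their original order."""
    n = len(sentences)
    weights = [
        [0.0 if i == j else similarity(tokenized[i], tokenized[j])
         for j in range(n)]
        for i in range(n)
    ]
    scores = rank(weights)
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(best)]
```

Here `tokenized` is expected to hold the preprocessed word list of each sentence. Applying the same `rank` function to a graph whose vertices are candidate words and whose edge weights come from a WordNet similarity function would correspond, under these assumptions, to the procedure the paper informally calls WordRank.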
He is most <famed> for inventing the light bulb in 1879.

His first <designs> helped improve the telegraph, an early method for sending messages over electric wires.

At twenty-one, Edison produced his first major <design>, a stock ticker for printing stock-exchange quotes.

We aim to pursue the following future work:
1. Using the lexical database to improve cohesion among sentences during extraction [4]. This may improve the extraction of essays, etc.
2. Improving the efficiency and speed of extraction and abstraction.
3. Replacing words with better synonyms, including replacement of digrams, trigrams and phrases.
4. Using NLP to improve abstraction.