
Mapping Twitter-Specific Words to WordNet

Presented by:
Arushi Gaur (15UCS026)
Sonal Jain (15UCS054)
Sakshi Sachar (15UCS117)
Priyanka Sharma (15UCS170)
The Problem
In today's world of digital communication, people use social media platforms such as Twitter for fast and frequent conversations, and in doing so they use many abbreviations and phrases. These words do not exist in any dictionary, which makes it inconvenient for novice users to search for and understand their meanings.

Also, there is no common platform or dictionary that contains the meanings or context of all such abbreviations and phrases.
Related Literature
Over the years, many people have contributed to the development of WordNet. The project began in the Princeton University Department of Psychology and is currently housed in the Department of Computer Science.

A set of research papers known as the "Five Papers" presents the development and implementation of WordNet. We referred to the following papers in our analysis; they helped us understand the evolution and implementation of WordNet.

● Introduction to WordNet: An On-line Lexical Database
● Nouns in WordNet: A Lexical Inheritance System
● Adjectives in WordNet
● English Verbs as a Semantic Net
● Design and Implementation of the WordNet Lexical Database and Searching Software
Planned Solution
1) Extract conversational tweets in order to collect these abbreviations and phrases and analyse them.
2) Extract the words from the tweets and check whether each one exists in WordNet. If it does not, check whether it exists in a standard dictionary; we used the Oxford Dictionary for this.
3) If the word exists in the standard dictionary, retrieve its information from that dictionary and map it to the corresponding WordNet synset.
4) Store the words that do not exist in the standard dictionary in a file.
5) Manually prompt the user for each word in the file and ask whether they want to add it to WordNet. If yes, ask whether it should be added as an abbreviation or as a word. For a word, ask the user to enter its meaning, part of speech, etc., and map it to the corresponding WordNet synset. For an abbreviation, ask for its full form and context, create a separate synset for this type of word, and map it to WordNet.
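Step 5 could be prototyped roughly as follows. This is a minimal sketch, not the project's implementation: the `CustomEntry` record and the `prompt_for_entry` helper are hypothetical names, and passing the prompt function as `ask` (defaulting to Python's built-in `input`) is an assumption that lets the dialogue be scripted. Writing the resulting entries into WordNet is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CustomEntry:
    """A user-supplied entry for a word that no dictionary knows (hypothetical schema)."""
    word: str
    kind: str              # "word" or "abbreviation"
    meaning: str = ""      # filled in for words
    pos: str = ""          # part of speech, filled in for words
    full_form: str = ""    # filled in for abbreviations
    context: str = ""      # filled in for abbreviations


def prompt_for_entry(word: str, ask: Callable[[str], str] = input) -> Optional[CustomEntry]:
    """Ask whether and how to add `word`; return None if the user declines."""
    if ask(f"Add '{word}' to WordNet? (y/n) ").strip().lower() != "y":
        return None
    if ask("Abbreviation or word? (a/w) ").strip().lower() == "a":
        return CustomEntry(
            word=word,
            kind="abbreviation",
            full_form=ask("Full form: ").strip(),
            context=ask("Context: ").strip(),
        )
    # Any other answer is treated as a word.
    return CustomEntry(
        word=word,
        kind="word",
        meaning=ask("Meaning: ").strip(),
        pos=ask("Part of speech: ").strip(),
    )
```

For example, with `answers = iter(["y", "a", "talk to you later", "closing a chat"])`, calling `prompt_for_entry("ttyl", ask=lambda _p: next(answers))` returns an abbreviation entry whose `full_form` is "talk to you later".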
Progress Made This Semester
1. First, we extracted conversational tweets from a conversational chat dataset published by Microsoft, which provides the tweet IDs of conversational chats. We fetched the chat tweets and compiled them after removing retweets, links, and stopwords.
2. We collected all the words from the fetched tweets into a CSV file.
3. We iterated through all the words in the CSV file and checked whether each one exists in WordNet; if not, we stored it in another CSV file. This was our first filter for finding the words that do not exist in WordNet.
4. The next step was to find the meanings of the words that did not exist in WordNet but were found in other standard dictionaries such as Oxford, PyDictionary, and others.
5. After filtering out those words, the remaining words were those that did not exist in any of the dictionaries. These were short forms of words, abbreviations, phrases, or grammatically incorrect words.
6. Thus, after applying two levels of filtering to the bag of words obtained from the fetched conversational tweets, we were left with two types of words:

● Words that exist in standard dictionaries but not in WordNet.
● Words that do not exist in WordNet or in any of the dictionaries.
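The cleaning step and the two-level filter can be sketched in Python as below. The regular expressions for retweets, links, and @mentions are illustrative assumptions, and both lookups are passed in as functions because the dictionary APIs used in the project are not described here. With NLTK and its WordNet corpus installed, `in_wordnet` could be `lambda w: bool(wordnet.synsets(w))` (after `from nltk.corpus import wordnet`); `in_dictionary` is a placeholder for the Oxford or PyDictionary lookup. The `write_words_csv` helper is hypothetical.

```python
import csv
import re
from typing import Callable, Dict, Iterable, List, Set

# Illustrative cleaning rules (assumptions): retweet prefix, links, @mentions.
RETWEET_RE = re.compile(r"^RT\b")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def clean_tokens(tweets: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Skip retweets, strip links and mentions, lowercase, and drop stopwords."""
    tokens: List[str] = []
    for tweet in tweets:
        if RETWEET_RE.match(tweet):
            continue
        text = MENTION_RE.sub(" ", URL_RE.sub(" ", tweet)).lower()
        tokens.extend(t for t in TOKEN_RE.findall(text) if t not in stopwords)
    return tokens


def write_words_csv(path: str, words: Iterable[str]) -> None:
    """Write one word per row (hypothetical helper for the CSV files)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for word in words:
            writer.writerow([word])


def two_level_filter(
    words: Iterable[str],
    in_wordnet: Callable[[str], bool],
    in_dictionary: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Return the unique words missing from WordNet, split by dictionary coverage."""
    groups: Dict[str, List[str]] = {"dictionary_only": [], "unknown": []}
    for word in sorted(set(words)):
        if in_wordnet(word):
            continue  # first filter: WordNet already has the word
        if in_dictionary(word):
            groups["dictionary_only"].append(word)  # found in a standard dictionary
        else:
            groups["unknown"].append(word)          # found nowhere
    return groups
```

Deduplicating and sorting the words before filtering keeps each word in one group and makes the output stable between runs.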
Different Sets of Words
1. Phrases
a. Ttyl - talk to you later
b. IMAO - in my arrogant opinion
c. SMH - shaking my head
d. Idk - I don't know
e. btw - by the way
f. omg - oh my god
g. gtwt - girl today women tomorrow
2. Words showing expressions
a. Awwwwwwwww
b. Hahaahaa
c. Ohhhh
d. Ermmm
e. Ahahahah
f. heyy
3. Short forms/abbreviations
a. Lyk - like
b. Talkig - talking
c. Havin - having
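The three groups above can be told apart with simple rules, sketched below. The lookup tables are built only from the examples listed here; the interjection list, the rule that collapses repeated letters, and the laughter pattern are hypothetical heuristics, not something the project has implemented.

```python
import re
from typing import Optional

# Lookup tables built only from the examples listed above (keys lowercased).
ABBREVIATIONS = {
    "ttyl": "talk to you later",
    "imao": "in my arrogant opinion",
    "smh": "shaking my head",
    "idk": "I don't know",
    "btw": "by the way",
    "omg": "oh my god",
    "gtwt": "girl today women tomorrow",
}
SHORT_FORMS = {"lyk": "like", "talkig": "talking", "havin": "having"}

# Hypothetical heuristics for expressive words.
RUN_RE = re.compile(r"(\w)\1+")            # any run of a repeated character
LAUGH_RE = re.compile(r"^a?(?:ha)+h?$")    # haha, ahahah, hahah, ...
INTERJECTIONS = {"aw", "oh", "erm", "hey", "ah", "um"}


def collapse_runs(word: str) -> str:
    """Reduce every run of a repeated character to one, e.g. 'ohhhh' -> 'oh'."""
    return RUN_RE.sub(r"\1", word)


def categorize(word: str) -> str:
    """Label a leftover word as a phrase, short form, expression, or uncategorized."""
    w = word.lower()
    if w in ABBREVIATIONS:
        return "phrase"
    if w in SHORT_FORMS:
        return "short form"
    squeezed = collapse_runs(w)
    if squeezed in INTERJECTIONS or LAUGH_RE.match(squeezed):
        return "expression"
    return "uncategorized"


def expand(word: str) -> Optional[str]:
    """Return the full form of a known phrase or short form, else None."""
    w = word.lower()
    return ABBREVIATIONS.get(w) or SHORT_FORMS.get(w)
```

For instance, "Awwwwwwwww" collapses to "aw" and "Hahaahaa" to "hahaha", so both are classed as expressions, while "SMH" is found in the phrase table.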
Issues Faced
1) Searching for a dataset: It was difficult to find tweets in which these abbreviations and phrases were used. Most non-standard words were short forms, e.g. words formed by removing vowels, and there were very few abbreviations or phrases. We then found a conversational dataset made public by Microsoft and analysed it.
2) Limited coverage of WordNet: WordNet does not include pronouns, interrogative words, determiners, conjunctions, or prepositions. So when we checked WordNet, common words such as what, when, and the were reported as missing.
3) Context: The same word can be used in different contexts. For example, 'I' is used as a personal pronoun in tweets but exists in WordNet as iodine.
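One way to partly address the second and third issues is to check a closed-class word list before querying WordNet, so that function words are routed to their own group instead of being reported as missing, and so that 'I' is treated as a pronoun rather than matched to the noun sense iodine. The sketch below assumes a small, hypothetical hand-written word list (NLTK's English stopwords corpus is one possible substitute) and takes the WordNet lookup as a function.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

# A deliberately small, hypothetical closed-class list.
CLOSED_CLASS: Dict[str, Set[str]] = {
    "pronoun": {"i", "me", "you", "he", "she", "it", "we", "they"},
    "determiner": {"the", "a", "an", "this", "that", "these", "those"},
    "conjunction": {"and", "or", "but", "because"},
    "preposition": {"in", "on", "at", "to", "from", "with", "of"},
    "interrogative": {"what", "when", "where", "who", "why", "how"},
}


def closed_class_of(word: str) -> Optional[str]:
    """Return the closed-class category of `word`, or None for open-class words."""
    w = word.lower()
    for category, members in CLOSED_CLASS.items():
        if w in members:
            return category
    return None


def route(words: Iterable[str], in_wordnet: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Check the closed-class list before WordNet, so 'I' is not matched to iodine."""
    groups: Dict[str, List[str]] = {"closed_class": [], "in_wordnet": [], "missing": []}
    for word in words:
        if closed_class_of(word) is not None:
            groups["closed_class"].append(word)
        elif in_wordnet(word):
            groups["in_wordnet"].append(word)
        else:
            groups["missing"].append(word)
    return groups
```

The order of the checks matters: if WordNet were queried first, 'I' would be found and treated as already covered.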
Plan for next Semester
Our plan for next semester is:

1. Map the first category of words, those that exist in standard dictionaries, to WordNet by taking their meanings from the Oxford Dictionary and mapping them to the appropriate synsets for their part of speech (noun, adjective, adverb, or verb). For the special categories of words that WordNet does not contain, such as determiners, prepositions, pronouns, conjunctions, and articles, we will add a new category to WordNet and map those words to it accordingly.
2. Map the second category of words, the abbreviations, phrases, and short forms (such as ttyl, meaning talk to you later) that do not exist in any dictionary with a well-defined meaning, to WordNet by entering their meanings manually and creating a special category for phrases and short forms.
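Both plans need a rule that decides where each word goes: dictionary parts of speech that WordNet covers map to WordNet's own POS codes (`n`, `v`, `a`, `r`, which NLTK exposes as `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ`, and `wordnet.ADV`), while everything else goes to a new category. The sketch below uses hypothetical names for the extension categories, ignores WordNet's `s` code for satellite adjectives, and does not modify WordNet itself.

```python
from typing import Dict, List, Optional

# WordNet part-of-speech codes for the four open classes it covers.
WORDNET_POS = {"noun": "n", "verb": "v", "adjective": "a", "adverb": "r"}

# Hypothetical extension categories for words WordNet does not cover.
EXTRA_CATEGORIES = {
    "determiner", "preposition", "pronoun", "conjunction", "article",
    "abbreviation", "phrase", "short form",
}


def target_category(dictionary_pos: str) -> Optional[str]:
    """Map a dictionary POS label to a WordNet POS code or an extension category."""
    label = dictionary_pos.strip().lower()
    if label in WORDNET_POS:
        return WORDNET_POS[label]
    if label in EXTRA_CATEGORIES:
        return label
    return None  # unrecognised label: leave for manual review


def group_entries(entries: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group entries (each with 'word', 'pos', 'gloss') by their target category."""
    grouped: Dict[str, List[Dict[str, str]]] = {}
    for entry in entries:
        category = target_category(entry["pos"])
        grouped.setdefault(category or "unassigned", []).append(entry)
    return grouped
```

Entries whose label is not recognised end up under "unassigned", which matches the plan of entering such meanings manually.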
