
ABSTRACT

Twitter has attracted millions of users to share and disseminate the most up-to-date information,
resulting in large volumes of data produced every day. However, many applications in
Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the
noisy and short nature of tweets. In this paper, we propose HybridSeg, a novel framework for
tweet segmentation in batch mode. By splitting tweets into meaningful
segments, the semantic or context information is well preserved and easily extracted by the
downstream applications. HybridSeg finds the optimal segmentation of a tweet by
maximizing the sum of the stickiness scores of its candidate segments. The stickiness score
considers the probability of a segment being a phrase in English (i.e., global context) and the
probability of a segment being a phrase within the batch of tweets (i.e., local context). For the
latter, we propose and evaluate two models to derive local context by considering the
linguistic features and term-dependency in a batch of tweets, respectively. HybridSeg is also
designed to iteratively learn from confident segments as pseudo feedback. Experiments on
two tweet data sets show that tweet segmentation quality is significantly improved by
learning both global and local contexts compared with using global context alone. Through
analysis and comparison, we show that local linguistic features are more reliable for learning
local context than term-dependency. As an application, we show that high
accuracy is achieved in named entity recognition by applying segment-based part-of-speech
(POS) tagging.
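
The objective stated above, choosing the segmentation whose segments have the largest total stickiness, can be illustrated with a small sketch. The abstract does not specify the search procedure or the exact form of the stickiness score, so the code below makes assumptions: it uses a standard dynamic program over split points, caps segment length with a hypothetical `max_len` parameter, and takes the scoring function as an argument. The names `segment`, `toy_stickiness`, and `TOY_SCORES` and all score values are made up for illustration and are not taken from HybridSeg.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Segment = Tuple[str, ...]


def segment(words: Sequence[str],
            stickiness: Callable[[Segment], float],
            max_len: int = 3) -> List[Segment]:
    """Return the split of `words` that maximizes the summed stickiness.

    Assumption: a plain dynamic program over split points, with segments
    limited to `max_len` words. This is a sketch, not the authors' code.
    """
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for words[:i]
    best[0] = 0.0
    back = [0] * (n + 1)              # back[i]: start of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(words[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the segments.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(words[start:end]))
        end = start
    segments.reverse()
    return segments


# Hypothetical scores standing in for the combined global/local context.
TOY_SCORES: Dict[Segment, float] = {("new", "york"): 2.0}


def toy_stickiness(seg: Segment) -> float:
    """Made-up scoring: known phrases score high, unknown phrases low."""
    if seg in TOY_SCORES:
        return TOY_SCORES[seg]
    return 0.1 if len(seg) == 1 else -1.0


if __name__ == "__main__":
    print(segment("i love new york".split(), toy_stickiness))
```

With these toy scores the phrase "new york" is kept together while the other words stay as single-word segments. In a real system the scoring function would combine the global and local context probabilities described above.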
LIST OF FIGURES:

S.NO.  FIGURE NO.       NAME OF THE FIGURE                                      PAGE NO.

1      Figure (1.1)     Example of tweet segmentation                           2
2      Figure (3.1)     Summaries Of SRS                                        10
3      Figure (5.1)     Working of JAVA                                         14
4      Figure (5.1.1)   The JAVA Programming                                    15
5      Figure (5.1.2)   The Java platform                                       16
6      Figure (5.1.3)   Knowledge discovery process                             18
7      Figure (5.3.1)   JDBC Compiler                                           23
8      Figure (5.4.1)   TCP/IP Stack                                            24
9      Figure (5.6.1)   General J2ME Architecture                               28
10     Figure (6.1)     Modeling a system architecture using views of UML       31
11     Figure (6.2)     UML Diagrams Types                                      32
12     Figure (6.2.1)   Use-Case Diagram                                        33
13     Figure (6.2.2)   Sequence Diagram                                        35
14     Figure (6.2.3)   Class Diagram                                           36
15     Figure (6.2.4)   Flow Chart Diagram                                      37
16     Figure (6.2.5)   Data Flow Diagram                                       39

LIST OF ABBREVIATIONS:

IR     Information Retrieval
NLP    Natural Language Processing
POS    Part-Of-Speech
HMM    Hidden Markov Model
CRF    Conditional Random Field
EL     Entity Linking
RW     Random Walk
NER    Named Entity Recognition
