
TEXT MINING

Presented By:
Prakhyath Rai
Asst. Professor, Dept. of ISE,
SCEM, Mangaluru
Outline
• Introduction
• Data Mining vs. Text Mining
• Motivation for Text Mining
• I/O Model for Text Mining
• Steps for Text Mining
• Key Terms in Text Mining
• Text Mining Frameworks
• Merits of Text Mining
• Applications of Text Mining
• Demerits of Text Mining
• References

Prakhyath Rai, Asst. Professor, Dept. of ISE, SCEM, Mangaluru-575007


Introduction

• Text Mining is a process of discovery.
• Text Mining is also referred to as Text Data Mining (TDM) and Knowledge Discovery in Textual Databases (KDT).
• Text Mining is used to extract relevant information, knowledge, or patterns from different sources that are in unstructured or semi-structured form.



Introduction Cont.
• Extract and discover knowledge hidden in text automatically
• Aid domain experts by automatically:
   - identifying concepts
   - extracting facts/relations
   - discovering implicit links
   - generating hypotheses
Data Mining vs. Text Mining

Data Mining                               | Text Mining
Processes data directly                   | Requires linguistic processing or natural language processing (NLP)
Identifies causal relationships           | Discovers heretofore unknown information
Structured data                           | Semi-structured and unstructured data (text)
Structured numeric transaction data       | Applications deal with much more diverse and eclectic
residing in a relational data warehouse   | collections of systems and formats



Motivation for Text Mining
• Approximately 90% of the world's data is held in unstructured formats (source: Oracle Corporation).
• Information-intensive business processes demand that we move beyond simple document retrieval to "knowledge" discovery.



Input-Output Model for Text Mining

Input: Documents
Process: Text Mining Technique
Output: Patterns, Connections, Trends



Steps for Text Mining
• Pre-processing the text
• Applying text mining techniques
   - Summarization
   - Classification
   - Clustering
   - Visualization
   - Information Extraction
• Analyzing the text
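
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method prescribed by these slides: the stop-word list, the function names, and the choice of plain term counting as the "technique" are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use larger ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "from"}


def preprocess(text):
    """Step 1: lowercase the text, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(documents):
    """Step 2: apply a very simple technique, counting terms across documents."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts


def top_terms(counts, n=3):
    """Step 3: analyze the result by ranking the most frequent terms."""
    return [term for term, _ in counts.most_common(n)]


docs = [
    "Text mining extracts patterns from text.",
    "Patterns in text reveal trends.",
]
counts = term_frequencies(docs)
print(top_terms(counts, 2))  # ['text', 'patterns']
```

Term counting stands in here for the richer techniques listed above (summarization, classification, clustering); any of them would consume the same pre-processed tokens.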


Key Terms in Text Mining
• Information Extraction (IE)
   - The science of searching for:
      - Information in documents
      - Documents themselves
      - Metadata which describes documents
      - Text, sound, images, or data within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets
• Artificial Intelligence (AI)
   - Artificial intelligence (AI) is a branch of computer science and engineering that deals with intelligent behavior, learning, and adaptation in machines.
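
The kind of searching described above, over document text and over the metadata that describes documents, can be sketched with a small inverted index. The document collection, field names, and function names below are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection: each document has metadata (title) and text.
DOCUMENTS = {
    1: {"title": "Spam filtering basics", "text": "Classify junk email by keywords."},
    2: {"title": "Market trends", "text": "Mining news text reveals market trends."},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each token to the ids of documents whose title or text contains it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for field in ("title", "text"):
            for token in tokenize(doc[field]):
                index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


index = build_index(DOCUMENTS)
print(search(index, "market trends"))  # [2]
print(search(index, "spam"))           # [1], matched through the title metadata
```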
Merits of Text Mining
• A database is limited in how much information it stores, whereas text mining overcomes this limitation
• Extraction of relevant information and relationships from natural-language documents
• Extraction of information from unstructured or semi-structured documents



Applications of Text Mining

• Analysis of market trends
   - Classification technique
   - Information extraction technique
• Analysis and screening of junk emails
   - Classification on the basis of pre-defined, frequently occurring items
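
Junk-email screening on the basis of pre-defined, frequently occurring items might look like the following sketch. The term list, the threshold of 0.2, and the function names are assumptions chosen for illustration; practical filters are usually trained on labeled mail rather than hand-written.

```python
import re

# Hypothetical pre-defined list of items that occur frequently in junk mail.
JUNK_TERMS = {"free", "winner", "prize", "offer", "urgent"}


def junk_score(message):
    """Return the fraction of tokens in the message that are known junk terms."""
    tokens = re.findall(r"[a-z]+", message.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in JUNK_TERMS)
    return hits / len(tokens)


def is_junk(message, threshold=0.2):
    """Classify a message as junk when its score reaches the assumed threshold."""
    return junk_score(message) >= threshold


print(is_junk("Urgent: claim your free prize now"))  # True
print(is_junk("Meeting notes attached for review"))  # False
```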



Demerits of Text Mining

• Requires an initial learned information system for the initial extraction
• Suitable programs have not yet been defined to analyze text for mining knowledge or information



References
[1] R Baeza-Yates and B Ribeiro-Neto. “Modern Information Retrieval”, ACM
Press, New York, 1999.

[2] Ning Zhong, Yuefeng Li and T. Grance, “Effective Pattern Discovery for Text
Mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 1,
January 2012.

[3] Raymond J Mooney and Un Yong Nahm, “Text Mining with Information
Extraction”, Proceedings of the 4th International MIDP Colloquium, pages 141-
160, Van Schaik Pub., South Africa, 2005.

[4] M E Califf and R J Mooney, “Relational Learning of Pattern-Match Rules for
Information Extraction”, Proceedings of the 16th National Conference on Artificial
Intelligence (AAAI-99), pages 328-334, Orlando, FL, July 1999.

[5] D Freitag and N Kushmerick, “Boosted Wrapper Induction”, Proceedings of
the 17th National Conference on Artificial Intelligence (AAAI-2000), pages 577-
583, Austin, TX, July 2000.
