Text Mining: Presented By: Prakhyath Rai Asst. Professor, Dept. of ISE, SCEM, Mangaluru

TEXT MINING
Presented By:
Prakhyath Rai
Asst. Professor, Dept. of ISE,
SCEM, Mangaluru
Outline
 Introduction
 Data Mining Vs. Text Mining
 Motivation for Text Mining
 I/O Model for Text Mining
 Steps for Text Mining
 Key Terms in Text Mining
 Text Mining Frameworks
 Merits of Text Mining
 Applications of Text Mining
 Demerits of Text Mining
 References
Prakhyath Rai, Asst. Professor, Dept. of ISE, SCEM, Mangaluru-575007

Introduction
 Text Mining is the discovery of previously unknown knowledge in text.
 Text Mining is also referred to as Text Data Mining (TDM)
and Knowledge Discovery in Textual Databases (KDT).
 Text Mining is used to extract relevant information,
knowledge, or patterns from different sources that are in
unstructured or semi-structured form.

Introduction Cont.
 Extract and discover knowledge hidden in text
automatically
 Aid domain experts by automatically:
 identifying concepts
 extracting facts/relations
 discovering implicit links
 generating hypotheses

Data Mining vs. Text Mining
Data Mining | Text Mining
Processes data directly | Requires linguistic processing or natural language processing (NLP)
Identifies causal relationships | Discovers heretofore unknown information
Structured data | Semi-structured and unstructured data (text)
Structured numeric transaction data residing in a relational data warehouse | Applications deal with much more diverse and eclectic collections of systems and formats

Motivation for Text Mining
 Approximately 90% of the world’s data is held in
unstructured formats (source: Oracle Corporation)
 Information-intensive business processes demand that we
move beyond simple document retrieval to “knowledge”
discovery.

Input-Output Model for Text Mining
Input: Documents
Process: Text Mining Technique
Output: Patterns, Connections, Trends
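
The input-output model above can be illustrated with a minimal sketch, assuming that "patterns" means terms occurring in several documents and "connections" means pairs of terms occurring together in several documents. The function name `mine`, the `min_support` parameter, and the sample documents are hypothetical and not part of the original slides. Trends would additionally require time-stamped documents and are not covered here.

```python
import re
from collections import Counter
from itertools import combinations

def mine(documents, min_support=2):
    """Map a list of documents to frequent terms and co-occurring term pairs."""
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Unique lowercase word tokens, so each document counts a term once.
        terms = sorted(set(re.findall(r"[a-z]+", doc.lower())))
        term_counts.update(terms)
        # Every unordered pair of terms that shares this document.
        pair_counts.update(combinations(terms, 2))
    # Keep only items found in at least min_support documents.
    patterns = {t: c for t, c in term_counts.items() if c >= min_support}
    connections = {p: c for p, c in pair_counts.items() if c >= min_support}
    return patterns, connections

# Hypothetical input documents.
docs = [
    "text mining finds patterns",
    "text mining finds trends",
    "data mining finds patterns",
]
patterns, connections = mine(docs)
print(patterns)
print(connections)
```

Counting each term once per document keeps a single long document from dominating the result; a real system would likely also remove stopwords first.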

Steps for Text Mining
 Pre-Processing the Text
 Applying Text Mining Techniques
   Summarization
   Classification
   Clustering
   Visualization
   Information Extraction
 Analyzing the Text
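
As a rough illustration of the three steps, the sketch below pre-processes a text, applies one of the listed techniques (summarization, here a simple frequency-based extractive variant chosen for brevity), and then analyzes the result by listing the most frequent terms. The stopword list, function names, and sample text are assumptions made for this example, not part of the original slides.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "from", "in", "is", "of", "or", "the", "to"}

def preprocess(text):
    """Step 1: split into sentences, lowercase, tokenize, drop stopwords."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    return sentences, tokens

def summarize(sentences, tokens, k=1):
    """Step 2: score each sentence by the corpus frequency of its terms."""
    freq = Counter(w for sent in tokens for w in sent)
    scores = [sum(freq[w] for w in sent) for sent in tokens]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Return the k best sentences in their original order.
    return [sentences[i] for i in sorted(ranked[:k])], freq

def analyze(freq, n=3):
    """Step 3: report the n most frequent terms."""
    return [term for term, _ in freq.most_common(n)]

# Hypothetical input text.
text = ("Text mining extracts knowledge from text. "
        "Text mining needs preprocessing. "
        "Cats sleep.")
sentences, tokens = preprocess(text)
summary, freq = summarize(sentences, tokens, k=1)
top_terms = analyze(freq, n=2)
print(summary)
print(top_terms)
```

Any of the other listed techniques (classification, clustering, visualization, information extraction) could take the place of `summarize` in the second step; the pre-processing output would stay the same.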

Key Terms in Text Mining
 Information Extraction (IE)
   The science of searching for:
     Information in documents
     Documents themselves
     Metadata which describe documents
     Text, sound, images, or data within databases: relational stand-alone databases or hypertext networked databases such as the Internet or intranets
 Artificial Intelligence (AI)
   A branch of computer science and engineering that deals with intelligent behavior, learning, and adaptation in machines

Merits of Text Mining
 A conventional database stores only a limited amount of information, whereas Text Mining overcomes this limitation
 Extraction of relevant information and relationships from natural-language documents
 Extraction of information from unstructured or semi-structured documents

Applications of Text Mining
 Analysis of Market Trends
   Classification Technique
   Information Extraction Technique
 Analysis and Screening of Junk Emails
   Classification on the basis of pre-defined, frequently occurring items
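
The junk email application can be sketched as a classification against a pre-defined list of frequently occurring items. The term list, the threshold, and the function names below are hypothetical choices for illustration only; a practical filter would learn its terms and weights from labeled mail.

```python
import re

# Hypothetical pre-defined list of terms that occur frequently in junk mail.
JUNK_TERMS = {"free", "winner", "prize", "urgent", "offer"}

def junk_score(message):
    """Return the fraction of words in the message that are known junk terms."""
    words = re.findall(r"[a-z]+", message.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in JUNK_TERMS)
    return hits / len(words)

def is_junk(message, threshold=0.2):
    """Classify a message as junk when its score reaches the threshold."""
    return junk_score(message) >= threshold

print(is_junk("Urgent offer: claim your free prize now"))
print(is_junk("Meeting notes from the project review"))
```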

Demerits of Text Mining
 Requires an initial learned information system for the initial extraction
 Suitable programs have not yet been defined to analyze text for mining knowledge or information

References
[1] R Baeza-Yates and B Ribeiro-Neto. “Modern Information Retrieval”, ACM
Press, New York, 1999.
[2] Ning Zhong, Yuefeng Li and Sheng-Tang Wu, “Effective Pattern Discovery for Text
Mining”, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 1,
January 2012.
[3] Raymond J Mooney and Un Yong Nahm, “Text Mining with Information
Extraction”, Proceedings of the 4th International MIDP Colloquium, pages 141-160,
Van Schaik Pub., South Africa, 2005.
[4] M E Califf and R J Mooney, “Relational Learning of Pattern-Match Rules for
Information Extraction”, Proceedings of the 16th National Conference on Artificial
Intelligence (AAAI-99), pages 328-334, Orlando, FL, July 1999.
[5] D Freitag and N Kushmerick, “Boosted Wrapper Induction”, Proceedings of
the 17th National Conference on Artificial Intelligence (AAAI-2000), pages 577-583,
Austin, TX, July 2000.
