
Text Analytics

The Text Analytics nodes offer powerful text analytics capabilities, which use advanced linguistic technologies and Natural Language Processing
(NLP) to rapidly process a large variety of unstructured text data and, from this text, extract and organize the key concepts. Text Analytics can also
group these concepts into categories.

Around 80% of data held within an organization is in the form of text documents—for example, reports, web pages, e-mails, and call center notes.
Text is a key factor in enabling an organization to gain a better understanding of its customers' behavior. A system that incorporates NLP can
intelligently extract concepts, including compound phrases. Moreover, knowledge of the underlying language allows classification of terms into
related groups, such as products, organizations, or people, using meaning and context. As a result, you can quickly determine the relevance of the
information to your needs. These extracted concepts and categories can be combined with existing structured data, such as demographics, and
applied to modeling in Cloud Pak for Data to yield better and more-focused decisions.
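
The sketch below illustrates that last step outside of SPSS Modeler: concept flags extracted from text are joined to a demographic table so that both can feed a downstream model. It is a minimal pandas example, and the column names and values are hypothetical.

    import pandas as pd

    # Hypothetical demographic data keyed by customer ID.
    demographics = pd.DataFrame({
        "customer_id": [101, 102, 103],
        "age": [34, 58, 42],
        "region": ["east", "west", "north"],
    })

    # Hypothetical concept flags extracted from call center notes
    # (one column per concept; 1 means the concept was found).
    concept_flags = pd.DataFrame({
        "customer_id": [101, 102, 103],
        "concept_billing_issue": [1, 0, 1],
        "concept_cancellation": [0, 1, 0],
    })

    # Combine the text-derived features with the structured data so a
    # downstream model can use both.
    modeling_input = demographics.merge(concept_flags, on="customer_id", how="left")
    print(modeling_input)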

Linguistic systems are knowledge-sensitive: the more information their dictionaries contain, the higher the quality of the results. Text
Analytics provides a set of linguistic resources, such as dictionaries for terms and synonyms, libraries, and templates. The nodes also allow
you to develop and refine these linguistic resources to suit your own context. Fine-tuning of the linguistic resources is often an iterative process and is
necessary for accurate concept retrieval and categorization. Custom templates, libraries, and dictionaries for specific domains, such as CRM and
genomics, are also included.

Watch the following short video for an overview of Text Analytics.

This video provides a visual alternative to the content in this documentation.

  
https://video.ibm.com/embed/channel/23952663/video/spss-text-analytics

Applications
In general, anyone who routinely needs to review large volumes of documents to identify key elements for further exploration can benefit from
using Text Analytics. Examples of some specific applications include:

Scientific and medical research. Explore secondary research materials, such as patent reports, journal articles, and protocol publications.
Identify associations that were previously unknown (such as a doctor associated with a particular product), presenting avenues for further
exploration. Minimize the time spent in the drug discovery process. Use as an aid in genomics research.
Investment research. Review daily analyst reports, news articles, and company press releases to identify key strategy points or market shifts.
Trend analysis of such information reveals emerging issues or opportunities for a firm or industry over a period of time.
Fraud detection. Use in banking and health-care fraud to detect anomalies and discover red flags in large amounts of text.
Market research. Use in market research endeavors to identify key topics in open-ended survey responses.
Blog and Web feed analysis. Explore and build models using the key ideas found in news feeds, blogs, etc.
CRM. Build models using data from all customer touch points, such as e-mail, transactions, and surveys.

Nodes
Along with the many standard SPSS Modeler nodes in Cloud Pak for Data, you can also work with text mining nodes to incorporate the power
of text analysis into your flows. These nodes are available on the node palette, under Text Analytics:
The Language Identifier node is a process node that scans source text to determine which human language it's written in and records the
detected language in a new field. Primarily designed for large amounts of data, this node is particularly useful when your data sources contain
more than one language and you want to process just one of them.
The Text Mining node uses linguistic methods to extract key concepts from the text, allows you to create categories with these concepts and
other data, and offers the ability to identify relationships and associations between concepts based on known patterns (called text link analysis).
You can use this node to explore the text data contents or to produce either a concept model or category model. The concepts and categories
can be combined with existing structured data, such as demographics, and applied to modeling.
The Text Link Analysis (TLA) node extracts concepts and identifies relationships between them based on known patterns within the text.
You can use pattern extraction to discover relationships between your concepts, as well as any opinions or qualifiers attached to them.
The TLA node offers a more direct way to identify and extract patterns from your text and then add the pattern results to the
dataset in the flow. You can also perform TLA using an interactive workbench session in the Text Mining modeling node.

About text mining


Today, an increasing amount of information is being held in unstructured and semistructured formats, such as customer e-mails, call center notes,
open-ended survey responses, news feeds, web forms, etc. This abundance of information poses a problem to many organizations that ask themselves,
"How can we collect, explore, and leverage this information?"
Reading in source text
You can use the Language Identifier node to identify the natural language of a text field within your source data. The output of this node is a
derived field that contains the detected language code.
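
For illustration only, the following sketch derives a similar language-code field using the open-source langdetect package; the package is an assumption made for the example and is not how the node itself works.

    import pandas as pd
    from langdetect import detect  # assumed third-party package, not the node's engine

    notes = pd.DataFrame({"note_text": [
        "The customer asked about the new data plan.",
        "Le client souhaite résilier son abonnement.",
        "Der Kunde meldet ein Problem mit der Rechnung.",
    ]})

    # Derived field holding the detected language code (for example "en",
    # "fr", "de"); downstream steps can filter on this field.
    notes["language_code"] = notes["note_text"].apply(detect)
    print(notes)
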
Mining for text links
The Text Link Analysis (TLA) node adds pattern-matching technology to text mining's concept extraction in order to identify relationships between
the concepts in the text data based on known patterns. These relationships can describe how a customer feels about a product, which companies are
doing business together, or even the relationships between genes or pharmaceutical agents.
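
As a rough, rule-based sketch of the idea, the snippet below pairs an opinion word with a nearby product term using a single hypothetical pattern; the TLA node relies on much richer linguistic resources than one regular expression.

    import re

    # Hypothetical pattern: an opinion word followed, within a few words,
    # by a product term.
    OPINIONS = r"(love|like|hate|dislike)"
    PRODUCTS = r"(data plan|router|mobile app)"
    pattern = re.compile(rf"\b{OPINIONS}\b.{{0,30}}?\b{PRODUCTS}\b", re.IGNORECASE)

    texts = [
        "I love the new mobile app but hate the router setup.",
        "Customers dislike the data plan pricing.",
    ]

    # Each match yields an (opinion, concept) pair, e.g. ("love", "mobile app").
    for opinion, product in [m for text in texts for m in pattern.findall(text)]:
        print(opinion.lower(), "->", product.lower())
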
Mining for concepts and categories
The Text Mining node uses linguistic and frequency techniques to extract key concepts from the text and create categories with these concepts and
other data. Use the node to explore the text data contents or to produce either a concept model nugget or category model nugget.
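
The stdlib-only sketch below imitates the frequency side of this: it counts candidate terms and groups them into categories defined by hypothetical keyword lists. The node's linguistic extraction and category building are far more sophisticated.

    import re
    from collections import Counter

    STOP_WORDS = {"the", "a", "an", "and", "i", "is", "was", "did", "not", "today"}

    # Hypothetical category definitions: category name -> member concepts.
    CATEGORIES = {
        "billing": {"invoice", "charge", "refund"},
        "service": {"outage", "support", "line"},
    }

    texts = [
        "The invoice showed a charge I did not expect, support promised a refund.",
        "Another outage today and the support line was slow.",
    ]

    # Frequency-based concept counts across all documents.
    tokens = [t for text in texts
              for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS]
    concepts = Counter(tokens)

    # Group the extracted concepts into the defined categories.
    for name, members in CATEGORIES.items():
        print(name, {c: n for c, n in concepts.items() if c in members})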
