You are on page 1of 4

FACT SHEET

SAS® Text Miner


Automate discovery and insights from document collections

Organizations collect huge amounts of


Key Benefits
What does SAS® Text Miner do? text-based information daily, in many
SAS Text Miner discovers information languages and dialects. Customer • Reduce decision time through
buried in collections of text. By auto­ feedback, emails, Web documents, automated processes. By imple-
matically reading text data and delivering blogs, Twitter feeds, warranty claims, menting intelligent algorithms and
algorithms for rigorous, advanced
surveys, research studies, client notes, natural language processing tech-
analyses, the solution makes it possible
competitive intelligence … the list goes niques, time-consuming manual
to grasp future trends and act on new
opportunities more precisely and with on. No one has time to read it all, much activities – such as cluster identification
less risk. It includes advanced linguistic less classify the content into common or topic building – can be generated
capabilities within the core data mining themes, or make sense of the essential automatically and executed consis­
solution of SAS® Enterprise Miner™ so tently and efficiently.
information.
you can easily extend text insights into
structured data mining and predictive • Enhance the discovery process
analysis. To observe trends, spot new topics, is- with subject-matter expertise.
sue alerts about potential problems and Through a unique, data-driven
Why is SAS® Text Miner important? flag new business indicators, you must method of identifying key concepts,
The software saves money and resources be able to analyze all your data before you can use interactive GUIs to
by automating the time-consuming tasks acting on it. But conver­sational lan- modify rel­evance scores and guide
of reading and comprehending electronic guage is ambiguous, and key messag- machine learning results with human
text. By consolidating structured (quan­
es buried in text data are not easy to insight. Extend text mining efforts
titative) data sources with text-based
(unstructured) information in a common discern or process. Most organizations beyond start and stop lists – and
environment, you gain a more accurate, are unable to combine text content what the software automatically
complete view of your data. Analysis with structured data in decision-making discovers – using custom entities
performed using both types of data pro­ contexts. and active learning.
duces descriptive and predictive models
that enable you to spot opportunities and • Present a high-level view of data
accurately recognize trends, leading to With SAS Text Miner you can analyze
with transparent drill-down capa­
fact-based, prioritized actions. legacy data stored from your system
bility. SAS Text Miner offers a visual
records – and dynamically reach out­
presentation of the entire data mining
For whom is SAS® Text Miner side to retrieve pertinent, fresh content
intended? process and allows users to drill to
from the Web. Interactively explore
relevant details, illustrating and explor­
SAS Text Miner is designed primarily for and automatically identify topics from
business analysts, modelers and data ing the term connections. An interactive
data-driven categories, and find explicit
scientists who must review large volumes interface lets you investigate derived
relationships and associations between
of text to extract common topics, ideas topics and fine-tune models.
and trends. It is applicable across all terms. Use an all-in-one, drag-and-drop
• Recognize trends and spot busi­
industries – and for organizations with interface to mine text data with struc-
voluminous collections, it includes high- ness opportunities. SAS Text Miner
tured data, then apply derived analytic
performance procedures for text mining, structures text into numeric represen­
insights directly into existing scoring
to speed the time to results. tations that can surmmarize the
systems. Use the solution to match
collection and become insightful
résumés with open positions, predict
inputs to a full range of predictive and
patient treatment outcomes with dif-
data mining modeling techniques.
ferent medication regimens, identify
In turn, you can better understand
factors that motivate buyers
customer, service and product needs –
to act – and much more.
and predict opportunities for timely
exploitation.
it can be included in advanced analyt­ics, can use the software to classify texts
Solution Overview
such as predictive analysis, data mining based on the detailed terms table and
SAS Text Miner provides a rich suite of and forecasting. then generate the resultant Boolean
linguistic and analytical modeling tools logic of the classifications. Use the rules
specifically developed for discover­ High-performance procedures are to categorize documents based on rule
ing and extracting knowledge from included to take advantage of multicore matches or export them – including the
collections of text content. Across processing hardware that expedites generated AND, OR and NOT opera-
electronic text snippets, document compute-intensive text processing tasks. tors – for deployment in SAS Enterprise
archives and Web downloads, patterns This version also includes insightful Content Categorization.
can be automatically identified as topics reports describing the results from the
and themes, defining explicit associa­ rule generator node, providing clarity to The text rule builder node enables
tions between terms and phrases. The model training and validation results. semisupervised, active learning.
software provides supervised, unsuper­ Through this feature, users can interact
vised and semisupervised methods to Automatically create Boolean rules and dynamically with the algorithm. The
discover previously unknown patterns interactively train models software automatically learns catego­
in document collections. It structures The text rule builder node automatically ries and topics from the collection.
data in a numeric representation so that generates an ordered set of rules. You Users can then guide the system to an
improved solution – enabling interac­
tive model building. The combination of
built-in software guidance and human
subject-matter expertise results in
highly refined models.

Faster processing
Two procedures, HPTMINE and
HPTMSCORE, execute multithreaded
text processing on an enabled SAS

Figure 1: You can generate Boolean


rules automatically with the rule genera-
tor node and actively teach the model
directly by overriding the software-
derived results.

Figure 2: Examine terms driving topic membership in an interactive GUI and, if similar,
merge and reassign terms and/or topics to produce desired results.
server. Specific to nondistributed envi-
Key Features
ronments, they take advantage
of multiple core processors – leading Automatic Boolean rule generation makes it easy to classify content
• Lets you describe and predict a target variable based on the detailed terms. Resulting rules can
to compute-intensive processing gains
be used to categorize documents based on rule matches.
in many cases.
• Allows rules to also be exported as Boolean rules – and used as a starter rule set for SAS
Enterprise Content Categorization.
Integrated document filtering
• Includes output to compare rules between training and validation data.
Sophisticated dimension reduction • Enables active learning by:
techniques enable advanced filtering • Providing automated, machine-generated suggestions of categories and topics that can be
through weighting, integrated spell recharacterized by the user.
checking and transformation of qualita­ • Modifying the target assigned to the rules, and when rules are regenerated based on these
tive data into compact formats. Such user-defined modifications, the model is updated.
techniques can structure the unstruc­ User-friendly, flexible interface
tured data so that text-based insights • Merge topics together into one user topic for simplifying similar results.
are easily included in both predictive • Use topic displays to show document terms/all terms, highlighting why a document was as­
and descriptive structured data mining signed to a particular topic.
efforts. • Use view mode to illustrate just the terms in a single document or within a topic, or to sort text
documents.
Interactive interface for text import • Obtain document-level sentiment insights with an AFFIN sentiment list available as a sample
defines terms for Web data or text files data set with more than 2,000 terms and preassigned polarity weights.
• Modify, save and share process-flow diagrams of text mining analysis.
With SAS Text Miner, you can use the
• Add tables (from previous efforts) to nodes, to extend usability of prior knowledge.
text import node to create data sets • Extend text nodes further by customizing algorithms or declaring new user-written business
dynamically from files contained in a rules for predictive modeling, clustering, visualization and reporting – deployable as SAS score
directory or from the Web. The software code.
reads text stored in a wide variety of • Conforms to accessibility standards for the Windows platform. Accessibility features relate
document formats, accepting potential- to standards for electronic information technology that were adopted by the US government
ly proprietary formats such as Microsoft under Section 508 of the US Rehabilitation Act of 1973.
Word and PDF inputs. The text import Integrated document filtering
node converts the data and also filters • Employ sophisticated dimension reduction techniques that enable advanced filtering through
or extracts the text from the files – and weighting, integrated spell checking and transformation of qualitative data into compact
references the data to a SAS data set. formats.
If a URL is specified, this node will also • Create synonym data sets and import previously defined synonyms into the text filter node to
crawl associated websites and retrieve improve reusability of existing assets.
the files, bringing them to the common Visual analysis of results
directory before filtering. This gener- • Use the concept link diagram to analyze results visually and to effectively explore the relation­
ates a data set that can be used by the ships between terms.
text parsing node for the next stage of • Use interactive diagrams to communicate results to key stakeholders:
your analysis. In addition to filtering, the • Employ diagrams that cluster results, derive topic assessments and link associations among
text import node can also identify each terms.
document’s language and transcode • Use success graph and document rules table to explore generated Boolean rules.
the document to the session encoding Take advantage of compute power with high-performance processing
format. • Address compute-intensive text processing tasks, reducing time to results.
• Transform text to structured representations of the collection with SVDs.
More control of table import • Score large data faster.
Aspects of SAS Text Miner have been Choose predefined entities, define your own, or create custom entities for fact and
further enhanced for node performance event extraction
and results. Importing table information • Define your own multiword terms (phrases such as “drag and drop”).
in dialog boxes is better controlled with • Choose from one of 18 prespecified entities definitions for address, company, date, phone
Add Table and Replace Table options. number, SSN, time and others to ensure extraction from input content.
• Create your own custom entities to be extracted from text inputs, including a list of predefined
entities (such as defined districts or product codes) using the SAS Concept Creation for SAS
Text Miner add-on.
High-performance text mining
Key Features (continued)
High-performance text mining is enabled
Interactive interface for importing text from the Web or internal file systems
to run in a multithreaded mode on a • Lets you dynamically create data sets from files contained in a directory or crawled from the
single server using programmable Web.
procedures or a node in the interactive • Gives access to numerous forms of textual data, including PDFs, Microsoft Word, extended
GUI. Designed for big text data, these ASCII text, HTML, Microsoft Office formats, spreadsheets, presentations, email and database
capabilities will grow with your collections, formats.
extending to distributed hardware en- • Extracts, transforms and loads textual data into a SAS data set for mining.
vironments (licensed separately). High • Accepts even potentially proprietary formats, converts the formats, and filters or extracts the
performance lets you develop models text from the files, placing a copy in a plain file and referencing the data to SAS.
that examine millions and tens of millions • Identifies each document’s language and transcodes it to the session encoding format.
of documents, and run the models in Natively supports multiple languages
min­utes or seconds. For more informa- • Supports Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek,
tion, visit sas.com/hptextmining. Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Ro-
manian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, and Vietnamese. Includes dialects for
Simplified and Traditional Chinese, Parisian and Canadian French, Old and New World German,
Nynorsk and Bokmål Norwegian, Portuguese for Portugal and Brazil, and Spanish for South
SAS® Text Miner System America and Spain.
Requirements
To learn more about SAS Text Miner
system requirements, download
white papers, view screenshots and
see other related material, please
visit sas.com/textminer.

SAS Institute Inc. World Headquarters   +1 919 677 8000


To contact your local SAS office, please visit: sas.com/offices
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
Copyright © 2013, SAS Institute Inc. All rights reserved. 101371_S109736.0713

You might also like