
Technical report

Clementine® Solution Publisher

Introduction and overview


Clementine® puts enterprise-strength data mining in the hands of business users, enabling
them to build powerful models using Clementine’s stream approach to data mining. With
Clementine Solution Publisher, users deliver data mining solutions (including accessing
data, transforming and cleaning data, applying models and writing out results) throughout
the organization to improve how business is done. Solutions are delivered in a variety of ways,
including:
■ Delivery of predictive models to decision makers, such as marketing managers, who use
predictions to select lists for campaigns, and make other decisions
■ Segmentation of customers by scoring a customer database. These scores are then used
at various customer touchpoints to drive personalized interactions.
■ Scoring real-time data by embedding the data mining solution within a Web site or call
center software

Figure 1: A Clementine stream is a collection of icons or nodes that represents a data mining solution. This
example shows two circular data access nodes on the left. The data contained in these nodes flows into processing
steps, including data source merging. The data is then fed into models that write scores to the data before
passing data to other processing steps. The last node is the Clementine Solution Publisher node, which exports
all of this functionality as a standalone program. Most approaches deploy just a model. With Clementine
Solution Publisher, all of the steps are deployed.

Managing the costs of deployment


Clementine Solution Publisher aims to minimize the cost of deploying and maintaining data
mining solutions. It does this by exporting functionality from Clementine to be used outside
of Clementine itself. Because Clementine Solution Publisher exports all processing steps, your
programmers don’t have to recode them by hand in order to deploy a data mining solution.
Clementine Solution Publisher’s ability to export all processing steps means your organization
will realize significant time and cost savings for deployment over the long run. How much
money does your organization spend when programmers hand code data mining steps? How
many lines of bug-free code per day can your programmers code? How much do you spend on
your programmers’ personnel costs, including overhead and QA costs? Keep in mind that a
typical data mining application has approximately 3,000 lines of code and requires frequent
updates to 20 percent of the code. How much money will your organization spend on a
data mining application that takes one year to create and requires republishing every
six months? Now imagine how much you could save if you could make the programming
process more efficient.

Figure 2: Data mining is an ongoing activity. After five years, the costs associated with
recoding applications by hand increase greatly.

Let’s take a look at a fictional organization, XYZ Co., which plans to implement a data mining
application that its programmers will recode by hand. XYZ Co.’s programmers can code 10
lines of bug-free code per day. For 220 days of work, XYZ Co. spends $80,000 for programming
personnel costs, or about $450 per day. Its data mining project took 300 programming days and
cost $135,000 for personnel. Updates represent $27,000 for 60 programming days. If XYZ Co.
republishes these solutions every six months, its costs over the long term skyrocket (Figure 2).
While your assumptions may change based on your business situation, the costs of maintaining
a data mining solution are high nevertheless. Also, there are soft costs, such as the delay in
delivering the solution, which will adversely affect the return on your investment. By choosing
the right deployment strategy, you put your data mining results to work quickly to achieve
your strategic goals.

Programming activity when recoding by hand            XYZ Co.        Your organization
Lines of bug-free code programmed per day             10 lines       _______ lines
Lines of code in a typical data mining application    3,000 lines    3,000 lines
Programmers’ personnel costs per day                  $450           $_______
Application creation (300 programming days)           $135,000       $_______
Updates (60 programming days)                         $27,000        $_______

Figure 3: This chart shows XYZ Co.’s costs when recoding by hand. The company will save a significant
amount of time and money when it eliminates many hand-recoding steps. How much time and money can
your organization save if your programmers aren’t recoding large numbers of lines by hand?
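
The arithmetic behind Figure 3 can be reproduced with a few lines of C. This is a minimal sketch that takes the stated figures as given (10 lines per day, 3,000 lines, updates to 20 percent of the code, $450 per day). The function names are made up for illustration, and the cumulative model (creation cost plus a fixed cost per update) is an assumption about how the costs in Figure 2 accumulate, not something the report spells out.

```c
#include <assert.h>
#include <stdio.h>

/* Days needed to write `lines` of code at `lines_per_day` bug-free lines. */
long programming_days(long lines, long lines_per_day) {
    assert(lines_per_day > 0);
    return lines / lines_per_day;
}

/* Lines rewritten in one update when `percent` of the code changes. */
long update_lines(long lines, long percent) {
    return lines * percent / 100;
}

/* Personnel cost in dollars for `days` of work at `cost_per_day`. */
long personnel_cost(long days, long cost_per_day) {
    return days * cost_per_day;
}

/* Creation cost plus `updates` republishing rounds (assumed model). */
long cumulative_cost(long creation, long per_update, long updates) {
    return creation + per_update * updates;
}

/* Print the hand-recoding costs for one organization. */
void print_costs(long lines, long lines_per_day, long cost_per_day,
                 long percent, long updates) {
    long creation = personnel_cost(programming_days(lines, lines_per_day),
                                   cost_per_day);
    long update = personnel_cost(
        programming_days(update_lines(lines, percent), lines_per_day),
        cost_per_day);
    printf("creation: $%ld\n", creation);
    printf("each update: $%ld\n", update);
    printf("after %ld updates: $%ld\n", updates,
           cumulative_cost(creation, update, updates));
}
```

Calling print_costs(3000, 10, 450, 20, 10) reproduces the XYZ Co. column: $135,000 for creation and $27,000 per update. The count of 10 updates is a hypothetical input (two per year for five years); substitute your own organization's figures to fill in the right-hand column.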

Clementine Solution Publisher architecture


Clementine Solution Publisher (see Appendix A for system requirements) is deployment
technology that enables organizations to use Clementine streams independent of the standard
Clementine environment. Clementine Solution Publisher comprises two distinct
functions: publishing and executing.
Publishing means exporting a set of files that describe all the functionality in a stream,
including accessing data, transforming and cleaning data, applying models and writing
out results. Publishing is done by placing and connecting the Publisher node, which is found
in the Output palette of Clementine. The results of publishing are an image file (.pim) and
a parameter file (.par), which encapsulate all of the functions of a stream.
Executing means using the exported files on a target machine. To execute these files, the
target machine must have the Clementine Runtime environment installed. This environment
is shipped on a separate CD-ROM, which is part of the Clementine Solution Publisher kit.
Executing the image file on a machine with the Clementine Runtime environment will enable
it to perform all the functions in the exported Clementine stream. And, just like Clementine,
Clementine Solution Publisher pushes common data procedures back to the database for
more efficient processing.

Image and parameter files


Publishing a stream creates an image file and a parameter file. The image file contains
a description of the stream, including the data access, data transformation and cleaning
and results. The parameter file allows the user to customize the execution to some degree,
in particular, to change the input and output data sources.
When an image is first published, the parameter file duplicates settings from the stream.
The parameter file needs to be modified only when users change values within it to point
to a different input or output data source.
Both the image and the parameter files are plain text files encoded in the character set
of the locale in which they were published. Since the parameter file is meant to be edited,
it has a simpler, more intelligible structure.

Clementine Solution Publisher Runtime system


The runtime system consists of an executable program and some supporting files. The
entire system must be installed on a machine before a published image can be executed.
Once in place, the same system can be used to run any image published with the same
version number.
To execute the published solution, your application needs to run the runtime program with the appropriate command-line arguments.
The runtime system will only restore images in the character set of the locale in which
it is run. The runtime system supports all platforms supported by Clementine, including
Windows, Solaris, HP-UX, AIX and the IBM iSeries (AS400).
The executable program is called clemrun, for “Clementine Solution Publisher Runtime.”
It is run from the command-line with the following synopsis:
clemrun [-o options] [-p parameter-file] image-file

The command-line arguments are as follows:

-o options           options is a comma-separated list of option settings of the
                     form name=value as might be passed to Clementine Server.
                     Options set this way would normally be server-global
                     options; session options are read from the image file and
                     parameter file.
-p parameter-file    parameter-file is the name of a parameter file to use with
                     the image. Settings in the parameter file override those in
                     the image.
image-file           The name of an image file to restore and execute.

The executable program interprets the command-line options, loads the image file, reads the
parameter file settings and executes the published stream. Any errors or log messages are
written to the standard error output.
On Unix, the clemrun program is a shell script, which runs a binary executable within an
appropriate environment. The executable is called:
clemrun_exe
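
An application that launches the runtime has to assemble a command line that follows the synopsis above. The sketch below shows one way to do that in C. The function name build_clemrun_command and the file names in the usage note are hypothetical, and the sketch does not quote arguments, so it assumes file names without spaces.

```c
#include <assert.h>
#include <stdio.h>

/* Build a command line following the synopsis
 *   clemrun [-o options] [-p parameter-file] image-file
 * `options` and `param_file` may be NULL to leave out the optional parts.
 * Returns what snprintf returns: the length the full command needs,
 * which exceeds buflen - 1 if the buffer was too small. */
int build_clemrun_command(char *buf, size_t buflen, const char *options,
                          const char *param_file, const char *image_file) {
    assert(buf != NULL && image_file != NULL);
    return snprintf(buf, buflen, "clemrun%s%s%s%s %s",
                    options ? " -o " : "", options ? options : "",
                    param_file ? " -p " : "", param_file ? param_file : "",
                    image_file);
}
```

For example, build_clemrun_command(cmd, sizeof cmd, NULL, "scoring.par", "scoring.pim") fills cmd with "clemrun -p scoring.par scoring.pim", which the application could then pass to the C library's system() call or to a platform-specific process API.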

Choosing export options for Clementine Solution Publisher


The user interface (UI) for Clementine Solution Publisher consists of a node on the
Output palette that has a dialogue and sub-dialogues. The dialogue and sub-dialogues give
options relating to the format of the output.
The main dialogue has controls for changing the file name stem and directory for the
published stream. It also has a control for specifying the format of the output: database
or file.

Supported and unsupported features

Clementine Solution Publisher supports most of the features of Clementine streams. Some
exceptions include CEMI models and terminal nodes, such as model-building nodes,
visualization nodes and output nodes. Note, however, that some output functions, like
scoring a database, are handled by the Publisher node itself.

Figure 4: This dialogue box will be displayed when the "flat file" option is selected in the
Publisher node. The dialogue enables users to specify how data will be written when the
published stream is executed outside the Clementine user interface.

Start deploying your results


The key to successful data mining is sharing results with others in your organization — those
who can use this information to make positive, profitable changes and achieve your target ROI
(return on investment). Clementine Solution Publisher gives you the means to deliver timely
and cost-effective data mining solutions throughout your organization.
Clementine Solution Publisher empowers you to implement deployment plans that make a
difference. Here are just a few ways strategic deployment improves your data mining ROI:
■ Customer relationship management (CRM): deliver models that predict who your best
customers are, what they value and how likely they are to respond to your offers
■ eCRM: deploy tested models that enable you to provide custom content to your Web
visitors
■ Accurate forecasting: use consistent, accurate models throughout your organization
to ensure better planning
■ Fraud detection: easily provide new models constantly needed by your organization
to keep up with the ever-changing face of fraud
■ Risk analysis: effectively manage risk when you make models available throughout
your organization

In addition, Clementine Solution Publisher enables you to deploy results to a variety
of people and systems, including:
■ Decision makers: give people the information they need to look ahead and make
decisions based on facts
■ “Virtual” decision makers, such as your Web site: deliver the right information to your
Web visitors as they surf your site
■ Operational systems: predict errors in a process before they occur
■ Databases: ensure information stored in a centralized data warehouse is available
at all touchpoints

Deployment doesn’t end once information has reached its destination — the feedback loop
must come full circle so models are improved. Model builders need feedback from data mining
projects to update and improve models as changes — such as customer behavior or your
organization’s capabilities — occur.
Whether it’s reducing fraud, retaining profitable customers, targeting better prospects or
managing risk, using Clementine Solution Publisher when data mining ensures you’ll get
the right information in the hands of the right people and systems.

Appendix A: Clementine Solution Publisher Runtime system requirements

■ Hardware: Pentium-compatible or higher processor for Microsoft Windows; IBM iSeries
or IBM RS/6000 for AIX; SPARC for Solaris; HP Workstation for HP-UX. A CD-ROM drive
is required for installation.
■ Operating system: Microsoft Windows 2000 or NT 4.0 with Service Pack 3 or higher;
IBM OS/400 Version 5 Release 1 (V5R1, 5722-SS1), OS/400 Portable Application Solution
Environment (PASE, 5722-SS1 Option 33) and OS/400 QSYSINC (5722-SS1 Option 13);
Solaris 2.6, 7 or 8; HP-UX 11.0 or 11i; AIX 4.3 or 5.1.
■ Minimum free drive space: 4MB for installation; plus at least twice the drive space
of the amount of data to be processed.
■ Minimum RAM: 256MB

Appendix B: Embedded Solution Publisher


The Clementine Solution Publisher has a runtime interface (CLEMRTL) that allows the
published solutions to be embedded into other programs. Developers can call CLEMRTL
procedures in client programs written in C, C++ and other programming languages.
This appendix outlines the steps for developing an application using the CLEMRTL library
procedures. It also contains a description of each procedure.

Coding your program


Any source file that references DLL procedures must include the header clemrtl.h. The header
file provides ANSI C prototypes for the DLL procedures and defines useful macros; no other
headers are required except for those that your program requires. To protect against name
clashes, all DLL function names start with clemrtl_ and all macro names are prefixed with
CLEMRTL_.

Messages
Reports contain useful information that should be communicated back to the application in
some way. The CLEMRTL provides a number of mechanisms for dealing with such reports:
1. Set a log file
2. Get details of the last error
3. Set a report handler
The runtime system has a localized message catalogue for reports; any report passed to the
application would include the localized message string but also the report code so that the
application could choose to interpret and present the message differently. In order to get
detailed (localized) messages, the application needs to store the appropriate messages.cfg
file in a config directory.

Environment
The runtime system needs no special environment or registry settings to operate correctly.
Dependent libraries must be distributed with an application linked to the CLEMRTL. These
files are included on the Clementine Solution Publisher Runtime CD-ROM in the REDISTRIB
directory.

API procedures
Some general points:
■ clemrtl_initialize must be called before any other functions in the library are used
■ An application might want to initialize the random number generator (RNG) to some explicit
value to ensure a consistent sequence of random numbers for its own use. It is necessary to
initialize the runtime library before the RNG. An application should also be aware that calling
any runtime library function is likely to generate one or more random numbers, which will
disrupt the sequence.
■ The API has C-linkage for maximum compatibility, but the libraries will still have C++
dependencies, which on some platforms might mean they can only be used with a C++
aware linker
■ The runtime system is not itself multi-threaded: all API calls except Interrupt
must be made from the same thread. It is nevertheless safe for use in a multi-threaded environment.
■ Every function returns a status indicator. The status indicator values are:
– CLEMRTL_OK: Success
– CLEMRTL_FAIL: Failed; no further details available
– CLEMRTL_ERROR: Failed; additional information about the error (the server report code,
severity and message text) is available through the Get error detail call
■ Multiple images can be opened simultaneously, and each image can be executed multiple
times. However, only one image can be executed at a time because of single threading.

Initialize

int clemrtl_initialize();

Description
Initializes the Publisher runtime library.

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_FAIL Function failed

Get option

int clemrtl_getOption(const char *optionName, char *optionValue, int bufflen);

Description
Retrieves the value of a global option.
Parameter Description
optionName String containing the option name
optionValue String to receive the option value
bufflen The length of the buffer passed to receive the option value

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_ERROR An error occurred and more details are available

Set option

int clemrtl_setOption(const char *optionName, const char *optionValue);

Description
Updates the value of a global option.
Parameter Description
optionName String containing the option name
optionValue String containing the option value

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_ERROR An error occurred and more details are available

Set log file

int clemrtl_setLogFile(int logType, const char *file);

Description
Redirects log messages from all subsequently opened images.
Parameter Description
logType Type of log; must be one of the following values:
    CLEMRTL_STDERROR_LOG: send messages to stderr
    CLEMRTL_NULL_LOG: suppress messages
    CLEMRTL_FILE_LOG: send messages to specified file
file String containing the file name

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_FAIL Function failed

Open image

int clemrtl_openImage
(const char *paramFile, const char *imageFile, unsigned int *handle);

Description
Load a published image.
Parameter Description
paramFile String containing the parameter filename
imageFile String containing the image filename
handle The handle to the image

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_ERROR An error occurred and more details are available

Close image

int clemrtl_closeImage(unsigned int handle);

Description
Close an image handle and free its resources.
Parameter Description
handle The handle to the image

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_FAIL Function failed

Enumerate parameters

int clemrtl_enumerateParameters
(unsigned int handle, void (*callback_proc)(const char *, const char *));

Description
Applies the call-back procedure to each image parameter (name and value).
Parameter Description
handle The handle to the image
void (*callback_proc)(const char *, const char *) Enumeration call-back procedure

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_ERROR An error occurred and more details are available
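
A call-back for Enumerate parameters might look like the sketch below. Because clemrtl.h ships only with the product, the sketch declares its own stand-in for clemrtl_enumerateParameters with the signature shown above. The stand-in body, the numeric value of CLEMRTL_OK and the two parameter names and values are all invented for illustration. The helper names print_parameter and list_parameters are hypothetical as well.

```c
#include <assert.h>
#include <stdio.h>

/* ---- Stand-ins for clemrtl.h. NOT the real library. ---- */
#define CLEMRTL_OK 0   /* placeholder value */

int clemrtl_enumerateParameters(unsigned int handle,
        void (*callback_proc)(const char *, const char *)) {
    /* Invented parameter names and values, for illustration only. */
    static const char *const params[][2] = {
        { "input_file",  "customers.dat" },
        { "output_file", "scores.dat" }
    };
    size_t i;
    (void)handle;
    assert(callback_proc != NULL);
    for (i = 0; i < sizeof params / sizeof params[0]; i++)
        callback_proc(params[i][0], params[i][1]);
    return CLEMRTL_OK;
}
/* ---- End of stand-ins ---- */

/* The call-back has no user-data argument, so any state it keeps
 * has to live in a file-scope variable. */
static int parameter_count = 0;

/* Call-back: print one "name=value" line per image parameter. */
void print_parameter(const char *name, const char *value) {
    printf("%s=%s\n", name, value);
    parameter_count++;
}

/* List all parameters of an image; returns how many were seen, or -1. */
int list_parameters(unsigned int handle) {
    parameter_count = 0;
    if (clemrtl_enumerateParameters(handle, print_parameter) != CLEMRTL_OK)
        return -1;
    return parameter_count;
}
```

With the real library, the stand-in section would be replaced by the clemrtl.h include and the handle would come from Open image.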

Get parameter

int clemrtl_getParameter
(unsigned int handle, const char *paramName, char *paramValue, int bufflen);

Description
Retrieves the value of an image parameter.
Parameter Description
handle The handle to the image
paramName String containing the parameter name
paramValue String to receive the parameter value
bufflen The length of the buffer receiving the parameter value

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_ERROR An error occurred and more details are available

Set parameter

int clemrtl_setParameter
(unsigned int handle, const char *paramName, const char *paramValue);

Description
Updates the value of an image parameter.
Parameter Description
handle The handle to the image
paramName String containing the parameter name
paramValue String containing the parameter value

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_ERROR An error occurred and more details are available

Set report handler

int clemrtl_setReportHandler
(unsigned int handle, void (*handler)(char, int, const char *) );

Description
Installs a report handler for an image. The call-back procedure is called for each informational
or diagnostic report and receives the report code, the severity indicator and a text message.
Parameter Description
handle The handle to the image
void (*handler)(char, int, const char *) Report call-back procedure

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_ERROR An error occurred and more details are available
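
A report handler with the signature above could be sketched as follows. The report does not say which argument is which, so this sketch assumes from the types that the char carries the severity indicator and the int carries the report code. The severity letters are the ones listed under Get error detail; the function names and the output format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Map a severity letter to a readable label (I, W, E, X as listed
 * under Get error detail). */
const char *severity_label(char severity) {
    switch (severity) {
    case 'I': return "Information";
    case 'W': return "Warning";
    case 'E': return "Error";
    case 'X': return "System error";
    default:  return "Unknown";
    }
}

/* Number of reports seen. A file-scope variable is used because the
 * handler signature has no user-data argument. */
static int reports_seen = 0;

/* Handler matching void (*)(char, int, const char *).
 * ASSUMPTION: the char is the severity and the int is the report code. */
void log_report(char severity, int code, const char *text) {
    assert(text != NULL);
    fprintf(stderr, "[%s] report %d: %s\n",
            severity_label(severity), code, text);
    reports_seen++;
}

/* How many reports the handler has logged so far. */
int report_count(void) {
    return reports_seen;
}
```

With the real library, the handler would be installed with clemrtl_setReportHandler(handle, log_report) after the image has been opened.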

Execute image

int clemrtl_execute(unsigned int handle);

Description
Executes an image using the current parameter settings.
Parameter Description
handle The handle to the image

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_ERROR An error occurred and more details are available

Get error detail

int clemrtl_getErrorDetail
(unsigned int handle, int *code, char *severity, char *text, int bufflen);

Description
Retrieves detailed information about the last error.
Parameter Description
handle The handle to the image or 0 (e.g., Error on Open Image)
code Error code number
severity Severity indicator:
    I Information message
    W Warning
    E Error message
    X System error
text Error message text string
bufflen Length of the error message text string

Returns
One of the following codes:
Error code Description
CLEMRTL_OK No error
CLEMRTL_FAIL Function failed
CLEMRTL_ERROR An error occurred and more details are available
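
The procedures described so far combine into a call sequence along the following lines: initialize, open an image, execute it, fetch error details on failure, and close the image. This is a sketch under stated assumptions, not the product's reference usage. Because clemrtl.h ships only with the product, the sketch declares stand-in stubs with the signatures from this appendix; the numeric status values, the stub bodies and the helper names report_last_error and run_published_image are all invented so that the example compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

/* ---- Stand-ins for clemrtl.h. NOT the real library. ----
 * Signatures follow this appendix; status values and bodies are
 * placeholders so the sketch compiles without the product installed. */
#define CLEMRTL_OK    0
#define CLEMRTL_FAIL  1
#define CLEMRTL_ERROR 2

int clemrtl_initialize(void) { return CLEMRTL_OK; }

int clemrtl_openImage(const char *paramFile, const char *imageFile,
                      unsigned int *handle) {
    (void)paramFile;
    if (imageFile == NULL || imageFile[0] == '\0')
        return CLEMRTL_ERROR;              /* stub: an empty name fails */
    *handle = 1;
    return CLEMRTL_OK;
}

int clemrtl_execute(unsigned int handle) { (void)handle; return CLEMRTL_OK; }

int clemrtl_closeImage(unsigned int handle) { (void)handle; return CLEMRTL_OK; }

int clemrtl_getErrorDetail(unsigned int handle, int *code, char *severity,
                           char *text, int bufflen) {
    if (bufflen <= 0)
        return CLEMRTL_FAIL;
    *code = -1;                            /* made-up code for the stub */
    *severity = 'E';
    snprintf(text, (size_t)bufflen, "stub error (handle %u)", handle);
    return CLEMRTL_OK;
}
/* ---- End of stand-ins ---- */

/* Print the last error; handle 0 means the error came from Open image. */
static void report_last_error(unsigned int handle) {
    int code = 0;
    char severity = '?';
    char text[256] = "";
    if (clemrtl_getErrorDetail(handle, &code, &severity, text,
                               (int)sizeof text) == CLEMRTL_OK)
        fprintf(stderr, "%c %d: %s\n", severity, code, text);
}

/* Initialize the library, then open, execute and close one image. */
int run_published_image(const char *param_file, const char *image_file) {
    unsigned int handle = 0;
    int status;

    assert(image_file != NULL);
    if (clemrtl_initialize() != CLEMRTL_OK)   /* must precede other calls */
        return CLEMRTL_FAIL;

    status = clemrtl_openImage(param_file, image_file, &handle);
    if (status != CLEMRTL_OK) {
        if (status == CLEMRTL_ERROR)
            report_last_error(0);
        return status;
    }

    status = clemrtl_execute(handle);
    if (status == CLEMRTL_ERROR)
        report_last_error(handle);

    clemrtl_closeImage(handle);               /* free resources either way */
    return status;
}
```

A caller would use it as run_published_image("scoring.par", "scoring.pim"), where both file names are hypothetical. A longer-running application would more likely initialize the library once at startup and keep the handle open across several executions, since each image can be executed multiple times.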

Interrupt

void clemrtl_interrupt();

Description
Terminates any outstanding image execution. This function alone is safe to be called from a
signal handler or a separate thread. It sets a flag in the library, which causes any active call
to the execute function to terminate prematurely with a status value indicating that the call
was interrupted.

Returns
This function does not have a return code.

Reset interrupt

void clemrtl_resetInterrupt();

Description
Resets the interrupt signal.

Returns
This function does not have a return code.
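
Since Interrupt is described as safe to call from a signal handler, one plausible use is to forward a signal such as SIGINT to it so that a long-running execute call can be cancelled. The sketch below illustrates that wiring with the standard C signal() function. The two clemrtl_ functions here are stand-in stubs that only set and clear a flag, and install_interrupt_handler and interrupt_pending are hypothetical helpers; none of this is the real library.

```c
#include <assert.h>
#include <signal.h>

/* ---- Stand-ins for the library's interrupt flag. NOT the real library. ---- */
static volatile sig_atomic_t interrupt_flag = 0;
void clemrtl_interrupt(void)      { interrupt_flag = 1; }
void clemrtl_resetInterrupt(void) { interrupt_flag = 0; }
/* ---- End of stand-ins ---- */

/* Signal handler: forwards the signal to the library's Interrupt call. */
static void forward_interrupt(int signum) {
    (void)signum;
    clemrtl_interrupt();
}

/* Install the handler for `signum` (for example SIGINT).
 * Returns 0 on success, -1 if signal() reports an error. */
int install_interrupt_handler(int signum) {
    assert(signum > 0);
    return signal(signum, forward_interrupt) == SIG_ERR ? -1 : 0;
}

/* Hypothetical helper so the stub's flag can be inspected. */
int interrupt_pending(void) {
    return interrupt_flag != 0;
}
```

After an interrupted execution, the application would call clemrtl_resetInterrupt() before executing the image again.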

About SPSS
SPSS helps people solve business problems using statistics and data mining. This predictive
technology enables our customers in the commercial, higher education and public sectors to
make better decisions and improve results. SPSS software and services are used successfully
in a wide range of applications, including customer attraction and retention, cross-selling,
survey research, fraud detection, enrollment management, Web site performance, forecasting
and scientific research. SPSS' market-leading products and product lines include SPSS,®
Clementine,® AnswerTree,® DecisionTime,® SigmaPlot® and LexiQuest.™ For more information,
visit our Web site at www.spss.com.

SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc.
All other names are trademarks of their respective owners. © Copyright 2002 SPSS Inc. All rights reserved.

CLMP7WP-0802
