
WebAnalyst Server™ - universal platform for intelligent e-business

What is WebAnalyst
Objectives
Main Concepts of WebAnalyst
Tasks carried out by WebAnalyst
Server architecture
Sample configurations
WebAnalyst Workplace
Expandability and open interfaces
Conclusion
By Yuri Slynko, Megaputer Intelligence Inc.
and
Sergei Ananyan, Megaputer Intelligence Inc.

Abstract
A corporate website represents an immediate, bi-directional, automated, and global channel for customer-vendor
interaction. Empowering the website with analytical capabilities and mass customization can trigger yet another
revolution in e-business, increasing the value of each customer and each interaction for the vendor while
improving customer satisfaction with the services obtained through online channels.
Implementing these new capabilities becomes possible through the synergy of the latest achievements in online
database and data warehouse architecture, machine learning and data mining, automated content analysis, and
content generation and delivery.
This paper discusses the functionality and architecture of WebAnalyst Server™, a universal open-architecture
platform for intelligent e-business developed by Megaputer Intelligence Inc. The system helps automatically
collect data about website visitors in a unified database, transform the data into a format suitable for analysis,
and learn visitors' preferences and value by combining techniques for collaborative filtering and automated text
analysis. WebAnalyst allows analysts to optimize the performance of the site and turn the website into an
immediate-response, personalized CRM channel. WebAnalyst Server is powered by the PolyAnalyst™ exploration
engines for data mining and TextAnalyst™ for content analysis, as well as by new analytical technologies from
Megaputer. The functionality of WebAnalyst can be easily expanded with additional plug-in procedures.

What is WebAnalyst
WebAnalyst (WA) is a corporate-level analytical server providing an integrated platform for data warehousing and
data mining, with a special emphasis on e-business and web data mining applications. WA is a scalable
application server with an open architecture that automates e-business data collection, transformation, and
analysis, as well as the personalization of interactions with customers. In addition, the WA client provides a
flexible visual programming environment for the analyst.
In more detail, WebAnalyst (WA) is an application server that can:
Process data from different sources, such as data transmission channels (HTTP), external databases,
and web server access log-files
Store all WA-related data in a unified WA database
Offer a suite of powerful built-in analytical and data processing tools
Provide the user with a visual programming environment for generating reusable analytical procedures
Figure 1. WebAnalyst functionality.

Objectives
WebAnalyst Server helps to:
Record all customer interactions through the website in the most efficient manner
Transform and store this data in a format suitable for further analysis
Use this data for learning customer interests, preferences, and likely courses of action
Analyze efficiency of the website resources and architecture
Generate reports for executive managers
Recognize returning customers and access their profiles
Utilize all harvested knowledge to personalize communications with each customer
A typical organization using WebAnalyst has the following resources:
Web Server
Database of content
Database of products and transactions
Stored access-log files for initial system training
Only a web server is strictly required for running WebAnalyst and building a database of customer profiles, but
having all these components in operation at the time of deployment increases the value of the system.
There are four distinct groups of users of WebAnalyst inside an organization:
Webmasters and database administrators
Data analysts
Executive managers and marketing analysts
Website visitors
Let us consider how representatives of each of these groups perceive WebAnalyst and the most important tasks
for which they need it.
Webmasters and database administrators
Automated and customizable recording of vital data about customer interactions in a database. This
eliminates the headache of continuously exporting access log files into the database and delivers complete data
that analysts can actually use in their analysis.
Simple tools for rapid creation and reuse of procedures for transforming operational data into a format
suitable for further analysis
Ease of merging the website generated data with data from other sources available in the organization
in a single data warehouse format
Convenient tools for immediately applying the results of data analysis to content generation and
customer interaction mechanisms.
Data analysts
Means for standardized access to disparate data obtained through different interaction channels.
Comprehensive arsenal of powerful data analysis tools
Ability to visually build reusable data analysis procedures and exchange these procedures for
collaborative projects
Freedom from the complexities of analyzing data and creating reports by standard means
Ability to quickly build customizable reports for top executives
Executive managers and marketing analysts
Optimal performance of the developed content
Increased value of every visitor and communication
Increased customer loyalty achieved through improved service satisfaction
Increased attractiveness of the channel for advertisers through personally targeted ad delivery
Easy-to-understand visual reports on the website performance as a customer interaction channel
A systematic organization of analysts' work, with simple control mechanisms
Website visitors
A better-structured website
Relevant content and ads
Personalized interactions through the channel

Main Concepts of WebAnalyst


WA is based on several underlying concepts that determine the broad functionality and flexibility of the system.
The following basic concepts of WA are discussed below in more detail:
Client-server architecture
Unified target database
Independent channel processors
Built-in virtual machine
Scalability
Access rights and security
Data warehousing concepts
Expandability and open architecture
Visual object development
1. Client-server architecture
The core component of WA is WebAnalyst Server, which assumes all responsibility for carrying out tasks
of system control, client request processing, data collection, transformation and analysis, and content
modification. WA Server can have several kinds of clients: web browsers, the WebAnalyst Workplace
application, and other WA Servers. WA Server is a passive component: it only processes client
requests, builds and returns results for those requests, and performs scheduled tasks. The modular architecture
of WA Server and its flexible built-in security mechanism facilitate licensing the server in any available
configuration and simple upgrades to more advanced versions of the server.
2. Unified target database
WA stores all entities related to its operation in a single database called Unified Target Database (UDB),
capable of storing both relational data and unstructured binary data. Currently WA operates with three types of
entities:
Subjects (profile data for any person or virtual entity that sends a request to WA Server and expects to
receive a response). Example: website visitors
Data objects (data requested by subjects and processed, or profiled, by a channel processor).
Example: website resources
Executable objects (programs, analytical and configuration scripts, and acquired or generated results).
Example: compiled WA scripts
Additionally, UDB stores information about users of the system, meta-data and other data prepared by the user
for further analytical work (data marts). Overall, UDB contains data of the following types:
User profiles
Resource profiles
Sessions
Compiled scripts
Data mining models (including PolyAnalyst projects)
Configuration data (user rights, WA workplace tree parameters, etc.)
Additional relational data from external data sources
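To make the three entity types concrete, here is a minimal in-memory sketch. The class names, the split between relational-style `attributes` and binary `payload`, and `UnifiedStore` itself are hypothetical illustrations, not WA's actual UDB schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityKind(Enum):
    SUBJECT = "subject"          # e.g. a website visitor
    DATA_OBJECT = "data_object"  # e.g. a profiled website resource
    EXECUTABLE = "executable"    # e.g. a compiled WA script


@dataclass
class Entity:
    entity_id: str
    kind: EntityKind
    attributes: dict = field(default_factory=dict)  # relational-style fields
    payload: bytes = b""                            # unstructured binary data


class UnifiedStore:
    """Hypothetical in-memory stand-in for the Unified Target Database."""

    def __init__(self) -> None:
        self._entities: dict[str, Entity] = {}

    def put(self, entity: Entity) -> None:
        self._entities[entity.entity_id] = entity

    def get(self, entity_id: str) -> Entity:
        return self._entities[entity_id]

    def of_kind(self, kind: EntityKind) -> list[Entity]:
        return [e for e in self._entities.values() if e.kind is kind]


store = UnifiedStore()
store.put(Entity("visitor-42", EntityKind.SUBJECT, {"visits": 3}))
store.put(Entity("/products.html", EntityKind.DATA_OBJECT, {"topic": "catalog"}))
store.put(Entity("report.wasl", EntityKind.EXECUTABLE, payload=b"\x01\x02"))
print(len(store.of_kind(EntityKind.SUBJECT)))  # 1
```

Keeping all three kinds behind one store interface reflects the single-database idea; a real deployment would map `attributes` to tables and `payload` to BLOB columns.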
3. Independent channel processors
WA Server is a modular application in which individual modules, called channel processors, are each responsible
for processing an independent data exchange channel. Each channel has plug-in procedures associated with it,
stored in separate dynamic libraries. A channel processor performs the following primary tasks:
Monitor the corresponding data exchange channel (for example, HTTP port)
Parse all data traffic in the channel (requests and responses)
Profile the data (extract all valuable fragments of information)
Enrich collected information with external data
Store developed profiles in UDB
(Optionally) modify the channel data (personalize responses and modify requests)
The current version of WA contains only one channel processor, for the HTTP channel; other channels,
including SMTP and NNTP, are under development.
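The channel-processor responsibilities listed above can be sketched as a small abstract class. This is a hypothetical illustration: the method names, the dictionary standing in for UDB, and the toy HTTP parser are assumptions, and monitoring a real port (the first task) is omitted so the example stays self-contained.

```python
from abc import ABC, abstractmethod


class ChannelProcessor(ABC):
    """Hypothetical base class mirroring the channel-processor tasks."""

    def __init__(self, store: dict) -> None:
        self.store = store  # stand-in for UDB: key -> list of profiles

    @abstractmethod
    def parse(self, raw: bytes) -> dict:
        """Parse raw channel traffic into a structured message."""

    @abstractmethod
    def profile(self, message: dict) -> dict:
        """Extract the valuable fragments of information from a message."""

    def enrich(self, profile: dict) -> dict:
        return profile  # default: no external data to merge

    def modify(self, message: dict, profile: dict) -> dict:
        return message  # optional personalization hook; default passes through

    def handle(self, raw: bytes) -> dict:
        """Run one unit of traffic through parse, profile, enrich, store, modify."""
        message = self.parse(raw)
        profile = self.enrich(self.profile(message))
        self.store.setdefault(profile["key"], []).append(profile)
        return self.modify(message, profile)


class ToyHttpProcessor(ChannelProcessor):
    def parse(self, raw: bytes) -> dict:
        method, path, _version = raw.decode("ascii").split(" ", 2)
        return {"method": method, "path": path}

    def profile(self, message: dict) -> dict:
        return {"key": message["path"], "method": message["method"]}


udb: dict = {}
cp = ToyHttpProcessor(udb)
print(cp.handle(b"GET /index.html HTTP/1.0"))  # {'method': 'GET', 'path': '/index.html'}
```

The fixed `handle` sequence with overridable channel-specific steps mirrors the idea of one processor per channel, with profiling and personalization supplied by plug-ins.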
4. Built-in virtual machine
The WA Virtual Machine (VM) executes objects created with the WebAnalyst Script Language (WASL), a simple
but powerful object-based language designed to simplify creating scripts in the visual programming environment
provided by WA. WASL objects:
Encapsulate both code and data
May contain other WASL objects
Are created from dynamic link libraries, text files, or binary data from UDB
Are stored in the UDB along with their compiled code and data
Are executed by channel processors (on-line objects), by the WebAnalyst Workplace, or by the
scheduler (off-line objects)
Are reusable: developed WASL objects can be called from other objects
Each object can be called from any component of WA Server
The library of WASL objects addressing different application tasks is constantly being expanded by the developers
of WA. These built-in WASL objects are written in C++ and are therefore very efficient to execute. Users of WA
can easily create their own WASL objects by building and recording intuitive workflow diagrams in the visual
programming environment offered by WA. Alternatively, when they need faster execution, users can develop their
own COM modules and connect them to WA through a provided interface.
5. Scalability
WA was designed from the ground up as a universal enterprise platform for e-business (initially) and other
data-intensive applications. Accordingly, enterprise-wide scalability is one of the most important principles
incorporated in the architecture of WA. For large enterprise systems, one builds a collection of individual WA
servers, each carrying out one or more tasks: data collection, data transformation and analysis, or content
generation. System scalability is ensured by several concepts implemented in WA, the most important of which
is the principle of equality of individual WA servers. This property allows the user to increase system
throughput simply by adding more WA servers.
The principle of equality holds for all WA servers except a special WA Transaction Manager (TM) server, whose
role is to control the state of UDB and manage all requests to UDB from all other WA servers. When more than
one WA server works with a single UDB, only one dedicated server, the Transaction Manager, has a direct
connection to UDB; other servers send their requests to UDB through this server. This architecture facilitates
the synchronization of all WA servers working with UDB.
In addition, one WA server can execute objects on other WA servers. All these features ensure that WA
servers can work in a multi-server distributed environment. Each server can have its own configuration and
can be programmed to carry out its individual tasks. Adding extra WA servers is simple because it requires no
changes to the original configuration.
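The Transaction Manager arrangement amounts to funnelling every database request through a single owner. The sketch below assumes that threads stand in for separate WA servers and a dictionary stands in for UDB; `TransactionManager`, `submit`, and the `incr`/`read` operations are hypothetical names.

```python
import queue
import threading


class TransactionManager:
    """Hypothetical sketch: the only component holding the database handle."""

    def __init__(self) -> None:
        self._db: dict[str, int] = {}  # stand-in for the UDB connection
        self._requests: queue.Queue = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self) -> None:
        # Requests are applied one at a time, so servers never race on the DB.
        while True:
            op, key, reply = self._requests.get()
            if op == "incr":
                self._db[key] = self._db.get(key, 0) + 1
            reply.put(self._db.get(key, 0))

    def submit(self, op: str, key: str) -> int:
        """Called by other WA servers instead of touching the DB directly."""
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._requests.put((op, key, reply))
        return reply.get()  # block until the TM has applied the request


tm = TransactionManager()


def wa_server(n_hits: int) -> None:
    for _ in range(n_hits):
        tm.submit("incr", "/index.html")


servers = [threading.Thread(target=wa_server, args=(100,)) for _ in range(4)]
for s in servers:
    s.start()
for s in servers:
    s.join()
print(tm.submit("read", "/index.html"))  # 400
```

Because only the TM thread mutates the store, no increment is lost even though four "servers" run concurrently, which is the synchronization benefit the paper attributes to this design.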
6. Access rights and security
WA is a complex multi-user system, where different groups of users have different objectives and requirements
for the system and should have different access rights to system resources. WA server is constantly accessed
by visitors through the Web, and its objects can be executed by entering a single URL in a web browser. WA
stores all registered user logins and passwords in UDB, along with information about their access rights to
each WASL object. WA provides the same level of security that is offered by the standard HTTP protocol. WA
server requests authentication data (login and password) from the user; this data is passed in an HTTP request
header using the so-called digest authentication scheme, in which the client sends a hash computed from the
password and a server-supplied nonce instead of the password itself. "Thin" clients of WA server request this
data through a standard browser authentication dialog.
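For reference, the Digest scheme of RFC 2617 computes the client's `response` value from the password, the realm, the request, and a server nonce. The sketch below shows the `qop=auth`, MD5 variant of that computation and a server-side check; the user table, the header dictionary, and the function names are assumptions, and nonce generation and header parsing are left out.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str, method: str,
                    uri: str, nonce: str, nc: str, cnonce: str) -> str:
    """RFC 2617 'response' value for the MD5 algorithm with qop=auth."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # depends on the secret
    ha2 = md5_hex(f"{method}:{uri}")                 # depends on the request
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:auth:{ha2}")


def server_accepts(passwords: dict, header: dict, method: str) -> bool:
    """Recompute the expected digest from the stored password and compare."""
    password = passwords.get(header["username"])
    if password is None:
        return False
    expected = digest_response(header["username"], header["realm"], password,
                               method, header["uri"], header["nonce"],
                               header["nc"], header["cnonce"])
    return hmac.compare_digest(expected, header["response"])


# Hypothetical credentials table and client-side Authorization fields.
passwords = {"analyst": "s3cret"}
header = {"username": "analyst", "realm": "WebAnalyst", "uri": "/report",
          "nonce": "dcd98b7102dd", "nc": "00000001", "cnonce": "0a4f113b"}
header["response"] = digest_response("analyst", "WebAnalyst", "s3cret", "GET",
                                     "/report", header["nonce"], header["nc"],
                                     header["cnonce"])
print(server_accepts(passwords, header, "GET"))  # True
```

Only `response` travels with the request; a server that stores the password (as the paper says WA does in UDB) can recompute and compare it.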
7. Data warehousing concepts
WA utilizes the following scheme in the web-mining process:
Problem identification
Web data collection
Data enhancement and transformation into the form best suited for analysis
Data mining
Analyzing the results and acting on them
All data collection tasks are separated from the data transformation and analytical tasks. Collection tasks are
called online tasks and are performed directly in the channel processors. Collected data is stored in binary form
in UDB. All other data processing is performed offline.
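One way to picture the online/offline split: the online path only serializes and appends, and an offline job later decodes the records and reconstructs sessions. JSON encoding, the in-memory list standing in for UDB, and the 30-minute session gap are all assumptions made for illustration.

```python
import json
from collections import defaultdict

binary_log: list[bytes] = []  # stand-in for binary records in UDB


def online_collect(visitor: str, path: str, ts: int) -> None:
    """Online step: serialize and append only; no analysis on the hot path."""
    binary_log.append(json.dumps({"v": visitor, "p": path, "t": ts}).encode())


def offline_sessions(gap: int = 1800) -> dict[str, list[list[str]]]:
    """Offline step: decode records and split each visitor's clicks into sessions."""
    clicks = defaultdict(list)
    for blob in binary_log:
        rec = json.loads(blob)
        clicks[rec["v"]].append((rec["t"], rec["p"]))
    sessions: dict[str, list[list[str]]] = {}
    for visitor, events in clicks.items():
        events.sort()
        runs: list[list[str]] = []
        last_t = None
        for t, p in events:
            if last_t is None or t - last_t > gap:  # idle too long: new session
                runs.append([])
            runs[-1].append(p)
            last_t = t
        sessions[visitor] = runs
    return sessions


online_collect("a", "/", 0)
online_collect("a", "/buy", 60)
online_collect("a", "/", 5000)
print(offline_sessions())  # {'a': [['/', '/buy'], ['/']]}
```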
8. Expandability and open architecture
WA is based on an open architecture. WA functionality can be expanded in four different ways:
Writing additional user WASL objects. Users can create and reuse their own objects for customized
tasks through a simple visual programming environment.
Attaching existing user COM objects. WA provides special objects that are gateways to calling COM
methods of other objects. This allows the user to connect new custom modules for data processing packaged
as COM modules.
Creating new channel processors. Among the most important new channels scheduled to be added to
WA are SMTP for e-mail data processing and NNTP for newsgroup data processing.
Creating new plug-in modules for the existing channel processors. Plug-in modules represent the most
robust means for processing content in a channel. Plug-in modules should be created for the most time-critical
applications.
9. Visual object development
The user of WA does not need to be a computer programmer. New WASL objects can be easily created with the
help of the WA Workplace visual programming environment using three intuitive actions:
Drag-and-drop existing objects to the workplace
Connect objects with unidirectional arrows indicating the control flow
Assign parameters to objects and connections
Using WebAnalyst Workplace the analyst and the system administrator can create data processing chains,
content generators and configuration scripts.
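A diagram built with these three actions can be thought of as a directed graph whose nodes are parameterized method calls and whose arrows fix the execution order. The sketch below assumes an acyclic diagram and uses the standard-library `graphlib` module; the `load`/`clean`/`report` methods and `run_diagram` are hypothetical.

```python
from graphlib import TopologicalSorter


def load(ctx: dict, table: str) -> None:
    ctx["rows"] = [3, 1, 2]  # pretend these rows were read from `table`


def clean(ctx: dict, drop_below: int) -> None:
    ctx["rows"] = [r for r in ctx["rows"] if r >= drop_below]


def report(ctx: dict, title: str) -> None:
    ctx["report"] = f"{title}: {sorted(ctx['rows'])}"


# Items dropped on the canvas: (method, parameters assigned to the item).
items = {
    "load": (load, {"table": "visits"}),
    "clean": (clean, {"drop_below": 2}),
    "report": (report, {"title": "Top visits"}),
}
# Unidirectional arrows drawn between items: (from, to).
arrows = [("load", "clean"), ("clean", "report")]


def run_diagram(items: dict, arrows: list) -> dict:
    predecessors = {name: set() for name in items}
    for src, dst in arrows:
        predecessors[dst].add(src)
    ctx: dict = {}
    # Execute each item only after every item pointing to it has run.
    for name in TopologicalSorter(predecessors).static_order():
        method, params = items[name]
        method(ctx, **params)
    return ctx


print(run_diagram(items, arrows)["report"])  # Top visits: [2, 3]
```

Loops and conditional branches, which WASL supports through its control-flow keywords, are not modeled here.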

Tasks carried out by WebAnalyst


WA can perform a large selection of tasks, which can be divided into seven different groups:
Collecting valuable information from different data transmission channels (HTTP, SMTP, NNTP, and
FTP).
Processing Web server access log-files.
Performing various data warehousing tasks: extracting information from different external databases,
cleansing and enhancing it, and storing it in WA server data marts.
Carrying out analytical processing tasks with the help of PolyAnalyst exploration engines and custom
data mining modules.
Executing various WASL objects, both those built into WA and those created by WA users.
Returning (modified) requested information to the data transmission channel: generated content,
analytical reports and so on.
Providing visual development environment to the user.

Server architecture
The WA server is the primary component of the WA system. It can be divided into three smaller
components: Transaction Manager, Channel Processors Manager, and Virtual Machine.
Figure 2. WebAnalyst architecture.
Transaction Manager and target database
When operating in a multi-server WA environment, the WA Transaction Manager assumes the role of a
dedicated server that processes database requests from all other servers and forwards them to the
target DBMS. Transaction Manager is included in each WA server as a built-in module that coordinates
database requests from other components. Transaction Manager is necessary because the target database
of WA defines only minimal referential integrity. Transaction Manager wraps its program interfaces over
system-level database interfaces. Currently Transaction Manager uses OLE DB to connect to and work with
the target database.
The WA target database can be implemented on almost any DBMS, the only condition being the ability of the
DBMS to work with binary large objects (BLOBs). At present, WA is tuned to Microsoft SQL Server 7.0.
The WA database contains the minimum DRI (declarative referential integrity) defined in the ANSI SQL-92
standard: it uses only PRIMARY KEY and FOREIGN KEY constraints and does not contain any triggers or stored
procedures. The remaining referential integrity requirements are enforced by the TM component.
WA stores all its data in two forms: relational and binary. Relational data is used in SQL SELECT statements
when WA needs quick, time-critical access to stored data. Examples of WA relational data are click-stream data
(HTTP sessions), user rights, and object hierarchies. The binary representation is used to store non-relational
unstructured data. Binary data is divided into two groups: data produced by a channel processor (Profiles) and
WASL objects. The Transaction Manager component introduces a special form of expression, "Object Select", to
access Profiles and their components.
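The paper does not describe the binary Profile format or the syntax of "Object Select", so the following is only an analogue under stated assumptions: profiles are serialized as JSON into a BLOB column that SQL cannot look inside, and a hypothetical `object_select` helper fetches the BLOB and walks a dotted path to one component.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (id TEXT PRIMARY KEY, body BLOB)")


def save_profile(pid: str, profile: dict) -> None:
    blob = json.dumps(profile).encode("utf-8")  # opaque to SQL queries
    conn.execute("INSERT OR REPLACE INTO profile VALUES (?, ?)", (pid, blob))


def object_select(pid: str, path: str):
    """Hypothetical analogue of 'Object Select': fetch a BLOB, walk a dotted path."""
    row = conn.execute("SELECT body FROM profile WHERE id = ?", (pid,)).fetchone()
    if row is None:
        return None
    node = json.loads(row[0])
    for part in path.split("."):
        node = node[part]
    return node


save_profile("visitor-42", {"interests": {"sports": 0.8, "travel": 0.1}, "visits": 7})
print(object_select("visitor-42", "interests.sports"))  # 0.8
```

The relational column (`id`) stays indexable for fast lookups, while the profile's internal structure is only interpreted after retrieval, matching the relational/binary division described above.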
Channel Processors Manager
When started, the Channel Processors Manager (CPM) checks the WA server configuration and starts its
registered channel processors (CPs). Currently WA contains only one type of CP: the HTTP channel processor.
When started, the HTTP CP begins to monitor the port defined in the WA configuration (usually the standard
HTTP port 80). When the HTTP CP accepts a new request, it performs the following actions:
1) Identifying sender (visitor) by a cookie
2) Profiling the HTTP request (extracting all valuable information)
3) Obtaining the requested resource from the Web server
4) Profiling the resource
5) Applying the request and resource profiles to the corresponding visitor profile
6) Personalizing the resource and returning it to the visitor
Steps 2), 4), 5) and 6) are performed by the registered plug-in modules. Plug-in modules are dynamic link
libraries that contain fast optimized code for profiling requests and resources and personalizing (or simply
generating) content. WA server automatically determines which plug-in should be run to process an incoming
request/resource based on the type of request.
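The first and fourth of these steps (identifying the visitor by a cookie and profiling the resource) can be sketched as follows. The cookie name `wa_id`, the registry keyed on MIME content type, and the toy plug-ins are assumptions; the paper states only that the plug-in is chosen by the type of request, and in WA the plug-ins are DLLs rather than Python functions.

```python
import uuid
from http.cookies import SimpleCookie

PLUGINS: dict = {}  # content type -> list of profiling functions (hypothetical)


def plugin(content_type: str):
    """Register a profiling function for one content type."""
    def register(fn):
        PLUGINS.setdefault(content_type, []).append(fn)
        return fn
    return register


@plugin("text/html")
def profile_html(body: str) -> dict:
    return {"words": len(body.split())}


@plugin("image/png")
def profile_png(body: str) -> dict:
    return {"image": True}


def identify_visitor(cookie_header: str) -> tuple[str, bool]:
    """Return (visitor_id, is_new); issue a fresh id if no WA cookie is present."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "wa_id" in cookie:
        return cookie["wa_id"].value, False
    return uuid.uuid4().hex, True


def profile_resource(content_type: str, body: str) -> dict:
    """Run every plug-in registered for this type and merge their output."""
    profile: dict = {}
    for fn in PLUGINS.get(content_type, []):
        profile.update(fn(body))
    return profile


print(identify_visitor("wa_id=abc123; theme=dark"))       # ('abc123', False)
print(profile_resource("text/html", "<p>hello web analyst</p>"))  # {'words': 3}
```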
Virtual Machine
WA has its own built-in mechanism for executing user-created objects: the Virtual Machine (VM). User objects,
also called WASL objects because they are scripts based on the WebAnalyst Script Language, can be created
in three different ways:
Visually, using the WebAnalyst Workplace environment, and saved as a text file or in binary form in the
target database.
With a text editor, using WASL syntax, and saved into a text file.
In Microsoft Visual C++, compiled into a DLL, and saved in binary form in the target database.
Every WASL object has its own programming interface through which it can be queried and executed. This
interface contains the following items:
Properties that can be set and queried. Properties can contain numbers, strings, date/time values and
some specialized information understood by the corresponding WASL objects. Properties are persistent
entities that can be saved and loaded during the execution of a WASL object.
Methods that can be executed with a set of arguments and return some values.
Meta-data that is utilized by the WebAnalyst Workplace.
A user-created WASL object also contains a list of commands (the body of the object) and the aggregated
WASL objects referred to by these commands. Each command is a reference to a parameterized WASL method.
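The interface just described (persistent properties, callable methods, Workplace meta-data, and a body of commands that call methods of aggregated objects) can be modeled as below. `WaslObject`, `Command`, and their fields are hypothetical Python stand-ins; real WASL objects are stored in UDB and executed by the VM.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Command:
    target: str  # name of an aggregated WASL object
    method: str  # method to call on that object
    args: dict = field(default_factory=dict)  # parameters of the call


@dataclass
class WaslObject:
    properties: dict = field(default_factory=dict)  # persistent, queryable
    methods: dict[str, Callable[..., Any]] = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)    # used by the Workplace UI
    children: dict[str, "WaslObject"] = field(default_factory=dict)
    body: list[Command] = field(default_factory=list)

    def call(self, name: str, **args: Any) -> Any:
        return self.methods[name](self, **args)

    def run(self) -> list:
        """Execute the body: each command invokes a method of an aggregated object."""
        return [self.children[c.target].call(c.method, **c.args) for c in self.body]


def bump(self: WaslObject, by: int) -> int:
    self.properties["n"] += by  # methods may update the object's properties
    return self.properties["n"]


strings = WaslObject(methods={"upper": lambda self, s: s.upper()})
counter = WaslObject(properties={"n": 0}, methods={"bump": bump})
script = WaslObject(
    children={"Str": strings, "Counter": counter},
    body=[Command("Str", "upper", {"s": "hello"}), Command("Counter", "bump", {"by": 2})],
)
print(script.run())  # ['HELLO', 2]
```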
WA contains a special built-in WASL "System" object. The "System" object consists of a few methods that can
change the flow of control during the application execution. These methods implement the "IF", "GOTO",
"SPLIT" and "SYNC" keywords of WASL. The "System" object also implements the "SET", "SLEEP" and
"SCHEDULE" keywords.
A compiled C++ WASL object can implement complex developer logic and contain binary data. Writing new
compiled C++ objects is the most advanced way to enhance WA capabilities with additional custom algorithms for
carrying out complex and time-critical tasks.
A WASL object, its code (or a reference to the corresponding DLL in some cases), and its data (properties and
binary data) can be stored in UDB.
WA includes a powerful toolkit of instruments packaged as WASL objects. WA contains objects for
connecting to a database, performing data operations, data mining, and log-file processing. In addition, it offers
a large selection of useful functional objects implementing mathematical, string, date/time, e-mail, and file
input/output operations.
Every object from this toolkit can be utilized by an analyst or developer to create new objects - exploration
chains, configuration scripts, and content generators.
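The control-flow keywords of the "System" object can be pictured as a tiny interpreter loop over a command list. This sketch models only SET, IF, and GOTO (plus a LABEL marker); the tuple encoding, the jump-when-true meaning of IF, and the `run` function are assumptions, and SPLIT, SYNC, SLEEP, and SCHEDULE are not modeled.

```python
from typing import Optional


def run(program: list, env: Optional[dict] = None) -> dict:
    """Interpret a hypothetical subset of WASL control keywords."""
    env = dict(env or {})
    # Map each label name to its position so jumps are direct.
    labels = {cmd[1]: i for i, cmd in enumerate(program) if cmd[0] == "LABEL"}
    pc = 0  # index of the next command to execute
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "SET":
            name, expr = args
            env[name] = expr(env)
        elif op == "IF":
            cond, label = args
            if cond(env):  # assumed: jump to label when the condition is true
                pc = labels[label]
        elif op == "GOTO":
            pc = labels[args[0]]
        # "LABEL" commands are position markers and do nothing when executed.
    return env


# Sum 1..5 using only SET, IF, and GOTO.
program = [
    ("SET", "i", lambda e: 1),
    ("SET", "total", lambda e: 0),
    ("LABEL", "loop"),
    ("IF", lambda e: e["i"] > 5, "done"),
    ("SET", "total", lambda e: e["total"] + e["i"]),
    ("SET", "i", lambda e: e["i"] + 1),
    ("GOTO", "loop"),
    ("LABEL", "done"),
]
print(run(program)["total"])  # 15
```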

Sample configurations
Currently, the most common implementation of WA uses the server as a corporate analytical platform. WA
provides its users with a powerful toolbox of data transformation and mining instruments, accessible
through a visual process development interface. With the help of WA Server and its client application, WA
Workplace, analysts can create elaborate exploration chains, stage data from external databases, perform
numerous data transformation tasks, and analyze the obtained results. The created exploration scripts and the
results of their operation are saved in the WA database, providing a convenient environment for individual and
collaborative projects in which analysts can easily manage permissions on their objects.
Figure 3. WebAnalyst as an analyst's workplace.
The second method of implementing WA, which in fact gave the product its name, is to install WA as the
company's HTTP proxy server. In this implementation, WA collects all valuable website-generated
information, visitor and resource profiles, and then transforms and stores this data in the WA database. The
analytical capabilities of WA server in this case are the same as those discussed for the previous
configuration. In addition, WA can periodically import web data from the web server access log files if
necessary; this may be needed for initially populating the WA database with data about past visitors and
transactions. The major difference between the two implementations is that the latter uses the HTTP channel
processor of WA.

Figure 4. WebAnalyst as a web data mining server.


The third and most advanced method for implementing WA, a distributed server application, utilizes all the
capabilities of WA. In this implementation, several WA servers form a powerful analytical cluster. One
server, the Transaction Manager, is dedicated to operating with the WA database, while the other servers may
play different roles: HTTP channel processors, data mining servers, or content generation servers.

Figure 5. WebAnalyst as a scalable distributed platform for e-business.

WebAnalyst Workplace
WebAnalyst Workplace is an integrated visual development environment, developed for the automation of the
following tasks:
Data warehousing
Data mining
Creating user objects - content generators, control and configuration scripts, etc.
WA Workplace has a graphical multiple-document interface (MDI), typical of well-known IDEs. It also inherits some
interface features of PolyAnalyst, a data mining tool from Megaputer Intelligence.
Figure 6. WebAnalyst main window.
WA Workplace features the following elements: user-objects tree panel, object inspector panel, log panel,
processes panel, and two toolbars. WA Workplace windows contain models and results. The user can drag
and drop objects and methods from the tree panel onto the model window and connect these items with arrows.
Each item corresponds to a parameterized method call. The user can modify information and meta-data of any
object or item through a special inspector panel.

Expandability and open interfaces


Interface for connecting external COM objects created by developers. Note that WA itself does not
provide COM interfaces because it is intended to become a multiplatform system. However, when running on the
Windows NT and 2000 platforms, WA can call methods of external COM objects. Although these objects do not
have access to binary data in UDB, they can address numerous tasks of analytical data processing, content
generation, etc. Additional externally connected COM objects can significantly expand the initial functionality
of WA. COM objects developed by the user are represented in WA as executable objects of its VM.
VM interface that allows users to develop VM objects in C++. These objects are executed not by the VM but
directly by the processor, and are the fastest-executing user objects. When executing these objects, WA does
not spend processor time converting arguments, which it has to do when calling methods of external COM
objects.
Interface for channel processor plug-in modules. One should keep in mind that processing data in a
channel is the most time-critical procedure. Tasks such as content and customer profiling, as well as complex
personalization of content, have to be optimized to the fullest extent. One should avoid using VM objects for
processing data from a web server channel that has to serve more than ten visitors per second, because in
this case WA spends too much time simply interpreting user objects. Channel processor plug-in modules should
be used instead.
Interface for the development of a channel processor. Utilizing this most advanced interface,
developers can create their own channel processor and connect it to the system.

Conclusion
Summarizing, the following important capabilities are gained by users of WA:
Collecting website-generated data, transforming this data and storing it in a unified database
Analyzing data with the help of integrated data mining modules
Writing their own data processing algorithms and connecting them to WA
Visually creating reusable procedures for batch-style data analysis
Developing generators of personalized content
Creating their own modules for model visualization and connecting them to WA
The small to medium business edition of WA is scheduled for release in June 2001. The highly scalable
version of WA serving as an enterprise-wide solution for large businesses will be released in October 2001.
