What is WebAnalyst
Objectives
Main Concepts of WebAnalyst
Tasks carried out by WebAnalyst
Server architecture
Sample configurations
WebAnalyst Workplace
Expandability and open interfaces
Conclusion
By Yuri Slynko, Megaputer Intelligence Inc.
and
Sergei Ananyan, Megaputer Intelligence Inc.
Abstract
A corporate website is an immediate, bi-directional, automated, and global channel for customer-vendor
interaction. Empowering the website with analytical capabilities and mass customization can
trigger yet another revolution in e-business, increasing the value of each customer and each interaction for the vendor
while improving customer satisfaction with the services obtained through online channels.
Implementing these capabilities becomes possible through the combination of the latest
achievements in online database and data warehouse architecture, machine learning and data
mining, automated content analysis, and content generation and delivery.
This paper discusses the functionality and architecture of WebAnalyst Server™, a universal open-architecture
platform for intelligent e-business developed by Megaputer Intelligence Inc. The system helps automatically
collect data about website visitors in a unified database, transform the data into a format suitable for analysis,
and learn visitors' preferences and value by combining techniques for collaborative filtering
and automated text analysis. WebAnalyst allows analysts to optimize the performance of the site and turn the
website into an immediate-response, personalized CRM channel. WebAnalyst Server is powered by the exploration
engines of PolyAnalyst™ for data mining and TextAnalyst™ for content analysis, as well as by new analytical
technologies from Megaputer. The functionality of WebAnalyst can be expanded with additional plug-in
procedures.
What is WebAnalyst
WebAnalyst (WA) is a corporate-level analytical server providing an integrated platform for data warehousing and data
mining, with a special emphasis on e-business and web data mining applications. WA is a scalable application
server with an open architecture that automates e-business data collection, transformation, and
analysis, as well as the personalization of interactions with customers. In addition, the WA client provides a flexible visual
programming environment for analysts.
In more detail, WA is an application server that can:
Process data from different sources, such as data transmission channels (HTTP), external databases,
and web server access log-files
Store all WA-related data in a unified WA database
Offer a suite of powerful built-in analytical and data processing tools
Provide the user with a visual programming environment for generating reusable analytical procedures
Figure 1. WebAnalyst functionality.
Objectives
WebAnalyst Server helps:
Record all customer interactions through the website in the most efficient manner
Transform and store this data in a format suitable for further analysis
Use this data for learning customer interests, preferences, and possible course of action
Analyze efficiency of the website resources and architecture
Generate reports for executive managers
Recognize repeat customers and access their profiles
Utilize all harvested knowledge to personalize communications with each customer
A typical organization using WebAnalyst has the following resources:
Web Server
Database of content
Database of products and transactions
Stored access-log files for initial system training
Only the web server is required for running WebAnalyst and building a database of customer
profiles, but having all of these components in operation when WebAnalyst is deployed increases the value
of the system.
There are four distinct groups of users of WebAnalyst inside an organization:
Webmasters and database administrators
Data analysts
Executive managers and marketing analysts
Website visitors
Let us consider how representatives of each of these groups perceive WebAnalyst and which
tasks they most need it for.
Webmasters and database administrators
Automated and customizable recording of vital data about customer interactions in a database. This
eliminates the headache of continuously exporting access log files into the database and delivers complete data
that analysts can actually use in their analysis.
Simple tools for rapid creation and reuse of procedures for transforming operational data into a format
suitable for further analysis
Ease of merging website-generated data with data from other sources available in the organization
in a single data warehouse format
Convenient tools for immediately applying the results of data analysis to content generation and
customer interaction mechanisms
Data analysts
Means for standardized access to disparate data obtained through different interaction channels.
Comprehensive arsenal of powerful data analysis tools
Ability to visually build reusable data analysis procedures and exchange these procedures for
collaborative projects
Freedom from the complexities of analyzing data and creating reports through standard means
Ability to quickly build customizable reports for top executives
Executive managers and marketing analysts
Optimal performance of the developed content
Increased value of every visitor and communication
Increased customer loyalty achieved through improved service satisfaction
Increased attractiveness of the channel for advertisers through personally targeted ad delivery
Easy-to-understand visual reports on the website performance as a customer interaction channel
A systematic approach to the work of analysts, with simple control mechanisms
Website visitors
Better website architecture
Relevant content and ads
Personalized interactions through the channel
Server architecture
The WA server is the primary component of the WA system. It can be divided into three smaller
components: Transaction Manager, Channel Processors Manager, and Virtual Machine.
Figure 2. WebAnalyst architecture.
Transaction Manager and target database
When operating in a multi-threaded WA environment, the WA Transaction Manager assumes the role of a
dedicated server that processes database requests from all other servers and forwards these requests to the
target DBMS. Transaction Manager is included in each WA server as a built-in module that coordinates
database requests from other components. Transaction Manager is needed because the WA target database
declares only minimal referential integrity, so the remaining integrity rules must be enforced outside the
DBMS. Transaction Manager wraps its program interfaces over system-level database interfaces. Currently,
Transaction Manager uses OLE DB to connect to and work with the target database.
The WA target database can be implemented on almost any DBMS; the only condition is that
the DBMS can work with binary large object (BLOB) columns. At present, WA is tuned to Microsoft SQL Server 7.0.
The WA database contains the minimum DRI (declarative referential integrity) defined in the ANSI SQL-92 standard:
it uses only PRIMARY KEY and FOREIGN KEY constraints and does not contain any triggers
or stored procedures. The rest of the referential integrity requirements are enforced by the Transaction Manager (TM) component.
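The division of labor described above can be illustrated with a minimal sketch. It assumes SQLite in place of Microsoft SQL Server, and the table names, column names, and the `add_session` helper are hypothetical; the actual WA schema and Transaction Manager interfaces are not given in this paper. The database declares only PRIMARY KEY and FOREIGN KEY constraints, while an extra rule that a trigger might otherwise enforce is checked in application code.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the real WA tables.
# Only PRIMARY KEY and FOREIGN KEY constraints are declared; no triggers.
SCHEMA = """
CREATE TABLE visitor (
    visitor_id INTEGER PRIMARY KEY,
    cookie     TEXT NOT NULL,
    profile    BLOB              -- binary profile data
);
CREATE TABLE http_session (
    session_id INTEGER PRIMARY KEY,
    visitor_id INTEGER NOT NULL,
    started_at TEXT NOT NULL,
    FOREIGN KEY (visitor_id) REFERENCES visitor (visitor_id)
);
"""


def open_target_db() -> sqlite3.Connection:
    """Create an in-memory database with the minimal declarative integrity."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_session(conn: sqlite3.Connection, session_id: int,
                visitor_id: int, started_at: str) -> None:
    """Application-side rule standing in for a trigger: reject empty timestamps."""
    if not started_at:
        raise ValueError("started_at must not be empty")
    conn.execute(
        "INSERT INTO http_session VALUES (?, ?, ?)",
        (session_id, visitor_id, started_at),
    )
```

In this sketch the DBMS rejects a session that points at an unknown visitor, and the helper rejects an empty timestamp before the statement ever reaches the database.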
WA stores all its data in two forms: relational and binary. Relational data is used in SQL SELECT
expressions when WA needs quick, time-critical access to stored data. Examples of WA relational
data are click-stream data (HTTP sessions), user rights, and object hierarchies. Binary representation is used to
store non-relational, unstructured data. Binary data is divided into two groups: data produced by a channel
processor (Profiles) and WASL (WebAnalyst Script Language) objects. The Transaction Manager component introduces an expression of
a special form, "Object Select", to access Profiles and their components.
Channel Processors Manager
After being started, Channel Processors Manager (CPM) checks the WA server configuration and starts its
registered channel processors (CPs). Currently WA contains only one type of CP: the HTTP channel processor.
When started, the HTTP CP begins to monitor the port defined in the WA configuration (usually the
standard HTTP port, 80). When the HTTP CP accepts a new request, it performs the following actions:
1) Identifying the sender (visitor) by a cookie
2) Profiling the HTTP request (extracting all valuable information)
3) Obtaining the requested resource from the Web server
4) Profiling the resource
5) Applying the request and resource profiles to the corresponding visitor profile
6) Personalizing the resource and returning it to the visitor
Steps 2), 4), 5) and 6) are performed by the registered plug-in modules. Plug-in modules are dynamic link
libraries that contain fast optimized code for profiling requests and resources and personalizing (or simply
generating) content. WA server automatically determines which plug-in should be run to process an incoming
request/resource based on the type of request.
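The six actions and the plug-in dispatch described above can be sketched as follows. This is only an illustration under stated assumptions: the names `Request`, `Plugin`, `PLUGINS`, `handle_request`, and `fetch_from_web_server` are hypothetical, the plug-ins are plain Python callables rather than dynamic link libraries, and the content type is used as a stand-in for "the type of request".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Request:
    path: str
    cookie: str
    content_type: str  # stand-in for "the type of request"


@dataclass
class VisitorProfile:
    visitor_id: str
    history: List[dict] = field(default_factory=list)


@dataclass
class Plugin:
    """Hypothetical plug-in covering steps 2, 4, 5 and 6."""
    profile_request: Callable[[Request], dict]
    profile_resource: Callable[[str], dict]
    apply_profiles: Callable[[VisitorProfile, dict, dict], None]
    personalize: Callable[[str, VisitorProfile], str]


PLUGINS: Dict[str, Plugin] = {}            # registry keyed by request type
PROFILES: Dict[str, VisitorProfile] = {}   # visitor profiles keyed by cookie


def fetch_from_web_server(path: str) -> str:
    """Placeholder for step 3; a real proxy would forward the request."""
    return f"<html>content of {path}</html>"


def handle_request(req: Request) -> str:
    visitor = PROFILES.setdefault(req.cookie, VisitorProfile(req.cookie))  # 1
    plugin = PLUGINS[req.content_type]           # plug-in chosen by type
    request_profile = plugin.profile_request(req)                          # 2
    resource = fetch_from_web_server(req.path)                             # 3
    resource_profile = plugin.profile_resource(resource)                   # 4
    plugin.apply_profiles(visitor, request_profile, resource_profile)      # 5
    return plugin.personalize(resource, visitor)                           # 6


# Example registration of a trivial HTML plug-in.
PLUGINS["text/html"] = Plugin(
    profile_request=lambda r: {"path": r.path},
    profile_resource=lambda body: {"length": len(body)},
    apply_profiles=lambda v, rp, sp: v.history.append({**rp, **sp}),
    personalize=lambda body, v: body + f"<!-- visit {len(v.history)} -->",
)
```

Keeping step 3 outside the plug-in mirrors the text: the server itself obtains the resource, and the plug-ins only profile and personalize it.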
Virtual Machine
WA has its own built-in mechanism for executing user-created objects: the Virtual Machine (VM). User objects, also
called WASL objects because they are scripts based on the WebAnalyst Script Language, can be created
in three different ways:
Visually, using the WebAnalyst Workplace environment, and saved as a text file or in the target
database in binary form.
With a text editor, using WASL syntax, and saved in a text file.
Written in Microsoft Visual C++, compiled into a DLL, and saved in the target database in binary
form.
Every WASL object has its program interface through which it can be queried and executed. This interface
contains the following items:
Properties that can be set and queried. Properties can contain numbers, strings, date/time values and
some specialized information understood by the corresponding WASL objects. Properties are persistent
entities that can be saved and loaded during the execution of a WASL object.
Methods that can be executed with a set of arguments and return some values.
Meta-data that is utilized by the WebAnalyst Workplace.
A user-created WASL object also contains a list of commands (the body of the object) and the aggregated WASL
objects referred to by these commands. Each command is a reference to another parameterized WASL method.
WA contains a special built-in WASL "System" object. The "System" object consists of a few methods that can
change the flow of control during the application execution. These methods implement the "IF", "GOTO",
"SPLIT" and "SYNC" keywords of WASL. The "System" object also implements the "SET", "SLEEP" and
"SCHEDULE" keywords.
A compiled C++ WASL object can implement complex developer-defined logic and contain binary data. Writing new
compiled C++ objects is the most advanced way to extend WA with additional custom algorithms for
carrying out complex and time-critical tasks.
A WASL object, its code (or, in some cases, a reference to the corresponding DLL), and its data (properties and
binary data) can be stored in the unified WA database (UDB).
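To make the idea of an object body as a list of parameterized method calls more concrete, here is a toy interpreter. It is not WASL: the actual syntax and Virtual Machine internals are not given in this paper, so the command encoding, the `MiniVM` class, and the `add_one` method are all assumptions made for illustration. Only simplified analogues of "SET", "IF" and "GOTO" are modeled; "SPLIT", "SYNC", "SLEEP" and "SCHEDULE" are left out.

```python
from typing import Any, Callable, Dict, List, Tuple

Command = Tuple[str, tuple]  # (method name, arguments)


class MiniVM:
    """Toy interpreter: NOT real WASL, only command dispatch with SET/IF/GOTO."""

    def __init__(self, methods: Dict[str, Callable[..., Any]]):
        self.methods = methods           # methods of aggregated objects
        self.props: Dict[str, Any] = {}  # properties that can be set and queried

    def run(self, body: List[Command], max_steps: int = 1000) -> Dict[str, Any]:
        pc = 0  # index of the next command in the body
        for _ in range(max_steps):
            if pc >= len(body):
                return self.props
            name, args = body[pc]
            pc += 1
            if name == "SET":      # SET property, value
                self.props[args[0]] = args[1]
            elif name == "GOTO":   # GOTO command index
                pc = args[0]
            elif name == "IF":     # jump to target when the predicate is false
                prop, predicate, target = args
                if not predicate(self.props[prop]):
                    pc = target
            else:                  # ordinary call: result goes into a property
                result_prop, *arg_props = args
                values = [self.props[a] for a in arg_props]
                self.props[result_prop] = self.methods[name](*values)
        raise RuntimeError("step limit exceeded")


# Example body: count a property "i" up to 3 with a loop built from IF and GOTO.
COUNT_TO_THREE: List[Command] = [
    ("SET", ("i", 0)),                    # 0
    ("IF", ("i", lambda v: v < 3, 4)),    # 1: leave the loop when i >= 3
    ("add_one", ("i", "i")),              # 2: i = add_one(i)
    ("GOTO", (1,)),                       # 3
]
```

Running `MiniVM({"add_one": lambda x: x + 1}).run(COUNT_TO_THREE)` walks the body command by command; the step limit is a guard added for the sketch so that a faulty "GOTO" cannot loop forever.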
WA includes a powerful toolkit of different instruments enclosed in WASL objects. WA contains objects for
connecting to a database, performing data operations, data mining, and log-file processing. In addition, it offers
a large selection of useful functional objects implementing mathematical, string, date/time, e-mail, and file
input/output operations.
Every object from this toolkit can be utilized by an analyst or developer to create new objects - exploration
chains, configuration scripts, and content generators.
Sample configurations
Currently, the most common way to deploy WA is as a corporate analytical platform. WA
provides its users with a powerful toolbox of data transformation and mining instruments, accessible
through a visual process development interface. With the help of WA Server and its client application, WA
Workplace, analysts can create elaborate exploration chains, stage data from external databases, perform
numerous data transformation tasks, and analyze the obtained results. The created exploration scripts and the
results of their operation are saved in the WA database, providing a convenient environment for individual and
collaborative projects in which analysts can easily manage permissions on their objects.
Figure 3. WebAnalyst as an analyst's workplace.
The second way to deploy WA, which in fact gave birth to the name of the product, is to install WA
as the company's HTTP proxy server. In this configuration, WA collects all valuable website-generated
information, including visitor and resource profiles, and then transforms and stores this data in the WA database. All
analytical capabilities of the WA server are the same as those discussed for the previous
configuration. In addition, WA can periodically import web data from the web server access log files if
necessary. This import might be needed to initially populate the WA database with data
about past visitors and transactions. The major difference between the two configurations
is that in the latter WA uses its HTTP channel processor.
WebAnalyst Workplace
WebAnalyst Workplace is an integrated visual development environment, developed for the automation of the
following tasks:
Data warehousing
Data mining
Creating user objects - content generators, control and configuration scripts, etc.
WA Workplace has a graphical multiple-document interface (MDI) of the kind familiar from well-known IDEs. It also inherits some
interface features of PolyAnalyst, a data mining tool from Megaputer Intelligence.
Figure 6. WebAnalyst main window.
WA Workplace features the following elements: a user-objects tree panel, an object inspector panel, a log panel,
a processes panel, and two toolbars. WA Workplace windows contain models and results. The user can drag
and drop objects and methods from the tree panel onto the model window and connect these items with arrows.
Each item corresponds to a parameterized method call. The user can modify the information and meta-data of any
object or item through a special inspector panel.
Conclusion
In summary, WA gives its users the following important capabilities:
Collecting website-generated data, transforming this data and storing it in a unified database
Analyzing data with the help of integrated data mining modules
Writing their own data processing algorithms and connecting them to WA
Visually creating reusable procedures for batch-style data analysis
Developing generators of personalized content
Creating their own modules for model visualization and connecting them to WA
The small to medium business edition of WA is scheduled for release in June 2001. The highly scalable
version of WA serving as an enterprise-wide solution for large businesses will be released in October 2001.