



In this environment, customer satisfaction can no longer be achieved with a simple list
of marketing contacts; it requires detailed information about customers' past purchases
as well as predictions of their future purchases. Simple database software based on SQL
does not support these increased demands for information. Data mining is commonly
defined as finding hidden information in data; alternatively, it has been referred to as
data analysis, knowledge discovery, and deductive learning. Data mining technologies,
with their techniques for recognizing and tracking patterns within data, help businesses
sift through layers of seemingly unrelated data for meaningful relationships, so that
they can anticipate customer needs rather than merely react to them. In this paper we
present a business and technological overview of data mining and outline how customer
profitability can be optimized through data mining applications. Together with sound
business processes and complementary technologies, data mining can reinforce and
redefine customer relationships.

The aim of this project is to examine the role of data mining in a customer-focused
business strategy. With the rapid globalization of business, product differentiation is
becoming less relevant as a source of competitive advantage, and the customer
relationship has become a competitive factor in its own right. Today customers are in
charge: it is easier than ever for customers to comparison shop and, with a click of the
mouse, to switch companies. As a result, the customer relationship has become a
company's most valuable asset, and every company's strategy should address how to find
and retain its most profitable customers.

1. Background

During the last few years, data and text mining have received more and more attention
from different fields, especially from the business community. This commercial interest
has grown mainly because of the potential of discovering knowledge from the vast
amounts of data collected from customers for improving business competitiveness.

Knowledge mining can be defined as a process of extracting systematic patterns or
relationships, and other information, from databases in order to improve people's
decision-making ability. It has a wide range of applications including business
intelligence gathering, drug discovery, product design, intelligent manufacturing,
supply-chain management, logistics and even research profiling. The following briefly
provides a few examples.

In business:
 Targeting specific products and services that the customers are more likely to buy;
 Determining buying patterns of credit card customers to predict their future
purchases. Such information can also be used, for instance, for identifying stolen
credit cards.
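As a concrete illustration of the credit-card example above, here is a minimal sketch of flagging a transaction whose amount deviates sharply from a customer's purchase history. The class, data and three-standard-deviation threshold are illustrative assumptions, not from the paper; real fraud detection uses far richer models.

```java
import java.util.List;

// Flags unusual credit-card transactions by comparing each new amount
// against the customer's historical mean and standard deviation.
public class FraudCheck {
    // Returns true if amount is more than k standard deviations from the mean.
    public static boolean isSuspicious(List<Double> history, double amount, double k) {
        double mean = history.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        double variance = history.stream()
                .mapToDouble(a -> (a - mean) * (a - mean))
                .average().orElse(0.0);
        double sd = Math.sqrt(variance);
        return sd > 0 && Math.abs(amount - mean) > k * sd;
    }

    public static void main(String[] args) {
        List<Double> history = List.of(20.0, 25.0, 22.0, 24.0, 21.0);
        System.out.println(isSuspicious(history, 23.0, 3.0));  // prints false (typical purchase)
        System.out.println(isSuspicious(history, 900.0, 3.0)); // prints true (outlier)
    }
}
```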

In product design and development:

 Learning the relationships between customer needs and design specifications;
 Based on past projects, the factors affecting a project's success or failure can be
identified systematically.

In manufacturing:
 Fault diagnosis and prediction of the amount of product defects in manufacturing;
 Operational manufacturing control, such as intelligent scheduling systems, which
learn the dynamic behavior of process outcomes and generate control policies.
Knowledge mining software such as Enterprise Miner and Intelligent Miner, released by
SAS and IBM respectively, is very popular in many applications. Companies that have
utilized knowledge mining tools successfully in their operations include Fleet Financial
Group (customer characteristics analysis), Ford (harshness, noise and vibration
analysis), Boeing (post-flight diagnostics), Kodak (data visualization), Texas
Instruments (fault diagnosis) and Motorola (customer data management and analysis).

The knowledge mining process is iterative and consists of the following main stages:
understanding problem goals; data selection; data cleaning and preprocessing;
discovering patterns; analysis and interpretation; and reporting and using discovered
knowledge. Pattern discovery in particular is a crucial step. There are several
approaches to discovering patterns, including classification, association, clustering,
regression, sequence analysis and visualization. Each of these approaches can be
implemented via competing yet complementary techniques such as statistical data
analysis, artificial neural networks, machine learning, and pattern recognition. We will
not go deeply into the core of the above methods. Instead, let us present a real
application of pattern discovery using a machine learning technique called CART
(Classification and Regression Trees) in financial analysis.
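To make the tree idea concrete, here is a minimal sketch of CART's core ingredient: a one-level "decision stump" that searches for the single best split point on one feature. The data and class names are hypothetical, and real CART uses impurity measures such as the Gini index rather than raw error counts; this is only the flavor of the technique.

```java
// One-level decision tree: find the threshold on a single feature that
// best separates two classes when predicting "class 1 if x >= threshold".
public class StumpDemo {
    public static double bestThreshold(double[] x, int[] y) {
        double best = x[0];
        int bestErrors = Integer.MAX_VALUE;
        for (double candidate : x) {            // try each observed value as a split
            int errors = 0;
            for (int i = 0; i < x.length; i++) {
                int predicted = x[i] >= candidate ? 1 : 0;
                if (predicted != y[i]) errors++;
            }
            if (errors < bestErrors) { bestErrors = errors; best = candidate; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] income = {20, 25, 30, 60, 70, 80}; // hypothetical feature
        int[] risky = {0, 0, 0, 1, 1, 1};           // hypothetical class labels
        System.out.println(bestThreshold(income, risky)); // prints 60.0
    }
}
```

In full CART, this split search is applied recursively to each resulting subset, which is what produces the interpretable tree structure discussed below.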

There are three main reasons for the popularity of tree-based methods used in CART.
First, they decompose a complex problem into a series of simpler problems (e.g., binary
decisions). Second, the tree structure resulting from successive decompositions of the
data often provides a clear understanding of the complex problem. Third, the methods
generally require a minimal set of assumptions for solving the problem.

An important outcome of the preliminary investigation is the determination that the
system request is feasible. This is possible only if the project is feasible within the
limited resources and time available. The different feasibilities that have to be analyzed
are:

 Operational Feasibility
 Economic Feasibility
 Technical Feasibility

Operational Feasibility
Operational feasibility deals with the study of the prospects of the system to be
developed. This system operationally relieves the administrator of routine burdens and
helps him track project progress effectively. This kind of automation will surely reduce
the time and energy that were previously consumed by manual work. Based on the
study, the system has proved to be operationally feasible.

Economic Feasibility
Economic feasibility, or cost-benefit analysis, is an assessment of the economic
justification for a computer-based project. As the hardware was installed from the
beginning and serves many purposes, the hardware cost attributable to this project is
low. Since the system is network based, any number of employees connected to the LAN
within the organization can use this tool at any time. The Virtual Private Network is to
be developed using the existing resources of the organization, so the project is
economically feasible.

Technical Feasibility
According to Roger S. Pressman, technical feasibility is the assessment of the
technical resources of the organization. The organization needs IBM-compatible
machines with a graphical web browser connected to the Internet and intranet. The
system is developed for a platform-independent environment. Java Server Pages,
JavaScript, HTML, SQL Server and WebLogic Server are used to develop the system.
The technical feasibility study has been carried out; the system is technically feasible
for development and can be developed with the existing facilities.


Not all requested projects are desirable or feasible. Some organizations receive so
many project requests from client users that only a few of them can be pursued.
However, those projects that are both feasible and desirable should be put into the
schedule. After a project request is approved, its cost, priority, completion time and
personnel requirements are estimated and used to determine where to add it to the
project list. Only after these factors are approved can development work be launched.


Input design plays a vital role in the software development life cycle and requires
very careful attention from developers. The purpose of input design is to feed data to
the application as accurately as possible, so inputs should be designed effectively to
minimize the errors that occur during data entry. According to software engineering
practice, the input forms or screens are designed to provide validation control over the
input limit, range and other related validations.

This system has input screens in almost all the modules. Error messages are
displayed to alert users whenever they make a mistake and to guide them so that invalid
entries are not made. This is discussed in detail under each module.

Input design is the process of converting user-created input into a computer-based
format. The goal of input design is to make data entry logical and free from errors;
errors in the input are controlled by the input design. The application has been
developed in a user-friendly manner. The forms have been designed in such a way that
during processing the cursor is placed in the position where data must be entered. In
certain cases the user is also provided with an option to select an appropriate input
from various alternatives related to the field.
Validation is required for each item of data entered. Whenever a user enters
erroneous data, an error message is displayed, and the user can move on to subsequent
pages only after completing all the entries on the current page.
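A minimal sketch of the kind of server-side validation described above, written as a plain Java helper so it can sit behind a JSP form. The field name ("age") and the limits are assumptions for illustration only.

```java
// Validates a single form field; returns an error message, or null when valid.
public class InputValidator {
    public static String validateAge(String raw) {
        if (raw == null || raw.trim().isEmpty()) return "Age is required.";
        int age;
        try {
            age = Integer.parseInt(raw.trim());      // reject non-numeric input
        } catch (NumberFormatException e) {
            return "Age must be a number.";
        }
        if (age < 18 || age > 120) return "Age must be between 18 and 120."; // range check
        return null; // passed all checks
    }

    public static void main(String[] args) {
        System.out.println(validateAge("abc")); // prints: Age must be a number.
        System.out.println(validateAge("25"));  // prints: null
    }
}
```

In a JSP page, such a helper would be called before processing the request, and the returned message shown to the user next to the offending field.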


The output from the computer is required mainly to create an efficient method of
communication within the company, primarily between the project leader and his team
members, in other words the administrator and the clients. The output of the VPN is a
system that allows the project leader to manage his clients: creating new clients,
assigning new projects to them, maintaining a record of project validity, and providing
folder-level access to each client on the user side depending on the projects allotted to
him. After completion of a project, a new project may be assigned to the client. User
authentication procedures are enforced from the initial stages. A new user may be
created by the administrator himself, or a user can register as a new user, but the task
of assigning projects and validating a new user rests with the administrator alone.

The application starts running when it is executed for the first time. The server has to
be started, and then Internet Explorer is used as the browser. The project runs on the
local area network, so the server machine serves as the administrator while the other
connected systems act as the clients. The developed system is highly user friendly and
can easily be understood by anyone using it, even for the first time.

 Data provider
In this module, the data provider uploads data to the data server. For security
purposes the data owner encrypts the data file and then stores it in the server.
The data owner is capable of manipulating the encrypted data file.

 Data Server
The data server provides the data storage service for the data owners. Data
owners encrypt their data files and store them in the server for sharing with
data consumers. To access the shared data files, data consumers download the
encrypted data files of interest from the server, and the server then decrypts
them. The server generates an aggregate key if the end user requests access to
multiple files at the same time.

 END User
In this module, the user can access a data file only with the encrypted keyword.
The user can search for files by either of two methods, SSED and K-nearest-
neighbor search. The user has to register and then log in to the data server.
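The encrypt-before-store step described in the data provider module can be sketched with the JDK's built-in AES support. The class name is hypothetical, key handling is deliberately simplified (a real deployment would manage keys outside the code), and the modules above do not specify the cipher, so AES here is an assumption.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Symmetric encryption/decryption of a data file's bytes before upload.
public class FileCrypto {
    public static SecretKey newKey() {
        try {
            return KeyGenerator.getInstance("AES").generateKey();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE.
    public static byte[] transform(int mode, SecretKey key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES"); // ECB for brevity; prefer AES/GCM in practice
            cipher.init(mode, key);
            return cipher.doFinal(data);
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] plain = "customer data file".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = transform(Cipher.ENCRYPT_MODE, key, plain);   // what the server stores
        byte[] decrypted = transform(Cipher.DECRYPT_MODE, key, encrypted);
        System.out.println(Arrays.equals(plain, decrypted)); // prints true
    }
}
```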

Hardware specification:
Processor : Pentium IV
Speed : 1.1 GHz
RAM : 256 MB (min)
Hard Disk : 20 GB
Floppy Drive : 1.44 MB
Keyboard : Standard Windows keyboard
Mouse : Two- or three-button mouse
Monitor : SVGA

Software specification:
Operating System : Windows 95/98/2000/XP
Application Server : Tomcat 5.0/6.x
Front End : HTML, Java, JSP
Scripts : JavaScript
Server-side Script : Java Server Pages
Database : MySQL 5.0
Database Connectivity : JDBC


Java is a high-level language that can be characterized by all of the following:
 Simple
 Object Oriented
 Distributed
 Multithreaded
 Dynamic
 Architecture Neutral
 Portable
 High performance
 Robust
 Secure

In the Java programming language, all source code is first written in plain text
files ending with the .java extension. Those source files are then compiled into .class
files by the Java compiler (javac). A class file does not contain code that is native to
your processor; it instead contains bytecodes, the machine language of the Java Virtual
Machine. The Java launcher tool (java) then runs your application with an instance of
the Java Virtual Machine.
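The compile-and-run cycle just described can be seen with the smallest possible program (file and class name are of course just an example):

```java
// HelloWorld.java
// Compile to bytecode:  javac HelloWorld.java   (produces HelloWorld.class)
// Run on the JVM:       java HelloWorld
public class HelloWorld {
    public static String message() {
        return "Hello from the JVM";
    }

    public static void main(String[] args) {
        System.out.println(message()); // prints: Hello from the JVM
    }
}
```

The same HelloWorld.class file runs unchanged on any platform with a Java Virtual Machine, which is the point of the bytecode design.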

A platform is the hardware or software environment in which a program runs. The
most popular platforms are Microsoft Windows, Linux, Solaris OS and Mac OS. Most
platforms can be described as a combination of the operating system and the underlying
hardware. The Java platform differs from most other platforms in that it is a software-only
platform that runs on top of other, hardware-based platforms.
The Java platform has two components:
 The Java Virtual Machine
 The Java Application Programming Interface (API)

The Java Virtual Machine is the base for the Java platform and is ported onto various
hardware-based platforms.
The API is a large collection of ready-made software components that provide
many useful capabilities, such as graphical user interface (GUI) widgets. It is grouped
into libraries of related classes and interfaces; these libraries are known as packages.
As a platform-independent environment, the Java platform can be a bit slower
than native code. However, advances in compiler and virtual machine technologies are
bringing performance close to that of native code without threatening portability.
Development Tools:
The development tools provide everything you’ll need for compiling, running,
monitoring, debugging, and documenting your applications. As a new developer, the
main tools you’ll be using are the Java compiler (javac), the Java launcher (java), and the
Java documentation (javadoc).
Application programming Interface (API):
The API provides the core functionality of the Java programming language. It
offers a wide array of useful classes ready for use in your own applications. It spans
everything from basic objects to networking and security.

Deployment Technologies:
The JDK provides standard mechanisms such as Java Web Start and Java Plug-In,
for deploying your applications to end users.

User Interface Toolkits:

The Swing and Java 2D toolkits make it possible to create sophisticated Graphical
User Interfaces (GUIs).
Drag-and-drop support:
Drag-and-drop is one of the seemingly most difficult features to implement in
user interface development. It provides a high level of usability and intuitiveness.
Drag-and-drop is, as its name implies, a two-step operation: code must be written to
facilitate dragging and code to facilitate dropping. Sun provides two classes to help
with this, namely DragSource and DropTarget.
Look and Feel Support:
Swing defines an abstract LookAndFeel class that represents all the information
central to a look-and-feel implementation, such as its name, its description, whether it
is a native look-and-feel, and in particular a hash table (known as the “Defaults Table”)
for storing default values for various look-and-feel attributes, such as colors and fonts.
Each look-and-feel implementation defines a subclass of LookAndFeel (for
example, swing.plaf.motif.MotifLookAndFeel) to provide Swing with the necessary
information to manage the look-and-feel.
The UIManager is the API through which components and programs access look-
and-feel information (they should rarely, if ever, talk directly to a
LookAndFeel instance). UIManager is responsible for keeping track of which
LookAndFeel classes are available, which are installed, and which is currently the
default. The UIManager also manages access to the Defaults Table for the current
look-and-feel.
Dynamically Changing the Default Look-and-Feel:
When a Swing application programmatically sets the look-and-feel, the ideal
place to do so is before any Swing components are instantiated. This is because the
UIManager.setLookAndFeel() method makes a particular LookAndFeel the current
default by loading and initializing that LookAndFeel instance, but it does not
automatically cause any existing components to change their look-and-feel.
Remember that components initialize their UI delegate at construction time;
therefore, if the current default changes after they are constructed, they will not
automatically update their UIs accordingly. It is up to the program to implement this
dynamic switching by traversing the containment hierarchy and updating the
components' UIs.
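The two-step switch just described, setting the new default and then explicitly updating already-constructed components, can be sketched as follows. The class and method names are illustrative; the UIManager and SwingUtilities calls are the standard Swing API.

```java
import javax.swing.JButton;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;
import javax.swing.UIManager;

public class LafSwitch {
    // Switches to the cross-platform (Metal) look-and-feel and refreshes an
    // existing component tree; returns the class name of the active LookAndFeel.
    public static String switchToCrossPlatform(JPanel root) {
        try {
            // Step 1: make the new look-and-feel the default for future components.
            UIManager.setLookAndFeel(UIManager.getCrossPlatformLookAndFeelClassName());
            // Step 2: walk the existing containment hierarchy and update its UIs.
            SwingUtilities.updateComponentTreeUI(root);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return UIManager.getLookAndFeel().getClass().getName();
    }

    public static void main(String[] args) {
        JPanel panel = new JPanel();          // components built under the old default
        panel.add(new JButton("OK"));
        // typically prints javax.swing.plaf.metal.MetalLookAndFeel
        System.out.println(switchToCrossPlatform(panel));
    }
}
```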

Integrated Development Environment (IDE)

IDE Introduction
An Integrated Development Environment (IDE) or interactive development
environment is a software application that provides comprehensive facilities to
computer programmers for software development. An IDE normally consists of a
source code editor, build automation tools and a debugger. Most modern IDEs have
intelligent code completion. Some IDEs contain a compiler, an interpreter, or both,
such as NetBeans and Eclipse. Many modern IDEs also have a class browser, an object
browser, and a class hierarchy diagram for use in object-oriented software
development. The IDE is designed to limit coding errors and facilitate error correction
with tools such as the NetBeans FindBugs plugin, which locates and fixes common
Java coding problems, and the debugger, which manages complex code with field
watches, breakpoints and execution monitoring.

An Integrated Development Environment (IDE) is an application that facilitates

application development. In general, an IDE is a graphical user interface (GUI)-based
workbench designed to aid a developer in building software applications with an
integrated environment combined with all the required tools at hand. Most common
features, such as debugging, version control and data structure browsing, help a
developer quickly execute actions without switching to other applications. Thus, it
helps maximize productivity by providing similar user interfaces (UI) for related
components and reduces the time taken to learn the language. An IDE supports single
or multiple languages.

One aim of the IDE is to reduce the configuration necessary to piece together
multiple development utilities, instead providing the same set of capabilities as a
cohesive unit. Reducing that setup time can increase developer productivity, in cases
where learning to use the IDE is faster than manually integrating all of the individual
tools. Tighter integration of all development tasks has the potential to improve overall
productivity beyond just helping with setup tasks.
IDE Supporting Languages

Some IDEs support multiple languages, such as Eclipse, ActiveState Komodo,
IntelliJ IDEA, MyEclipse, Oracle JDeveloper, NetBeans, Codenvy and Microsoft
Visual Studio. Others are centered on one language: GNU Emacs is based on C and
Emacs Lisp; IntelliJ IDEA, Eclipse, MyEclipse and NetBeans are based on Java; and
MonoDevelop is based on C#. Eclipse and NetBeans have plugins for C/C++, Ada and
GNAT (for example AdaGIDE), Perl, Python, Ruby, and PHP.

IDE Tools

There are many IDE tools available for the source code editor, build automation
and debugging.

NetBeans IDE 8.0 and new features for Java 8

NetBeans IDE 8.0 has been released, providing new features for Java 8 technologies. It
has code analyzers and editors for working with Java SE 8, Java SE Embedded 8, and
Java ME Embedded 8. The IDE also has new enhancements that further improve its
support for Maven and Java EE with PrimeFaces.

The top five features of NetBeans IDE 8.0 are as follows:

1. Tools for Java 8 Technologies. Anyone interested in getting started with lambdas,
method references, streams, and profiles in Java 8 can do so immediately by
downloading NetBeans IDE 8. Java hints and code analyzers help you upgrade
anonymous inner classes to lambdas, right across all your code bases, all in one go.
Java hints in the Java editor let you quickly and intuitively switch from lambdas to
method references, and back.

Moreover, Java SE Embedded support means that you are able to deploy, run, debug or
profile Java SE applications on an embedded device, such as the Raspberry Pi, directly
from NetBeans IDE. No new project type is needed for this; you can simply use the
standard Java SE project type for this purpose.
2. Tools for Java EE Developers. The code generators for which NetBeans IDE is
well known have been beefed up significantly. Where before you could create bits
and pieces of code for various popular Java EE component libraries, you can now
generate complete PrimeFaces applications, from scratch, including CRUD
functionality and database connections.

Additionally, the key specifications of the Java EE 7 Platform now have new and
enhanced tools, such as for working with JPA and CDI, as well as Facelets.

Let’s not forget to mention in this regard that Tomcat 8.0 and TomEE are now supported,
too, with a new plugin for WildFly in the NetBeans Plugin Manager.

3. Tools for Maven. A key strength of NetBeans IDE, and a reason why many developers
have started using it over the past years, is its out of the box support for Maven. No need
to install a Maven plugin, since it’s a standard part of the IDE. No need to deal with IDE-
specific files, since the POM provides the project structure. And now, in NetBeans IDE
8.0, there are enhancements to the graph layout, enabling you to visualize your POM
in various ways, while also being able to graphically exclude dependencies from the
POM file without touching the XML.
4. Tools for JavaScript. Thanks to powerful new JavaScript libraries and frameworks
over the years, JavaScript as a whole has become a lot more attractive for many
developers. For some releases already, NetBeans IDE has been available as a pure
frontend environment, that is, minus all the Java tools for which it is best known. This
lightweight IDE, including Git versioning tools, provides a great environment for
frontend devs. In particular, for users of AngularJS, Knockout, and Backbone, the IDE
comes with deep editor tools, such as code completion and cross-artifact navigation. In
NetBeans IDE 8.0, there's a very specific focus on AngularJS, since this is such a
dominant JavaScript solution at the moment. From your AngularJS controllers, you can
navigate, via hyperlinks embedded in the JavaScript editor, to the related HTML views.
You can also use code completion inside the HTML editor to access controllers, and
even the properties within the controllers, to help you accurately code the related
artifacts in your AngularJS applications.

Also, remember that there’s no need to download the AngularJS Seed template, since it’s
built into the NetBeans New Project wizard.

5. Tools for HTML5. JavaScript is a central component of the HTML5 Platform, a

collective term for a range of tools and technologies used in frontend development.
Popular supporting technologies are Grunt, a build tool, and Karma, a test runner
framework. Both of these are now supported out of the box in NetBeans IDE 8.0.


In an effort to set an independent database standard API for Java, Sun
Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic
SQL database access mechanism that provides a consistent interface to a variety of
RDBMSs. This consistent interface is achieved through the use of “plug-in” database
connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he
or she must provide the driver for each platform that the database and Java run on.
To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC.
As you discovered earlier in this chapter, ODBC has widespread support on a variety of
platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market
much faster than developing a completely new connectivity solution.
JDBC was announced in March of 1996. It was released for a 90 day public
review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification
was released soon after.
The remainder of this section will cover enough information about JDBC for you
to know what it is about and how to use it effectively. This is by no means a complete
overview of JDBC. That would fill an entire book.
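As a first taste of how JDBC is used in practice, here is a minimal sketch: obtain a Connection through DriverManager, run a parameterized query with PreparedStatement, and iterate the ResultSet. The URL, credentials, table and column names are placeholders, not part of any real deployment described in this document.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcSketch {
    // The '?' placeholder lets the driver bind the value safely,
    // avoiding string concatenation and SQL injection.
    static final String FIND_CUSTOMER = "SELECT name, city FROM customers WHERE id = ?";

    public static void printCustomer(String url, String user, String pass, int id) {
        // try-with-resources closes the connection and statement automatically.
        try (Connection conn = DriverManager.getConnection(url, user, pass);
             PreparedStatement ps = conn.prepareStatement(FIND_CUSTOMER)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + ", " + rs.getString("city"));
                }
            }
        } catch (SQLException e) {
            System.err.println("Database error: " + e.getMessage());
        }
    }
}
```

The same code runs unchanged against any RDBMS for which a JDBC driver is on the classpath; only the URL changes, which is exactly the "consistent interface" claim made above.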

JDBC Goals
Few software packages are designed without goals in mind. In the case of JDBC,
its many goals drove the development of the API. These goals, in conjunction
with early reviewer feedback, have finalized the JDBC class library into a solid
framework for building database applications in Java.
The goals that were set for JDBC are important. They will give you some insight as to
why certain classes and functionalities behave the way they do. The eight design goals for
JDBC are as follows:
1. SQL Level API:
The designers felt that their main goal was to define a SQL interface for Java.
Although not the lowest database interface level possible, it is at a low enough level for
higher-level tools and APIs to be created. Conversely, it is at a high enough level for
application programmers to use it confidently. Attaining this goal allows future tool
vendors to “generate” JDBC code and to hide many of JDBC’s complexities from the
end user.
2. SQL Conformance:
SQL syntax varies as you move from database vendor to database vendor. In an
effort to support a wide variety of vendors, JDBC will allow any query statement to be
passed through it to the underlying database driver. This allows the connectivity module
to handle non-standard functionality in a manner that is suitable for its users.
3. JDBC must be implementable on top of common database interfaces
The JDBC SQL API must “sit” on top of other common SQL level APIs. This
goal allows JDBC to use existing ODBC level drivers by the use of a software interface.
This interface would translate JDBC calls to ODBC and vice versa.

4. Provide a Java interface that is consistent with the rest of the Java system
Because of Java’s acceptance in the user community thus far, the designers felt
that they should not stray from the current design of the core Java system.

SQL Server 2008

Microsoft SQL Server is a relational database management system developed by
Microsoft. As a database server, it is a software product with the primary function of
storing and retrieving data as requested by other software applications, which may run
either on the same computer or on another computer across a network (including the
Internet).
SQL is Structured Query Language, a computer language for storing,
manipulating and retrieving data stored in a relational database. SQL is the standard
language for relational database systems. All relational database management systems,
such as MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server,
use SQL as the standard database language. However, they also use different dialects;
for example:
 MS SQL Server uses T-SQL;
 Oracle uses PL/SQL;
 the MS Access version of SQL is called JET SQL (native format).

The history of Microsoft SQL Server begins with the first Microsoft SQL Server
product, SQL Server 1.0, a 16-bit server for the OS/2 operating system in 1989, and
extends to the current day. As of December 2016 the following versions are supported
by Microsoft:
 SQL Server 2008
 SQL Server 2008 R2
 SQL Server 2012
 SQL Server 2014
 SQL Server 2016

The current version is Microsoft SQL Server 2016, released June 1, 2016. The
RTM version is 13.0.1601.5. SQL Server 2016 is supported on x64 processors only.
SQL Process
When you execute an SQL command on any RDBMS, the system determines the
best way to carry out your request, and the SQL engine figures out how to interpret the
task. There are various components included in this process. These components are the
Query Dispatcher, Optimization Engines, Classic Query Engine and SQL Query
Engine, among others. The Classic Query Engine handles all non-SQL queries, but the
SQL Query Engine won't handle logical files.
Data storage
Data storage is a database, which is a collection of tables with typed columns.
SQL Server supports different data types, including primary types such as Integer, Float,
Decimal, Char (including character strings), Varchar (variable length character strings),
binary (for unstructured blobs of data), Text (for textual data) among others. The
rounding of floats to integers uses either Symmetric Arithmetic Rounding or Symmetric
Round Down (fix) depending on arguments: SELECT Round(2.5, 0) gives 3.
Microsoft SQL Server also allows user-defined composite types (UDTs) to be
defined and used. It also makes server statistics available as virtual tables and views
(called Dynamic Management Views or DMVs). In addition to tables, a database can also
contain other objects including views, stored procedures, indexes and constraints, along
with a transaction log. A SQL Server database can contain a maximum of 2^31 objects,
and can span multiple OS-level files with a maximum file size of 2^60 bytes (1 exabyte).
The data in the database are stored in primary data files with the extension .mdf.
Secondary data files, identified with an .ndf extension, are used to allow the data of a
single database to be spread across more than one file, and optionally across more than
one file system. Log files are identified with the .ldf extension.
Storage space allocated to a database is divided into sequentially numbered pages,
each 8 KB in size. A page is the basic unit of I/O for SQL Server operations. A page is
marked with a 96-byte header which stores metadata about the page including the page
number, page type, free space on the page and the ID of the object that owns it. Page type
defines the data contained in the page: data stored in the database, index, allocation map
which holds information about how pages are allocated to tables and indexes, change
map which holds information about the changes made to other pages since last backup or
logging; or pages containing large data types such as image or text.
Buffer management
SQL Server buffers pages in RAM to minimize disk I/O. Any 8 KB page can be
buffered in-memory, and the set of all pages currently buffered is called the buffer cache.
The amount of memory available to SQL Server decides how many pages will be cached
in memory. The buffer cache is managed by the Buffer Manager. Either reading from or
writing to any page copies it to the buffer cache. Subsequent reads or writes are
redirected to the in-memory copy, rather than the on-disc version. The page is updated on
the disc by the Buffer Manager only if the in-memory cache has not been referenced for
some time. While writing pages back to disc, asynchronous I/O is used whereby the I/O
operation is done in a background thread so that other operations do not have to wait for
the I/O operation to complete. Each page is written along with its checksum when it is
written to disc.
Concurrency and locking
SQL Server allows multiple clients to use the same database concurrently. As
such, it needs to control concurrent access to shared data to ensure data integrity when
multiple clients update the same data, or when clients attempt to read data that is in the
process of being changed by another client. SQL Server provides two modes of concurrency
control: pessimistic concurrency and optimistic concurrency. When pessimistic
concurrency control is being used, SQL Server controls concurrent access by using locks.
Locks can be either shared or exclusive. An exclusive lock grants the user exclusive
access to the data: no other user can access the data as long as the lock is held. Shared
locks are used when data is being read: multiple users can read data locked with a
shared lock, but cannot acquire an exclusive lock. An exclusive lock would have to wait
for all shared locks to be released.
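The shared/exclusive rule just described can be illustrated with Java's own ReentrantReadWriteLock, which follows the same discipline: many holders of the shared (read) lock at once, but the exclusive (write) lock only when no read lock is held. This is an analogy in the document's own language, not SQL Server's internal implementation.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockDemo {
    // Demonstrates: a second shared acquisition succeeds, an exclusive
    // acquisition fails while shared locks are held, and succeeds afterwards.
    public static String demo() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();                               // first shared lock
        boolean secondShared = lock.readLock().tryLock();     // another shared lock: succeeds
        boolean writerWhileReading = lock.writeLock().tryLock(); // exclusive: refused
        lock.readLock().unlock();
        lock.readLock().unlock();                             // all shared locks released
        boolean writerAfter = lock.writeLock().tryLock();     // exclusive: now granted
        return secondShared + " " + writerWhileReading + " " + writerAfter;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: true false true
    }
}
```

For brevity the demo re-acquires the read lock on one thread (the lock is reentrant); a second reader thread would behave the same way.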
SQLCMD is a command line application that comes with Microsoft SQL Server,
and exposes the management features of SQL Server. It allows SQL queries to be written
and executed from the command prompt. It can also act as a scripting language to create
and run a set of SQL statements as a script. Such scripts are stored as a .sql file, and are
used either for management of databases or to create the database schema during the
deployment of a database.
SQLCMD was introduced with SQL Server 2005, and this continues with SQL
Server 2012 and 2014. Its predecessors in earlier versions were OSQL and ISQL, which
are functionally equivalent as far as T-SQL execution is concerned, and many of the
command-line parameters are identical, although SQLCMD adds extra versatility.
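As an illustration, a typical use of SQLCMD is running a deployment script against a server instance. The sketch below merely assembles such a command line in Python; the server name and script file are hypothetical, while -S (server) and -i (input file) are standard sqlcmd switches.

```python
def build_sqlcmd(server, script):
    """Assemble a sqlcmd invocation that executes a .sql script file.
    -S names the server instance, -i names the input script."""
    return ["sqlcmd", "-S", server, "-i", script]

# e.g. deploying a schema script to a hypothetical local instance
cmd = build_sqlcmd(r"localhost\SQLEXPRESS", "create_schema.sql")
```

In practice such a list would be handed to a process launcher (for example Python's subprocess module) on a machine where SQL Server's command-line tools are installed.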
The OLAP Services feature available in SQL Server version 7.0 is now called
Microsoft SQL Server Analysis Services. The term OLAP Services has been replaced with
the term Analysis Services. Analysis Services also includes a new data mining component.
The Repository component available in SQL Server version 7.0 is now called Microsoft
SQL Server Meta Data Services. References to the component now use the term Meta
Data Services. The term repository is used only in reference to the repository engine
within Meta Data Services.
A SQL Server database consists of five types of objects: tables, queries, forms, reports
and macros.
A database is a collection of data about a specific topic.
We can view a table in two ways:
a) Design View
b) Datasheet View
A) Design View
To build or modify the structure of a table, we work in the table design view. We can
specify what kind of data will be held.
B) Datasheet View
To add, edit or analyse the data itself, we work in the table's datasheet view mode.
A query is a question that has to be asked to get the required data. Access gathers
the data that answers the question from one or more tables. The data that makes up the
answer is either a dynaset (if it can be edited) or a snapshot (which cannot be edited).
Each time we run a query, we get the latest information in the dynaset. Access displays
the dynaset or snapshot for us to view, or performs an action on it, such as deleting or
updating.
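The idea that a query "asks a question" of one or more tables can be illustrated with Python's built-in sqlite3 module; the table and column names here are invented for the example.

```python
import sqlite3

# An in-memory database with one small table of customers.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (name TEXT, city TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [("Ann", "Pune"), ("Raj", "Pune"), ("Lee", "Delhi")])

# The query is the question: "which customers are in Pune?"
rows = con.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
).fetchall()
# The result set is a snapshot of the answer at the time the query ran.
```

Running the same query again after the table changes would return fresh results, which mirrors the dynaset behaviour described above.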
A form is used to view and edit information in the database record. A form displays
only the information we want to see in the way we want to see it. Forms use the familiar
controls such as textboxes and checkboxes. This makes viewing and entering data easy.
We can work with forms in several views. Primarily there are two views. They are:
a) Design View
b) Form View
To build or modify the structure of a form, we work in the form's design view. We can
add controls to the form that are bound to fields in a table or query, including text boxes,
option buttons, graphs and pictures.
A report is used to view and print the information from the database. The report
can group records into many levels and compute totals and averages by checking values
from many records at once. The report can also be made attractive and distinctive,
because we have control over its size and appearance.
A macro is a set of actions. Each action in a macro does something, such as opening a
form or printing a report. We write macros to automate common tasks, which saves time.
SQL procedures are characterized by many features. SQL procedures:
 Can contain SQL Procedural Language statements and features which support the
implementation of control-flow logic around traditional static and dynamic SQL.
 Are supported in the entire DB2 family of database products, in which many if
not all of the features supported in DB2 Version 9 are supported.
 Are easy to implement, because they use a simple, high-level, strongly typed
language.
 Are more reliable than equivalent external procedures.
 Adhere to the SQL99 ANSI/ISO/IEC SQL standard.
 Support input, output, and input-output parameter passing modes.
 Support a simple but powerful condition and error-handling model.
 Allow you to return multiple result sets to the caller or to a client application.
 Allow you to easily access the SQLSTATE and SQLCODE values as special
variables.
 Reside in the database and are automatically backed up and restored.
 Can be invoked wherever the CALL statement is supported.
 Support nested procedure calls to other SQL procedures or procedures
implemented in other languages.
 Support recursion.


The following are the Testing Methodologies:

o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.

Unit Testing

Unit testing focuses verification effort on the smallest unit of software design:
the module. Unit testing exercises specific paths in a module's control structure to
ensure complete coverage and maximum error detection. This test focuses on each
module individually, ensuring that it functions properly as a unit; hence the name,
unit testing.

During this testing, each module is tested individually and the module interfaces
are verified for consistency with the design specification. All important processing paths
are tested for the expected results. All error handling paths are also tested.
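As a sketch, testing one module in isolation might look like this in Python's unittest framework; the module and its function are invented for illustration.

```python
import unittest

def classify_order(amount):
    """Module under test: classify an order by its amount,
    raising on invalid input (an error-handling path)."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return "bulk" if amount >= 100 else "retail"

class TestClassifyOrder(unittest.TestCase):
    def test_expected_paths(self):
        # Each important processing path is exercised individually.
        self.assertEqual(classify_order(10), "retail")
        self.assertEqual(classify_order(100), "bulk")

    def test_error_handling_path(self):
        # Error handling paths are also tested.
        with self.assertRaises(ValueError):
            classify_order(-1)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestClassifyOrder)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method exercises one path through the module's control structure, so a failing path is reported individually.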

Integration Testing

Integration testing addresses the issues associated with the dual problems of
verification and program construction. After the software has been integrated, a set of
high-order tests is conducted. The main objective of this testing process is to take
unit-tested modules and build a program structure that has been dictated by the design.

The following are the types of Integration Testing:

1. Top Down Integration

This method is an incremental approach to the construction of program structure.
Modules are integrated by moving downward through the control hierarchy, beginning
with the main program module. The modules subordinate to the main program module
are incorporated into the structure in either a depth-first or breadth-first manner.
In this method, the software is tested from the main module, and individual stubs are
replaced as the test proceeds downwards.
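A small sketch of the idea: the main module is tested first, with a stub standing in for a subordinate module that has not yet been integrated. All names here are invented.

```python
def generate_report(fetch_totals):
    """Main module: depends on a subordinate module that supplies totals."""
    totals = fetch_totals()
    return "TOTAL: %d" % sum(totals)

def fetch_totals_stub():
    # Stub: a trivial stand-in for the real subordinate module,
    # returning canned data so the main module can be tested first.
    return [10, 20]

# Top-down: the main module is exercised against the stub; later the
# stub is replaced by the real implementation and testing moves down
# the control hierarchy.
report = generate_report(fetch_totals_stub)
```

Passing the subordinate in as a parameter makes swapping the stub for the real module a one-line change.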

2. Bottom-up Integration

This method begins the construction and testing with the modules at the lowest
level in the program structure. Since the modules are integrated from the bottom up,
processing required for modules subordinate to a given level is always available and the
need for stubs is eliminated. The bottom up integration strategy may be implemented
with the following steps:

 The low-level modules are combined into clusters that perform a specific
software sub-function.
 A driver (i.e. the control program for testing) is written to coordinate test
case input and output.
 The cluster is tested.
 Drivers are removed and clusters are combined moving upward in the
program structure

The bottom-up approach tests each module individually; then each module is
integrated with a main module and tested for functionality.
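In the same invented setting, a bottom-up driver feeds test cases to a low-level cluster of modules before any higher-level module exists:

```python
# Low-level modules, combined into a cluster performing one sub-function.
def parse_amount(text):
    return int(text.strip())

def apply_tax(amount, rate=0.1):
    return amount + amount * rate

def cluster(text):
    """The cluster: parse an amount, then apply tax to it."""
    return apply_tax(parse_amount(text))

def driver(cases):
    """Driver: the control program coordinating test-case input and
    output for the cluster; it is removed once integration moves
    upward in the program structure."""
    return [cluster(text) for text in cases]

results = driver([" 100 ", "200"])
```

Because the cluster needs nothing above it, no stubs are required, which is the advantage claimed for the bottom-up strategy above.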

7.1.3 User Acceptance Testing

User Acceptance of a system is the key factor for the success of any system. The
system under consideration is tested for user acceptance by constantly keeping in touch
with the prospective system users at the time of developing and making changes
wherever required. The system developed provides a friendly user interface that can
easily be understood even by a person who is new to the system.

7.1.4 Output Testing

After performing the validation testing, the next step is output testing of the
proposed system, since no system could be useful if it does not produce the required
output in the specified format. The outputs generated or displayed by the system under
consideration are tested by asking the users about the format required by them. Hence
the output format is considered in two ways: on screen and in printed format.

7.1.5 Validation Checking

Validation checks are performed on the following fields.

Text Field:

The text field can contain only a number of characters less than or equal to its
size. The text fields are alphanumeric in some tables and alphabetic in other tables.
An incorrect entry always flashes an error message.

Numeric Field:

The numeric field can contain only the numbers 0 to 9. An entry of any other
character flashes an error message. The individual modules are checked for accuracy and
for what they have to perform. Each module is subjected to a test run along with sample
data. The individually tested modules are integrated into a single system. Testing
involves executing the program with real data; the existence of any program defect is
inferred from the output. The testing should be planned so that all the requirements are
individually tested.

A successful test is one that brings out the defects for inappropriate data and
produces output revealing the errors in the system.
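The text- and numeric-field checks described above can be sketched as plain Python functions; the field sizes and error messages are illustrative.

```python
def validate_text_field(value, size, alphabetic_only=False):
    """A text field holds at most `size` characters; depending on the
    table it is alphabetic only, otherwise alphanumeric."""
    if len(value) > size:
        return "error: too long"
    allowed = value.isalpha() if alphabetic_only else value.isalnum()
    return "ok" if allowed else "error: invalid characters"

def validate_numeric_field(value):
    """A numeric field may contain only the digits 0 to 9; any other
    character flashes an error message."""
    return "ok" if value.isdigit() else "error: digits only"
```

Feeding these validators deliberately inappropriate data is exactly the kind of test that should reveal errors rather than pass silently.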

Preparation of Test Data

The above testing is done by taking various kinds of test data. Preparation of test
data plays a vital role in system testing. After preparing the test data, the system under
study is tested using that test data. While testing the system with test data, errors are
again uncovered and corrected by using the above testing steps, and the corrections are
also noted for future use.

Using Live Test Data:

Live test data are those that are actually extracted from organization files. After a
system is partially constructed, programmers or analysts often ask users to key in a set of
data from their normal activities. Then, the systems person uses this data as a way to
partially test the system. In other instances, programmers or analysts extract a set of live
data from the files and have them entered themselves.

It is difficult to obtain live data in sufficient amounts to conduct extensive testing.

And, although it is realistic data that will show how the system will perform for the
typical processing requirement, assuming that the live data entered are in fact typical,
such data generally will not test all combinations or formats that can enter the system.
This bias toward typical values then does not provide a true systems test and in fact
ignores the cases most likely to cause system failure.

Using Artificial Test Data:

Artificial test data are created solely for test purposes, since they can be generated
to test all combinations of formats and values. In other words, the artificial data, which
can quickly be prepared by a data-generating utility program in the information systems
department, make possible the testing of all logic and control paths through the program.
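A small data-generating utility of this kind might look like the following; the record layout is entirely made up for illustration.

```python
import random
import string

def generate_records(n, seed=0):
    """Generate artificial test records covering value combinations that
    live data would rarely exercise: zero, negative and huge amounts."""
    rng = random.Random(seed)   # seeded, so test runs are reproducible
    boundary_amounts = [0, -1, 10**9]           # deliberate edge cases
    records = []
    for i in range(n):
        name = "".join(rng.choice(string.ascii_uppercase) for _ in range(8))
        amount = rng.choice(boundary_amounts + [rng.randint(1, 999)])
        records.append({"id": i, "name": name, "amount": amount})
    return records

data = generate_records(5)
```

Seeding the generator means a failing case can be reproduced exactly, which is harder to arrange with live data.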

The most effective test programs use artificial test data generated by persons other
than those who wrote the programs. Often, an independent team of testers formulates a
testing plan, using the systems specifications.

The package “Virtual Private Network” has satisfied all the requirements
specified as per software requirement specification and was accepted.


Whenever a new system is developed, user training is required to educate users
about the working of the system, so that it can be put to efficient use by those for whom
the system has been primarily designed. For this purpose, the normal working of the
project was demonstrated to the prospective users. Its working is easily understandable
and, since the expected users are people who have good knowledge of computers, the use
of this system is very easy.

This covers a wide range of activities including correcting code and design errors.
To reduce the need for maintenance in the long run, we have more accurately defined the
user’s requirements during the process of system development. Depending on the
requirements, this system has been developed to satisfy the needs to the largest possible
extent. With development in technology, it may be possible to add many more features
based on the requirements in future. The coding and designing is simple and easy to
understand which will make maintenance easier.


A strategy for system testing integrates system test cases and design techniques
into a well-planned series of steps that results in the successful construction of software.
The testing strategy must incorporate test planning, test case design, test execution, and
the resultant data collection and evaluation. A strategy for software testing must
accommodate low-level tests that are necessary to verify that a small source code
segment has been correctly implemented, as well as high-level tests that validate
major system functions against user requirements.

Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing presents an interesting
anomaly for the software engineer. Thus, a series of tests is performed on the proposed
system before the system is ready for user acceptance testing.

Software, once validated, must be combined with other system elements (e.g.
hardware, people, databases). System testing verifies that all the elements mesh properly
and that overall system function and performance is achieved. It also tests to find
discrepancies between the system and its original objective, current specifications and
system documentation.

In unit testing, different modules are tested against the specifications produced
during the design of the modules. Unit testing is essential for verification of the code
produced during the coding phase; hence the goal is to test the internal logic of the
modules. Using the detailed design description as a guide, important control paths are
tested to uncover errors within the boundary of the modules. This testing is carried out
during the programming stage itself. In this testing step, each module was found
to be working satisfactorily as regards the expected output from the module.

In Due Course, latest technology advancements will be taken into consideration.

As part of technical build-up many components of the networking system will be generic
in nature so that future projects can either use or interact with this. The future holds a lot
to offer to the development and refinement of this project.

Security, a major concern:
1. Security concerns arise because both customer data and programs reside in the
provider's premises.

2. Security is always a major concern in open system architectures.

Data Centre Security:

1. Professional Security staff utilizing video surveillance, state of the art intrusion
detection systems, and other electronic means.
2. When an employee no longer has a business need to access the datacenter, his
privileges to access the datacenter should be immediately revoked.

3. All physical and electronic access to data centers by employees should be logged
and audited routinely.

Data Location:
1. When using the cloud, the user probably will not know exactly where the data is
hosted or in which country it will be stored.

2. Data should be stored and processed only in specific jurisdictions, as defined by
the user.

3. The provider should also make a contractual commitment to obey local privacy
requirements on behalf of its customers.

4. Data-centered policies are generated when a user provides personal or sensitive
information; the policy travels with that information throughout its lifetime, to
ensure that the information is used only in accordance with the policy.

Backups of Data:
1. Data stored in the provider's database should be redundantly stored in multiple
physical locations.

2. Data that is generated during the running of a program on instances is all
customer data, and therefore the provider should not perform backups.

3. Control of the administrator on databases.

Data Sanitization:
1. Sanitization is the process of removing sensitive information from a storage
device.

2. What happens to data stored in a cloud computing environment once it has passed
its user's "use by" date?

3. What data sanitization practices does the cloud computing service provider
propose to implement for redundant and retiring data storage devices, as and when
these devices are retired or taken out of service?

Network Security:
1. Denial of Service: where servers and networks are brought down by a huge
amount of network traffic and users are denied the access to a certain Internet
based service.

2. Related attacks include DNS hacking, routing table "poisoning" and XDoS attacks.

3. QoS violation: through congestion, delaying or dropping packets, or through
resource hacking.

4. Man-in-the-middle attack: to overcome it, always use SSL.

5. IP spoofing: spoofing is the creation of TCP/IP packets using somebody else's IP
address.

6. Solution: Infrastructure will not permit an instance to send traffic with a source IP
or MAC address other than its own.
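On the client side, guarding against man-in-the-middle attacks with SSL/TLS amounts to verifying the server's certificate and hostname, which Python's standard ssl module does by default:

```python
import ssl

# A default client context verifies the server certificate against the
# system's trusted CAs and checks that the hostname matches it; a
# man-in-the-middle without a valid certificate fails the handshake.
context = ssl.create_default_context()
```

Wrapping a socket with `context.wrap_socket(sock, server_hostname=...)` then refuses connections whose certificates do not verify.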

How secure is the encryption scheme:

1. Is it possible for all of my data to be fully encrypted?

2. What algorithms are used?

3. Who holds, maintains and issues the keys?

Problems:

4. Encryption accidents can make data totally unusable.

5. Encryption can complicate availability.

Solution:

6. The cloud provider should provide evidence that encryption schemes were
designed and tested by experienced specialists.

Information Security:
1. Security related to the information exchanged between different hosts, or between
hosts and users.

2. This includes issues pertaining to secure communication and authentication, and
issues concerning single sign-on and delegation.

3. Secure communication issues include those security concerns that arise during the
communication between two entities.

4. These include confidentiality and integrity issues. Confidentiality indicates that all
data sent by users should be accessible only to "legitimate" receivers, and
integrity indicates that all data received should only be sent or modified by
"legitimate" senders.

5. Solution: public key encryption, X.509 certificates and the Secure Sockets Layer
(SSL) enable secure authentication and communication over computer networks.
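Integrity in this sense, that received data was sent by a legitimate sender and not modified, can be sketched with a keyed hash (HMAC) from Python's standard library. The shared key here is a made-up stand-in for the credentials that certificates would normally establish.

```python
import hmac
import hashlib

KEY = b"shared-secret"   # hypothetical key known only to legitimate parties

def sign(message: bytes) -> bytes:
    """Sender attaches a keyed digest so receivers can check integrity."""
    return hmac.new(KEY, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    """Receiver recomputes the digest; any modification en route, or a
    sender without the key, fails the constant-time comparison."""
    return hmac.compare_digest(sign(message), tag)

tag = sign(b"transfer 100")
ok = verify(b"transfer 100", tag)          # legitimate message
tampered = verify(b"transfer 999", tag)    # modified in transit
```

This gives integrity and sender authentication between the two key holders; confidentiality would additionally require encrypting the message itself.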


This project provides several examples showing that knowledge mining is becoming
more important in a wide range of applications. As data collection instruments become
more advanced and people rely more on computers to discover "intelligence," the
opportunity for the knowledge mining field to grow is endless. Although many
commercial data and text mining software packages are available, there is still much
room for research and education activities to expand, for advancing data mining
techniques and for showing their potential in several emerging fields such as
computation-based biological and medical studies. Hopefully, with the exposition of this
article, more ISyE students, faculty members and alumni will use knowledge mining
tools in their studies, research investigations and business activities.

