
AN APPROACH TO CATEGORIZATION OF TEXT IN WEBSITES

USING PARALLEL SEARCH

BAKTAVATCHALAM.G (08MW03)

MASTER OF ENGINEERING

Branch: SOFTWARE ENGINEERING

of Anna University

May 2009

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


PSG COLLEGE OF TECHNOLOGY
(Autonomous Institution)
COIMBATORE – 641 004

AN APPROACH TO CATEGORIZATION OF TEXT IN WEBSITES USING


PARALLEL SEARCH

Bona fide record of work done by

BAKTAVATCHALAM.G (08MW03)

MASTER OF ENGINEERING

Branch: COMPUTER SCIENCE AND ENGINEERING


of Anna University, Coimbatore.

May 2009
ACKNOWLEDGEMENT

We wish to express our sincere gratitude to our respected Principal, Dr. R.
Rudramoorthy, for having given us the opportunity to undertake our project.

We also wish to express our sincere thanks to Dr. S. N. Sivanandam, Professor
and Head of the Department of Computer Science and Engineering, for the
encouragement and support that he extended towards our project work.

We extend our sincere thanks to our internal guide, Mrs. D. Indumathi, Asst.
Professor, Department of Computer Science and Engineering, for her guidance and
help rendered for the successful completion of our project.

CONTENTS

Synopsis
List of Figures
List of Tables
1. INTRODUCTION
1.1 Problem Definition
1.2 Objective of the Project
1.3 Significance of the Project
1.4 Outline of the Project
2. SYSTEM STUDY
2.1 Proposed System
3. SYSTEM ANALYSIS
3.1 Requirement Analysis
3.2 Feasibility Analysis
4. SYSTEM DESIGN
4.1 Use Case Diagram
4.2 Class Diagram
4.3 Sequence Diagram
4.4 Collaboration Diagram
4.5 State Chart / Activity Diagram
4.6 Deployment Diagram
5. IMPLEMENTATION
5.1 Server Module
5.2 Parser Module
6. TESTING
6.1 Unit Testing
6.2 Integration Testing
6.3 Sample Test Cases
7. SNAPSHOT
7.1 Finding Document Category
7.2 Finding Keyword Document
CONCLUSION
FUTURE ENHANCEMENTS
BIBLIOGRAPHY

SYNOPSIS

In this project, we search for a given set of keywords in categorized
documents. Searching is carried out only after categorization is complete and the
categories of the given documents are available.

The work consists of two separate operations. First, we generate the categories
and their related categories. We then supply the links of the required websites so that
the category of each link can be found. The contents of each website are parsed into a
keyword list, and these keys are used to determine the corresponding category. The
documents and their categories are then available for searching by key.

Second, we give keywords to the search engine to find a document and its
corresponding category. If the query is composed of multiple keywords, every key is
searched and the matching documents and their categories are retrieved. Each category
contains a name, a set of keys, and a weight for each key. Categories are ranked using
these weights and the key occurrences.

LIST OF FIGURES

FIGURE NO.    NAME

Figure 2.1    System Architecture

LIST OF TABLES

TABLE NO.    NAME

Table 6.1    Sample Test Cases

Introduction Chapter 1

CHAPTER 1

INTRODUCTION

This chapter provides a brief overview of the problem definition, objectives and
significance of the project and an outline of the report.

1.1 PROBLEM DEFINITION


Given a set of keywords and a set of websites, the system searches the websites for
the keywords and categorizes the websites. For a given keyword set, it determines the
documents that are most relevant to that keyword set and the category to which they belong.

1.2 OBJECTIVE OF THE PROJECT


Most users are interested in the website content that holds the information they
want, and also in where that information is located. This project therefore lets a user
find where a particular piece of text occurs within a given set of websites, together
with the corresponding category.

1.3 SIGNIFICANCE OF THE PROJECT


With the enormous growth in information on the Internet, there is a corresponding
need for tools that enable fast and efficient searching, browsing and delivery of textual
data. Running the search concurrently on several clients is intended to reduce the
overall search time.

1.4 OUTLINE OF THE PROJECT


The rest of the report is structured as follows. Chapter 2 provides a detailed study
of the existing system and the basic ideas of the proposed system. Chapter 3 discusses
the requirements for the development of the system and analyzes the feasibility of
the system. Chapter 4 presents the overall design of the system. Chapter 5 discusses
the implementation details. Chapter 6 explains the various testing procedures conducted
on the system. Chapter 7 contains snapshots of the various forms in our system. The last
section summarizes the project.

System Study Chapter 2

CHAPTER 2

SYSTEM STUDY
This chapter elucidates the existing system and gives a brief description of the
proposed system.

2.1 PROPOSED SYSTEM


In our project, we search for a given set of keywords in categorized
documents. Searching is carried out only after categorization is complete and the
categories of the given documents are available. The work consists of two separate
operations. First, we generate the categories and their related categories. We then
supply the links of the required websites so that the category of each link can be found.
The contents of each website are parsed into a keyword list, and these keys are used to
determine the corresponding category. The documents and their categories are then
available for searching by key. Second, we give keywords to the search engine to find a
document and its corresponding category. If the query is composed of multiple keywords,
every key is searched and the matching documents and their categories are retrieved.
Each category contains a name, a set of keys, and a weight for each key. Categories are
ranked using these weights and the key occurrences.

[Figure 2.1: System Architecture. Blocks shown: Websites, Categorizer, Documents +
Categories, Keywords / Search Keyword, Document Finder.]
System Analysis Chapter 3

CHAPTER 3

SYSTEM ANALYSIS
This section describes the hardware and software specifications for the
development of the system and an analysis on the feasibility of the system.

3.1 REQUIREMENT ANALYSIS


3.1.1 Software Requirements
After experimenting with the various commercial software packages available and
weighing their pros and cons, the following were chosen:
• Operating System – Platform Independent
• Programming Languages – Java 1.6+
• Front End - Java

3.1.2 Hardware Requirements


The Hardware requirements of the proposed system are as follows:
• Pentium-III machine & above
• RAM-256 MB
• Hard Disk with a Capacity of 10 GB

3.2 FEASIBILITY ANALYSIS


Feasibility deals with step-by-step analysis of the system. Analysis showed that
this project was feasible in all respects. Three kinds of feasibility factors are considered:

• Economic Feasibility

• Technical Feasibility

• Operational Feasibility

3.2.1 Economic Feasibility

The system is developed using only software that is already widely used in
the market, so there is no need to install new software. Hence, the cost
incurred towards this project is negligible.

3.2.2 Technical Feasibility

3.2.2.1 Searching
The main aim of our project is to search a specific set of keywords in a specific
set of websites only.

3.2.2.2 Categorizing
The next important task in our project is to categorize the
documents, so that we are able to search them for a specific keyword set.

3.2.3 Operational Feasibility


The functions needed to be performed by the system are all valid and without
any conflicts. All functions and constraints specified in the requirements are completely
operational. The requirements stated are realistically testable.
The requirements are adaptable to changes without any large-scale effects on
other system requirements. The system is capable of accommodating future
requirements if they arise.

System Design Chapter 4

CHAPTER 4

SYSTEM DESIGN

This chapter describes the functional decomposition of the system and illustrates
the movement of data between external entities, the processes and the data stores
within the system, with the help of data flow diagrams.

4.1 USE CASE DIAGRAM

Actors: User, Client, Server

Use cases: IP List, URL List, Keywords, Specification, Send Jobs, Process Jobs,
Searching, Results

[Use case diagram: the actors User, Server and Client connected to the use cases
listed above.]


4.2 CLASS DIAGRAM

[Class diagram. Classes, attributes and operations recoverable from the figure:]

ServerGUI: main()
ServerManager: S : Socket, kN : int, key[] : String, URL : String
ServerRead: S : Socket; dataFS()
ServerWrite: S : Socket; send()
ClientGUI: main()
ClientManager: S : Socket, kN : int, key[] : String, URL : String; search(),
parseURL(), dataFS()
ClientRead: S : Socket; dataFS()
ClientWrite: S : Socket; send()

4.3 SEQUENCE DIAGRAM

User Server Client(s)

1: IP List

2: Keywords

3: URL List

4: Init Process

5: Allocate Jobs

6: Distribute Jobs

7: Process Searching

8: Result
9: Combined Result
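
Steps 5 and 6 above (Allocate Jobs, Distribute Jobs) can be sketched as a round-robin assignment of URLs to clients, which is the order in which `_split()` in the appendix cycles through the client sockets. The class `JobAllocator` and its method `allocate` are hypothetical names introduced only for this illustration; the sketch returns the per-client job lists instead of writing them to sockets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the project code): assigns each job (URL)
// to a client in round-robin order.
final class JobAllocator {

    // Returns one job list per client; job i goes to client (i % clientCount).
    static List<List<String>> allocate(String[] jobs, int clientCount) {
        if (clientCount <= 0) {
            throw new IllegalArgumentException("at least one client is required");
        }
        List<List<String>> perClient = new ArrayList<List<String>>();
        for (int c = 0; c < clientCount; c++) {
            perClient.add(new ArrayList<String>());
        }
        for (int i = 0; i < jobs.length; i++) {
            perClient.get(i % clientCount).add(jobs[i]);
        }
        return perClient;
    }

    public static void main(String[] args) {
        String[] jobs = {"www.google.co.in", "www.yahoo.com", "www.chennaionline.com"};
        // With two clients, the first and third URL go to client 0, the second to client 1.
        System.out.println(allocate(jobs, 2));
    }
}
```

Round-robin keeps the number of URLs per client within one of each other, but it does not account for differences in page size or client speed.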


4.4 COLLABORATION DIAGRAM


Participants: User, Server, Client(s)

1: IP List
2: Keywords
3: URL List
4: Init Process
5: Allocate Jobs
6: Distribute Jobs
7: Process Searching
8: Result
9: Combined Result

4.5 STATE CHART / ACTIVITY DIAGRAM

Server:
• Read IP List, URL List and Keywords
• Send Keywords to all IPs in the IP List
• Send URL to all IPs
• Results Found? (No / Yes)
• Display Results

Client:
• Receive all Data
• Search each keyword count in each URL
• Compute all keyword counts from all URLs
• Results


4.6 DEPLOYMENT DIAGRAM

[Deployment diagram: nodes Server and Client(s); data items Keywords, IP List and
URL List.]

Implementation Chapter 5

CHAPTER 5

IMPLEMENTATION

This phase is divided into two sub-phases: Development and Implementation. The
individual system components are built during the development period; programs are
written and tried out by users.
During Implementation, the components built during development are put into
operational use.
In the development phase of our system, the following system components were
built:
• Server module
• Parser module
Both the Server and Parser modules were developed in Java.

5.1 Server Module


This module contains the following sub-modules:
• Load Details
• Categorizing
• Searching
5.1.1 Load Details
In this module we load the categories and their related categories, the documents and
their categories, and the categories and their keys with weights.
5.1.2 Categorizing
In this module we categorize the given document using the key set parsed from that
document and the corresponding weights of those keys in the available categories.
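
The report does not give the scoring formula, so the following is only a minimal sketch under an assumption: a category's score for a document is the sum, over the category's keys, of the key's weight multiplied by the number of times the key occurs in the document, and the document is assigned to the highest-scoring category. The class `CategorizerSketch`, its methods, and the sample categories and weights are all invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of weighted-key categorization; not the project's actual classes.
final class CategorizerSketch {

    // Assumed formula: score = sum of (weight of key * occurrences of key in the document).
    static double score(Map<String, Double> keyWeights, Map<String, Integer> docKeyCounts) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : keyWeights.entrySet()) {
            Integer count = docKeyCounts.get(e.getKey());
            if (count != null) {
                total += e.getValue() * count;
            }
        }
        return total;
    }

    // Returns the name of the best-scoring category, or null if no category key occurs.
    static String categorize(Map<String, Map<String, Double>> categories,
                             Map<String, Integer> docKeyCounts) {
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Map<String, Double>> c : categories.entrySet()) {
            double s = score(c.getValue(), docKeyCounts);
            if (s > bestScore) {
                bestScore = s;
                best = c.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented categories, keys and weights, for illustration only.
        Map<String, Double> sports = new HashMap<String, Double>();
        sports.put("cricket", 2.0);
        sports.put("score", 1.0);
        Map<String, Double> education = new HashMap<String, Double>();
        education.put("college", 2.0);
        education.put("course", 1.5);

        Map<String, Map<String, Double>> categories = new HashMap<String, Map<String, Double>>();
        categories.put("Sports", sports);
        categories.put("Education", education);

        // Key occurrences parsed from one (invented) document.
        Map<String, Integer> doc = new HashMap<String, Integer>();
        doc.put("college", 3);
        doc.put("course", 2);
        doc.put("score", 1);

        System.out.println(categorize(categories, doc));
    }
}
```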
5.1.3 Searching
In this module we search for documents and their categories using the given key set.
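
As an illustration of this step, the sketch below looks up every key of the query in each document's key counts, keeps the documents in which at least one key occurs, and returns them with their category, ordered by total occurrences. The ranking rule, the class `SearchSketch`, the nested `Hit` type and the sample data are assumptions made for this example; the report does not specify them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of keyword search over already categorized documents.
final class SearchSketch {

    // One search result: a document, its category and the total key occurrences.
    static final class Hit {
        final String document;
        final String category;
        final int occurrences;

        Hit(String document, String category, int occurrences) {
            this.document = document;
            this.category = category;
            this.occurrences = occurrences;
        }

        public String toString() {
            return document + " [" + category + "] " + occurrences;
        }
    }

    static List<Hit> search(String[] keys,
                            Map<String, Map<String, Integer>> docKeyCounts,
                            Map<String, String> docCategory) {
        List<Hit> hits = new ArrayList<Hit>();
        for (Map.Entry<String, Map<String, Integer>> doc : docKeyCounts.entrySet()) {
            int total = 0;
            for (String key : keys) {
                Integer n = doc.getValue().get(key);
                if (n != null) {
                    total += n;
                }
            }
            if (total > 0) {
                hits.add(new Hit(doc.getKey(), docCategory.get(doc.getKey()), total));
            }
        }
        // Most occurrences first; ties are broken by document name.
        Collections.sort(hits, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                if (a.occurrences != b.occurrences) {
                    return a.occurrences > b.occurrences ? -1 : 1;
                }
                return a.document.compareTo(b.document);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        // Invented documents, counts and categories, for illustration only.
        Map<String, Integer> docA = new HashMap<String, Integer>();
        docA.put("college", 3);
        docA.put("course", 2);
        Map<String, Integer> docB = new HashMap<String, Integer>();
        docB.put("cricket", 4);
        docB.put("college", 1);

        Map<String, Map<String, Integer>> counts = new HashMap<String, Map<String, Integer>>();
        counts.put("docA", docA);
        counts.put("docB", docB);

        Map<String, String> categories = new HashMap<String, String>();
        categories.put("docA", "Education");
        categories.put("docB", "Sports");

        System.out.println(search(new String[] {"college", "course"}, counts, categories));
    }
}
```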


5.2 Parser Module


This module contains the following sub-modules:
• Load Module
• URL Content Grabber Module
5.2.1 Load Module
In this module we load the keywords from the server and then retrieve a URL to begin
searching.
5.2.2 URL Content Grabber Module
Whenever a URL arrives from the server, the parser connects to that URL, retrieves
its contents to begin searching, and then collects the key sets from that
site.
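
The client (parser) source is not reproduced in this part of the report, so the following is a minimal sketch of how a content grabber of this kind could work, using only standard `java.net` and `java.io` classes. It assumes the URLs arrive without a scheme (as in JOBS.TXT) and that a keyword is counted as a case-sensitive, non-overlapping substring; both are assumptions, and `KeywordCounterSketch`, `fetch` and `countOccurrences` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;

// Hypothetical sketch of the URL content grabber; not the project's parser code.
final class KeywordCounterSketch {

    // Downloads the page at http://<host> and returns its text.
    // Assumption: the server sends host names without a scheme, as in JOBS.TXT.
    static String fetch(String host) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create("http://" + host).toURL().openStream(), "UTF-8"));
        try {
            StringBuilder page = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
            return page.toString();
        } finally {
            in.close();
        }
    }

    // Counts case-sensitive, non-overlapping occurrences of key in text.
    static int countOccurrences(String text, String key) {
        if (key.length() == 0) {
            return 0;
        }
        int count = 0;
        int from = text.indexOf(key);
        while (from >= 0) {
            count++;
            from = text.indexOf(key, from + key.length());
        }
        return count;
    }

    public static void main(String[] args) {
        // A literal page is used here so the example runs without network access;
        // replace it with fetch("www.example.com") to count keys on a live page.
        String page = "<a href=\"www.a.com\">page</a> <a href=\"www.b.com\">Tamil</a>";
        for (String key : new String[] {"page", "href", "www", "Tamil"}) {
            System.out.println(key + "---" + countOccurrences(page, key));
        }
    }
}
```

Counting raw substrings means that markup such as `href` is counted along with visible text; stripping HTML tags first would be needed to count only the displayed content.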

Testing Chapter 6

CHAPTER 6

TESTING
This chapter explains the various testing procedures conducted on the system.
Testing is a process of executing a program with the intent of finding an error. A
successful test is one that uncovers an as yet undiscovered error. A testing process
cannot show the absence of defects but can only show that software errors are present.
It ensures that defined input will produce actual results that agree with the required
results. A good testing methodology should include
• Clearly define testing roles, responsibilities and procedures
• Establish consistent testing process
• Streamline testing requirements
• Overcome “requirements slow me down” mentality
• Common sense process approach
• Use some elements of existing Process
• Not an attempt to replace, rewrite or redefine Process
• To find defects early and to give good time to developers for bug fixes
• Independent perspective in testing

Some of the testing principles used in this project are:


• Unit Testing
• Integration Testing

6.1 UNIT TESTING


Unit testing is a strategy in which the individual components that make up the
system are tested first to ensure that the system works to the desired extent. It focuses
the verification effort on the smallest unit of the software design, i.e. the module. The
various modules of the system are tested to see whether they perform their intended
functions. Using the procedural design description, important control paths are tested to
uncover errors within the boundary of the module. While accepting a connection using the
specified functions, we carry out unit testing in the respective modules. The unit test is
normally a white box test (a testing method in which the control structure of the
procedural design is used to derive test cases).

6.1.1 Process Objectives


To test every unit of the software in isolation before integrating it with other units.

6.1.2 Definition of Unit


A unit is a module as identified during size estimation process with a size
estimate that does not exceed 1000LOC.
For GUI applications each screen will be a unit.
If the size estimate for a unit exceeds 1000 LOC and it is not feasible to break it
into smaller logically independent units that can be tested in isolation, the project lead in
concurrence with the SQA can decide to define this as a unit.

6.1.3 Entry Criteria


The entry criteria for this process are the following:
• Unit completed
• Unit peer reviewed

6.1.4 Exit Criteria


The exit criteria for this process are the following:
• Unit test cases executed
• Any defects that are identified during unit testing and not fixed before the
unit enters component testing are listed in the test report and verified
• 100% statement coverage
If a unit is to be tested before its code review, this must be identified in the
project plan. In such projects the developer will self-review (desk check) the code
before unit testing.
Where exception handling for error conditions is difficult to trigger,
making it impossible to achieve 100% statement coverage, the code should be
formally reviewed with this as an additional criterion.


6.2 INTEGRATION TESTING


The integration testing is a systematic technique for constructing the program
structure while conducting tests to uncover errors associated with interfacing. It is a type
of testing by which the individual modules of the system are combined and tested
whether they work properly as a whole. The objective is to take unit test modules and
build a program that has been dictated by the design. Integration testing can be either
‘Incremental’ or ‘Non-Incremental’.
The objective of the integration testing is to help engineers plan and execute the
component and Integration testing for their respective projects.
Integration testing should include the following objectives:
• Performed by the product group/Dev test team after feature complete
• Determines that all product components on a list of specific platforms function
successfully together (The List specified in Master test plan)
• Performed in a basic product / platform environment (Basic environment
specified in Master test plan)
• Tests the product functionality against the specification
• Tests functionality of fake languages with sample single and double byte
languages
• Tests scaling to an acceptable minimum level as called out in the master test
plan
• Tests performance, reliability to an acceptable level as called out in the master
test plan
• Final integration tests done after all components are integrated, with the build in
production format
The tasks of the project have been integrated and the functioning of the entire
system has been found to be satisfactory. The functionality of the entire system has
been subjected to a series of tests and all the modules have been found to interoperate
properly.
Finally, integration testing was performed on the integrated system, which was found
to work properly.


6.3 SAMPLE TEST CASES


Some of the sample test cases employed, along with the test results, are
described in the table below.

Table 6.1 Sample Test Cases

Test Description                                            Result

Is the server stable when running more than one key set?   OK
Does the parser return the results properly?                OK
Is searching done correctly?                                OK
Does the server use few resources?                          OK
Is the result obtained in a short time?                     OK

Snapshot Chapter 7

CHAPTER 7

SNAPSHOT
This chapter contains snapshots of the various forms in our system.

7.1 Finding Category of given document


7.2 Finding the Document & its Category using given keyword

CONCLUSION

Thus the analysis, design and implementation of text categorization and
searching have been completed successfully. The user can search for a set of
keywords in a list of websites and view the count of each keyword for
a particular website. This kind of searching is useful for crawling websites with a
particular view of specific content. Because the search runs concurrently on several
clients, it is expected to perform better than a sequential search.

FUTURE ENHANCEMENTS
Currently a flat classification scheme is used to find categories; in future it will
be extended to a hierarchical, tree-structured classification to reduce the time
complexity and improve relevancy. Currently the set of websites for classification is
supplied by the user; in future, classification will be done by automatic parsing of sites.

BIBLIOGRAPHY

• [Lorenz 1994] Lorenz, L. Kidd, J. Object Oriented Software Metrics, Prentice Hall 1994,
ISBN 0-13-179292-X
• Saturnino Luz, Implementing a Text Categorization System: a step-by-step tutorial

• A. McCallum and K. Nigam. A comparison of event models for naive Bayes text
classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–
48. AAAI Press, 1998.

• Y. Yang and J. O. Pedersen. A comparative study on feature selection in text
categorization. In D. H. Fisher, editor, Proceedings of ICML-97, 14th International
Conference on Machine Learning, pages 412–420, Nashville, 1997. Morgan Kaufmann Publishers.

• Java Network Programming, Second Edition, O'Reilly & Associates, Inc.

• Herbert Schildt and Patrick Naughton, 2001, “Java 2: The Complete Reference”, Fourth
Edition, Tata McGraw-Hill Publishing Company Limited.

Websites
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://paul.luminos.nl/documents/show_document.php?d=197

Appendix

APPENDIX

SOURCE CODE LISTINGS

This chapter provides source code listings.

INPUT FILES
IP.TXT
2
127.0.0.1
127.0.0.1
127.0.0.1

JOBS.TXT
0
5
www.google.co.in
www.yahoo.com
www.chennaionline.com
www.psgtech.edu
www.psgtech.edu

KEY.TXT
4
page
href
www
Tamil

OUTPUT (In Server)

Sockets created
Keys distributed
Is Found:true
Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---page---1
Is Found:true

Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---href---36
Is Found:true
Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---www---18
Is Found:true
Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---Tamil---1
Is Found:true
Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.yahoo.com---page---1
Is Found:true
Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.yahoo.com---href---48
Is Found:true
Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.yahoo.com---www---5
Is Found:false
Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.yahoo.com---Tamil---0
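
Each result line in the output above has the form `<socket>:<url>---<key>---<count>`. As a hedged illustration of how such lines could be turned back into structured values (for example, to total the counts per keyword), the sketch below splits one line into its three fields. `ResultLineSketch` is a hypothetical class, and the sketch assumes that neither the URL nor the key contains the separator `---`.

```java
// Hypothetical parser for one server output line of the form
//   Socket[...]:<url>---<key>---<count>
final class ResultLineSketch {
    final String url;
    final String key;
    final int count;

    private ResultLineSketch(String url, String key, int count) {
        this.url = url;
        this.key = key;
        this.count = count;
    }

    // Assumes neither the URL nor the key contains the separator "---".
    static ResultLineSketch parse(String line) {
        // Skip the "Socket[...]:" prefix if it is present.
        int marker = line.indexOf("]:");
        String payload = marker >= 0 ? line.substring(marker + 2) : line;
        String[] parts = payload.split("---");
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected result line: " + line);
        }
        return new ResultLineSketch(parts[0], parts[1], Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        ResultLineSketch r = parse(
            "Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---href---36");
        System.out.println(r.url + " " + r.key + " " + r.count);
    }
}
```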

SERVER
/*
* ServerGUI.java
*
* Created on November 2, 2008, 3:09 PM
*/
import java.io.*;
import java.util.*;
import javax.swing.*;
/**
*
* @author SuperStar
*/
interface ServerI
{
public void setErr(String err);
public void setInfo(String info);
}

public class ServerGUI extends javax.swing.JFrame implements ServerI {


String[] ip;
int ipN=0,rN=0,jN=0,jT=0,kN=0;
String[] jobs;
String[] rank;
String[] key;
ServerManager SM;

/** Creates new form ServerGUI */


public ServerGUI() {
initComponents();
this.jTextArea2.setText("Err Stream:");
this.jList1.removeAll();

// this.jList2.removeAll();
this.jList3.removeAll();
(new MessageBox("welcome To SuperStar's Network!")).setVisible(true);
}

/** This method is called from within the constructor to


* initialize the form.
* WARNING: Do NOT modify this code. The content of this method is
* always regenerated by the Form Editor.
*/
// <editor-fold defaultstate="collapsed" desc="Generated Code">//GEN-BEGIN:initComponents
private void initComponents() {

jScrollPane1 = new javax.swing.JScrollPane();


jList1 = new javax.swing.JList();
jLabel1 = new javax.swing.JLabel();
jButton1 = new javax.swing.JButton();
jScrollPane3 = new javax.swing.JScrollPane();
jList3 = new javax.swing.JList();
jLabel2 = new javax.swing.JLabel();
jScrollPane2 = new javax.swing.JScrollPane();
jTextArea1 = new javax.swing.JTextArea();
jButton3 = new javax.swing.JButton();
jScrollPane4 = new javax.swing.JScrollPane();
jTextArea2 = new javax.swing.JTextArea();
jButton2 = new javax.swing.JButton();
jScrollPane5 = new javax.swing.JScrollPane();
jTextArea3 = new javax.swing.JTextArea();

setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);
setTitle("Server");

jList1.setModel(new javax.swing.AbstractListModel() {
String[] strings = { "Item 1", "Item 2", "Item 3", "Item 4", "Item 5" };
public int getSize() { return strings.length; }
public Object getElementAt(int i) { return strings[i]; }
});
jScrollPane1.setViewportView(jList1);

jLabel1.setText("Clients IP :");

jButton1.setText("Load Details");
jButton1.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jButton1ActionPerformed(evt);
}
});

jList3.setModel(new javax.swing.AbstractListModel() {
String[] strings = { "Item 1", "Item 2", "Item 3", "Item 4", "Item 5" };
public int getSize() { return strings.length; }
public Object getElementAt(int i) { return strings[i]; }
});
jScrollPane3.setViewportView(jList3);

jLabel2.setText("Clients Rank :");

jTextArea1.setColumns(20);
jTextArea1.setEditable(false);
jTextArea1.setLineWrap(true);
jTextArea1.setRows(5);
jTextArea1.setWrapStyleWord(true);
jTextArea1.setOpaque(false);
jScrollPane2.setViewportView(jTextArea1);

jButton3.setText("Exit");
jButton3.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jButton3ActionPerformed(evt);
}
});

jTextArea2.setColumns(20);
jTextArea2.setRows(5);
jScrollPane4.setViewportView(jTextArea2);

jButton2.setText("Process");
jButton2.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jButton2ActionPerformed(evt);
}
});

jTextArea3.setColumns(20);
jTextArea3.setRows(5);
jScrollPane5.setViewportView(jTextArea3);

javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());


getContentPane().setLayout(layout);
layout.setHorizontalGroup(
layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addGroup(layout.createSequentialGroup()
.addContainerGap()
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addGroup(javax.swing.GroupLayout.Alignment.TRAILING, layout.createSequentialGroup()
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addComponent(jLabel1)
.addComponent(jScrollPane1, javax.swing.GroupLayout.DEFAULT_SIZE, 330,
Short.MAX_VALUE))
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addComponent(jLabel2)
.addComponent(jScrollPane3, javax.swing.GroupLayout.PREFERRED_SIZE, 333,
javax.swing.GroupLayout.PREFERRED_SIZE)))
.addGroup(layout.createSequentialGroup()
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING)
.addComponent(jScrollPane5, javax.swing.GroupLayout.PREFERRED_SIZE, 195,
javax.swing.GroupLayout.PREFERRED_SIZE)
.addComponent(jScrollPane4, javax.swing.GroupLayout.PREFERRED_SIZE, 195,
javax.swing.GroupLayout.PREFERRED_SIZE))
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)

.addComponent(jScrollPane2, javax.swing.GroupLayout.DEFAULT_SIZE, 371,


Short.MAX_VALUE)
.addGap(6, 6, 6)
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addComponent(jButton2, javax.swing.GroupLayout.DEFAULT_SIZE, 91,
Short.MAX_VALUE)
.addGroup(layout.createSequentialGroup()
.addGap(10, 10, 10)
.addComponent(jButton3, javax.swing.GroupLayout.PREFERRED_SIZE, 60,
javax.swing.GroupLayout.PREFERRED_SIZE))
.addComponent(jButton1, javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))))
.addContainerGap())
);
layout.setVerticalGroup(
layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addGroup(layout.createSequentialGroup()
.addGap(11, 11, 11)
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING)
.addGroup(layout.createSequentialGroup()
.addComponent(jLabel2)
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
.addComponent(jScrollPane3, javax.swing.GroupLayout.PREFERRED_SIZE, 88,
javax.swing.GroupLayout.PREFERRED_SIZE))
.addGroup(layout.createSequentialGroup()
.addComponent(jLabel1)
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
.addComponent(jScrollPane1, javax.swing.GroupLayout.PREFERRED_SIZE, 88,
javax.swing.GroupLayout.PREFERRED_SIZE)))
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addComponent(jScrollPane2, javax.swing.GroupLayout.DEFAULT_SIZE, 104,
Short.MAX_VALUE)
.addGroup(layout.createSequentialGroup()
.addComponent(jButton1)
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
.addComponent(jButton3)
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
.addComponent(jButton2))
.addGroup(layout.createSequentialGroup()
.addComponent(jScrollPane4, javax.swing.GroupLayout.PREFERRED_SIZE, 49,
javax.swing.GroupLayout.PREFERRED_SIZE)
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
.addComponent(jScrollPane5, javax.swing.GroupLayout.PREFERRED_SIZE, 49,
javax.swing.GroupLayout.PREFERRED_SIZE)))
.addContainerGap())
);

pack();
}// </editor-fold>//GEN-END:initComponents

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton1ActionPerformed
    // Load the client IPs, ranks, jobs (URLs) and keywords from their text files
    _getIPList();
    _getRankList();
    _getJobs();
    _getKeyList();
}//GEN-LAST:event_jButton1ActionPerformed

private void jButton3ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton3ActionPerformed
    // Exit button: close the window and stop the server
    this.dispose();
    System.exit(0);
}//GEN-LAST:event_jButton3ActionPerformed

private void jButton2ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton2ActionPerformed
    // Process button: start the manager thread that connects to the clients
    SM=new ServerManager(ipN,rN,jN,kN,ip,jobs,rank,key,this);
}//GEN-LAST:event_jButton2ActionPerformed

/**
* @param args the command line arguments
*/
public static void main(String args[]) {
java.awt.EventQueue.invokeLater(new Runnable() {
public void run() {
new ServerGUI().setVisible(true);
}
});
}

// Variables declaration - do not modify//GEN-BEGIN:variables


private javax.swing.JButton jButton1;
private javax.swing.JButton jButton2;
private javax.swing.JButton jButton3;
private javax.swing.JLabel jLabel1;
private javax.swing.JLabel jLabel2;
private javax.swing.JList jList1;
private javax.swing.JList jList3;
private javax.swing.JScrollPane jScrollPane1;
private javax.swing.JScrollPane jScrollPane2;
private javax.swing.JScrollPane jScrollPane3;
private javax.swing.JScrollPane jScrollPane4;
private javax.swing.JScrollPane jScrollPane5;
private javax.swing.JTextArea jTextArea1;
private javax.swing.JTextArea jTextArea2;
private javax.swing.JTextArea jTextArea3;
// End of variables declaration//GEN-END:variables

//
public void _getIPList()
{
    this.jList1.removeAll();
    try
    {
        // ip.txt: the first line holds the number of client IPs,
        // followed by one IP address per line
        BufferedReader in=new BufferedReader(new FileReader("ip.txt"));
        ipN=Integer.parseInt(in.readLine());
        ip=new String[ipN];
        for(int i=0;i<ipN;i++)
        {
            ip[i]=in.readLine();
        }
        in.close();
        this.jList1.setListData(ip);
        // The details are loaded only once
        this.jButton1.setEnabled(false);
    }
    catch(Exception e)
    {
        setErr(e.getMessage());
    }
}
//
public void _getRankList()
{
this.jList3.removeAll();
try
{

BufferedReader in=new BufferedReader(new FileReader("rank.txt"));


rN=Integer.parseInt(in.readLine());
rank=new String[rN];
for(int i=0;i<rN;i++)
{
rank[i]=in.readLine();
}
in.close();
this.jList3.setListData(rank);
}
catch(Exception e)
{
setErr(e.getMessage());
}
}
//
public void _getKeyList()
{
this.jTextArea3.setText("");
try
{

BufferedReader in=new BufferedReader(new FileReader("key.txt"));


kN=Integer.parseInt(in.readLine());
key=new String[kN];
for(int i=0;i<kN;i++)
{
key[i]=in.readLine();
this.jTextArea3.setText(this.jTextArea3.getText()+"\n"+key[i]);
}
in.close();
//this.jList3.setListData(rank);
}
catch(Exception e)
{
setErr(e.getMessage());

}
}
//
public void _getJobs()
{
this.jTextArea2.setText("");
try
{
BufferedReader in=new BufferedReader(new FileReader("jobs.txt"));
jT=Integer.parseInt(in.readLine());
this.jTextArea2.setText("Job Type:"+jT);
switch(jT)
{
case 0:
jN=Integer.parseInt(in.readLine());
jobs=new String[jN];
for(int i=0;i<jN;i++)
{
jobs[i]=in.readLine();
this.jTextArea2.setText(this.jTextArea2.getText()+"\n"+jobs[i]);
}
break;
}
in.close();
}
catch(Exception e)
{
setErr(e.getMessage());
}
}
//
public void setErr(String err)
{
this.jTextArea1.setText(this.jTextArea1.getText()+"\n"+err);
System.out.println(err);
}
public void setInfo(String info)
{
setErr(info);
}
}

/**
*
* @author SuperStar
*/
import java.net.*;
import java.io.*;

interface ServerIF
{
final int PORT=5678;
public void dataFC(String data);
}

public class ServerManager extends Thread implements ServerIF


{
String IP[],R[],J[],K[];
int rN,ipN,jN,kN;
Socket[] sock;
ServerWriteThread[] SWT;
ServerReadThread[] SRT;
ServerI SI=null;

public ServerManager(int i,int r,int j,int k,String[] ip1,String[] j1,String[] r1,String[] k1,ServerI si)
{
rN=r;
ipN=i;
jN=j;
kN=k;
IP=ip1;
J=j1;
R=r1;
K=k1;
SI=si;
start();
}

public void run()


{
try
{
sock=new Socket[ipN];
SWT=new ServerWriteThread[ipN];
SRT=new ServerReadThread[ipN];
//SI.setInfo("ipn:"+ipN);
for(int i=0;i<ipN;i++)
{
sock[i]=new Socket(IP[i],PORT); // PORT (5678) is declared in ServerIF
//SI.setInfo("ip:"+IP[i]);
SWT[i]=new ServerWriteThread(sock[i],SI,this);
SRT[i]=new ServerReadThread(sock[i],SI,this);
//SI.setInfo("soc:"+sock[i].toString());
}
SI.setInfo("Sockets created");
_split();
}
catch(Exception e1)
{
SI.setErr("Sock Cre:"+e1.toString());
}
}

public void _split()
{
    //java.util.Arrays.sort(R);
    // Tell every client how many keywords to expect
    for(int i=0;i<ipN;i++)
    {
        SWT[i].send(""+kN);
        //SI.setInfo(""+kN);
    }
    // Send the full keyword list to every client
    for(int i=0;i<ipN;i++)
    {
        for(int j=0;j<kN;j++)
        {
            SWT[i].send(K[j]);
            //SI.setInfo(K[j]);
        }
    }
    SI.setInfo("Keys distributed");
    // Distribute the jobs (URLs) to the clients in round-robin order
    for(int i=0,j=0;i<jN;i++)
    {
        SWT[j].send(J[i]);
        //SI.setInfo(J[i]);
        if(j<ipN-1)
            j++;
        else
            j=0;
    }
}
public void dataFC(String data)
{
SI.setInfo(data);
}
public void _quit()
{
//
}
}
//////////////
class ServerWriteThread
{
Socket S;
ServerI SI=null;
ServerIF SIF;
public ServerWriteThread(Socket s,ServerI si,ServerIF sif)
{
SIF=sif;
SI=si;
S=s;
//SI.setInfo(s.toString());
}
public void send(String msg)
{
try
{
//SI.setInfo(msg);
PrintWriter out=new PrintWriter(new BufferedWriter(new
OutputStreamWriter(S.getOutputStream())),true);
out.println(msg);
}
catch(Exception e3)
{
SI.setErr(e3.getMessage());
}
}

}
//////////////
class ServerReadThread extends Thread
{
Socket S;
ServerI SI=null;
ServerIF SIF;
public ServerReadThread(Socket s,ServerI si,ServerIF sif)
{
S=s;
SIF=sif;
SI=si;
//SI.setInfo(s.toString());
start();
}
public void run()
{
try
{
BufferedReader in=new BufferedReader(new InputStreamReader(S.getInputStream()));
String line;
// readLine() returns null once the client closes the connection.
while((line=in.readLine())!=null)
{
SIF.dataFC(line);
}
}
catch(Exception e2)
{
SI.setErr(e2.getMessage());
}
}
}
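
The _split method above first sends every key to every client and then deals the jobs out one per client in rotation. A minimal sketch of that round-robin assignment, detached from the sockets, is given here. The class name RoundRobinSketch, the method assign, and the sample host names are hypothetical and are not part of the project code.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinSketch {
    // Hypothetical helper: returns, for each client, the list of jobs it would receive.
    static List<List<String>> assign(String[] jobs, int clientCount) {
        if (clientCount <= 0) {
            throw new IllegalArgumentException("clientCount must be positive");
        }
        List<List<String>> perClient = new ArrayList<>();
        for (int i = 0; i < clientCount; i++) {
            perClient.add(new ArrayList<>());
        }
        // Job i goes to client i mod clientCount, as in _split.
        for (int i = 0; i < jobs.length; i++) {
            perClient.get(i % clientCount).add(jobs[i]);
        }
        return perClient;
    }

    public static void main(String[] args) {
        String[] jobs = {"a.example", "b.example", "c.example", "d.example", "e.example"};
        // prints [[a.example, c.example, e.example], [b.example, d.example]]
        System.out.println(assign(jobs, 2));
    }
}
```

With this scheme the number of jobs per client differs by at most one, but it does not account for how long each site takes to fetch.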

/*
* MessageBox.java
*
* Created on November 2, 2008, 9:15 PM
*/
/**
*
* @author SuperStar
*/
public class MessageBox extends javax.swing.JFrame {
String MSG="SuperStar";
/** Creates new form MessageBox */
public MessageBox(String msg) {
MSG=msg;
initComponents();
this.jTextArea1.setText(MSG);
}

/** This method is called from within the constructor to
* initialize the form.
* WARNING: Do NOT modify this code. The content of this method is
* always regenerated by the Form Editor.
*/
// <editor-fold defaultstate="collapsed" desc="Generated Code">//GEN-BEGIN:initComponents
private void initComponents() {

jButton1 = new javax.swing.JButton();
jScrollPane1 = new javax.swing.JScrollPane();
jTextArea1 = new javax.swing.JTextArea();

setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);
setTitle("MessageBox");
setAlwaysOnTop(true);
setBackground(new java.awt.Color(183, 226, 252));
setForeground(new java.awt.Color(0, 0, 0));

jButton1.setText("OK");
jButton1.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jButton1ActionPerformed(evt);
}
});

jTextArea1.setColumns(20);
jTextArea1.setRows(5);
jTextArea1.setOpaque(false);
jScrollPane1.setViewportView(jTextArea1);

javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());
getContentPane().setLayout(layout);
layout.setHorizontalGroup(
layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addGroup(javax.swing.GroupLayout.Alignment.TRAILING, layout.createSequentialGroup()
.addComponent(jScrollPane1, javax.swing.GroupLayout.DEFAULT_SIZE, 315,
Short.MAX_VALUE)
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
.addComponent(jButton1)
.addContainerGap())
);
layout.setVerticalGroup(
layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addComponent(jScrollPane1, javax.swing.GroupLayout.PREFERRED_SIZE, 46,
javax.swing.GroupLayout.PREFERRED_SIZE)
.addGroup(layout.createSequentialGroup()
.addContainerGap()
.addComponent(jButton1))
);

pack();
}// </editor-fold>//GEN-END:initComponents

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton1ActionPerformed
// Close the message box when OK is pressed.
this.dispose();
}//GEN-LAST:event_jButton1ActionPerformed

// Variables declaration - do not modify//GEN-BEGIN:variables
private javax.swing.JButton jButton1;
private javax.swing.JScrollPane jScrollPane1;
private javax.swing.JTextArea jTextArea1;
// End of variables declaration//GEN-END:variables
}

CLIENT

/**
*
* @author SuperStar
*/
import java.io.*;
import java.net.*;
import java.util.*;

public class ClientGUI {
public static void main(String[] s) throws Exception
{
// Wait for the server to connect, then hand the socket to a ClientManager.
ServerSocket SS=new ServerSocket(5678);
new ClientManager(SS.accept());
}
}

/////////
interface ClientIF
{
final int PORT=5678;
public void dataFS(String s);
public void setErr(String err);
public void setInfo(String info);
public void setKLen(int kn);
public void setKeys(String[] k);
}
////////
class ClientManager implements ClientIF
{
Socket S;
ClientWriteThread CWT;
ClientReadThread CRT;
int kN=0;
String[] key;
String URL;

public ClientManager(Socket s)
{
S=s;
//setInfo(s.toString());
CWT=new ClientWriteThread(S,this);
CRT=new ClientReadThread(S,this);
}

//
// Counts the occurrences of key in src and reports the result to the server.
public void _search(String src,String key)
{
int c=0,j=-1;
// An empty key would match at every index, so it is treated as not found.
if(key.length()>0)
{
while((j=src.indexOf(key,j+1))!=-1)
++c;
}
CWT.send("Is Found:"+(c>0));
CWT.send(S.toString()+"\n:"+URL+"---"+key+"---"+c);
//setInfo(URL+"---"+key+"---"+c);
}
//
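// Fetches http://u/ and returns the page body, or "" if the fetch fails.
public String _parseURL(String u)
{
StringBuilder r=new StringBuilder();
try
{
URL url=new URL("http",u,"/");
URLConnection con=url.openConnection();
con.connect();
InputStream in=con.getInputStream();
int ch=-1;
// Each byte is widened to a char, so only single-byte encodings decode correctly.
while((ch=in.read())!=-1)
{
r.append((char)ch);
}
in.close();
System.out.println(r);
}
catch(Exception e1)
{
setErr("URL Err:"+e1.toString());
}
//setInfo(r);
return r.toString();
}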
//
public void dataFS(String s)
{
URL=s;
//setInfo(s);
// Fetch the page once and search it for every key.
String page=_parseURL(s);
for(int i=0;i<kN;i++)
_search(page,key[i]);
}

public void setErr(String err)
{
System.out.println(err);
}
public void setInfo(String info)
{
setErr(info);
}
public void setKLen(int kn)
{
kN=kn;
}
public void setKeys(String[] k)
{
key=k;
}
}
///////////
class ClientWriteThread
{
Socket S;
ClientIF CIF;
public ClientWriteThread(Socket s,ClientIF cif)
{
CIF=cif;
S=s;
//CIF.setInfo(S.toString());
}
public void send(String msg)
{
try
{
//CIF.setInfo(msg);
PrintWriter out=new PrintWriter(new BufferedWriter(new
OutputStreamWriter(S.getOutputStream())),true);
out.println(msg);
}
catch(Exception e3)
{
CIF.setErr("Send:"+e3.getMessage());
}
}
}
//////////////
class ClientReadThread extends Thread
{
Socket S;
//ServerI SI=null;
ClientIF CIF;
int kN=0;
String[] key;
public ClientReadThread(Socket s,ClientIF cif)
{
S=s;
CIF=cif;
//SI=si;
//CIF.setInfo(s.toString());

//CIF.setInfo(""+kN);
start();
}
public void run()
{
try
{
BufferedReader in=new BufferedReader(new InputStreamReader(S.getInputStream()));
kN=Integer.parseInt(in.readLine());
key=new String[kN];
//CIF.setInfo(""+kN);
for(int i=0;i<kN;i++)
{
key[i]=in.readLine();
//CIF.setInfo(key[i]);
}
CIF.setKLen(kN);
//CIF.setInfo(""+kN);
CIF.setKeys(key);
//CIF.setInfo(key.toString());
String line;
// readLine() returns null once the server closes the connection.
while((line=in.readLine())!=null)
{
CIF.dataFS(line);
}
}
catch(Exception e2)
{
CIF.setErr("Read:"+e2.getMessage());
}
}
}
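
The _search method in ClientManager counts how often a key occurs in the fetched page by calling indexOf repeatedly, each time starting one character past the previous match. A minimal standalone sketch of that counting step follows, assuming overlapping matches should be counted as the listing does. The class name KeywordCountSketch and the method count are hypothetical and not part of the project code.

```java
public class KeywordCountSketch {
    // Hypothetical helper: number of (possibly overlapping) occurrences of key in text.
    static int count(String text, String key) {
        if (key.isEmpty()) {
            return 0; // an empty key would otherwise match at every index
        }
        int c = 0;
        int j = -1;
        // Restart the search one character after the previous match.
        while ((j = text.indexOf(key, j + 1)) != -1) {
            c++;
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(count("abcabcab", "abc")); // prints 2
        System.out.println(count("aaa", "aa"));       // prints 2 (overlapping matches)
    }
}
```

If overlapping matches should not be counted, the search can instead resume at j + key.length() after each hit.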
