ACKNOWLEDGEMENT

Performing this project has been an invigorating and tremendous experience for me. One
of the great joys of working on the project has been the very tangible support. It was a
mammoth task for me, and I was indeed lucky enough to find great inspiration, support
and help from many.

To start with, I would like to thank God Almighty for showering his blessings on
me from the very beginning till the successful completion of my project. I would like
to take this chance to thank the chairman of FISAT, Mr. P.V Mathew, and the
principal, Mr. K V Sundaresan, for providing the infrastructure and environment to
nurture students' ideas and creativity. I would also like to thank from the bottom of
my heart the Head of the Department of Computer Science, Mr. J.C.Prasad, for
his encouragement and for being a source of inspiration to me.

I would like to thank with deep gratitude my project guide, Mr. Mahesh
C, who was always there to rely upon and who guided me through the proper channel.
I am extremely thankful to the project in charge, Ms. Jyotish K John and Ms. Anjali R
Nair, for their valuable help and support during the course of the project. I would also
like to thank the faculty members of the Computer Science department and the institution,
without which this project would have been a distant reality.

Last but not least, I wish to avail myself of this opportunity to express a sense of gratitude
and love to my friends and my beloved parents for their support, strength and help.

CONTENTS

1.0 PREAMBLE
1.1 Introduction to the project
1.2 Objectives of the project
1.3 Scope of the project

2.0 SYSTEM STUDY
2.1 Proof of concept
2.1.1 Need of proof of concept
2.1.2 Proof of concept design
2.1.3 Proof of concept results
2.2 System specification
2.2.1 Software specifications for development, implementation
2.2.2 Hardware specification for development, implementation

3.0 SYSTEM DESIGN & MODELLING
3.1 Design Methodologies
3.2 System Architecture
3.3 Control Flow Diagram
3.4 Data Flow Diagram
3.5 Application Explanation Diagram
3.6 Flow Chart
3.7 Database Design
3.8 Technology Explanation

4.0 TESTING
4.1 Introduction
4.2 Test Cases

5.0 IMPLEMENTATION
5.1 Introduction
5.2 Installation Procedure
5.3 Implementation plan

6.0 CONCLUSION
6.1 Advantages & Disadvantages of the system
6.2 Future Scope

7.0 APPENDIX
7.1 Sample code
7.2 Technology explanation
7.3 Glossary

REFERENCES

1.0 PREAMBLE

1.1 Introduction to the project


How to make life simple? Man has been asking himself this question for centuries. In
fact, this would have been the first question that he tried to answer after satisfying
his basic needs of food, shelter and clothing. This quest led him to many inventions
and discoveries. All that we see today is the result of man’s strong desire to make his
life simple and comfortable: luxury cars evolved from the early bullock carts,
computers from the abacus, and so on.

In this era of collaborative learning, digital environments and an increasing need for
going green, an online document sharing system has been developed that can be
implemented as an assignment management environment. It
enables students to submit their assignments and track their status. Teachers
can view and review the assignments with any device that can connect to the internet.

The system essentially converts PDF documents submitted by students into HTML
files without losing their format or layout and presents them to the teacher. This enables
teachers to view assignments without any hassle, and they can use inbuilt browser tools
along with the tools provided for a better experience. I hope this project will be counted as
a small step in standardizing the way we use documents.

I have used PHP and pdftohtml, a C++ tool, as the back end, and JavaScript, Ajax and
HTML as the front end.

1.2 Objectives of the project

The basic objective of the project is to design and implement an assignment management
system that will standardize the way documents are used across a campus. The project
aims to improve the quality of submitted assignments substantially by
incorporating various review tools. The plagiarism checker helps identify
those portions of work where students have performed a mere copy-paste from the
internet. As the project is to be implemented on the internet, students and
teachers can perform their actions at their own time and comfort. The project also aims to
eliminate the time overhead of the conventional procedure for submitting
assignments and of synchronizing the schedules of both parties for the review. This is
expected to improve the productivity of teachers.

From the technical viewpoint, it aims to implement the daunting task of
converting 4-layered PDF files to HTML files, thereby making them accessible to any
device that can connect to the internet. The user interface is intended to be as simple as
possible.

The project intends to contribute to the online world of
knowledge by converting PDF files to HTML and thereby making them search engine
friendly. This can in turn be useful to the many people who may accidentally
stumble upon the pages via search engines and put them to use for society.

1.3 Scope of the project

The project currently runs on a PHP-enabled server and requires only a
JavaScript-capable web browser on the client side; hence it can be
extended to campuses of various educational institutions without much
overhead. By extending the system and centralizing the database, the information gathered
from assignments can be of great value and will lay more emphasis on collaborative
learning across campuses. Moreover, standardizing the document format and
contributing the converted documents to the global internet community is a worthwhile task in itself.
HTML makes the contents search engine friendly and hence of great help to
learners.

While the usability of the project depends on the simplicity of the user interface, the
quality of conversion defines its functionality. The project can be a great help to
people who are left out due to format wars and the support issues of various devices
and service providers.

The project can in turn substitute, and eventually eliminate, the use of
swf files as the converted format by showing documents directly in the web browser. This
also helps in proper semantic indexing of contents and lets clients use inbuilt
browser tools. It will be extremely useful to companies as well: once the
contents are in HTML, more meaningful and relevant ads can be placed on the site, in the way Google
AdSense works. This enhances the user experience, which makes it an industrially
viable and needed project.

2.0 SYSTEM STUDY

2.1 Proof Of Concept

In order to start a project, the concept has to be clear from the beginning. This requires a
proof of concept, which serves as a basic first step in the project. A clear concept requires a clear
tool.

2.1.1 Need of proof of concept

Initially, the first spark towards the project was provided by the idea of how to make the
information stored in various documents available to the world of the internet. PDF is a
highly complex document format, and hence converting almost anything to PDF is easy, while
converting PDF to HTML is a daunting task. However, the release of the latest version of HTML,
HTML5 by the WHATWG, provided enough capability for the task.

2.1.2 Proof of concept design

In order to become versed in HTML5, a website compliant with HTML5 and CSS3 was
built. The website covered every major revision in HTML5.

New additions such as video and audio capability without any third-party
plugins, canvas and drawing capabilities with JavaScript, form enhancements, semantic
tags and local storage were used for designing the site. The site effectively
used every major aspect of HTML5.

CSS3 was also used for the design aspect of the site. New features such as shadows and
rounded corners were used.

2.1.3 Proof of concept result

A completely functional site in HTML5 and CSS3 was designed. Much stress was
given to canvas, as it can be very useful for rendering the vectors in PDF. Local storage
was another HTML5 feature given priority while designing the site, as
these two features would be of great use in the project.

2.2 System Specification

The project uses several server-side software components and requires only a minimal
hardware configuration.

2.2.1 Software specification

2.2.1.1 GNU/Linux

The operating system used is GNU/Linux. It is open source software which
provides a strong platform for grooming programmers. The Linux kernel is released under the GNU
General Public License and is written in the version of the C programming
language supported by GCC (the GNU Compiler Collection).

2.2.1.2 PHP

The project is done mainly in PHP. PHP stands for "PHP: Hypertext Preprocessor". PHP
is a general-purpose scripting language originally designed for web development to
produce dynamic web pages. For this purpose, PHP code is embedded into the HTML
source document and interpreted by a web server with a PHP processor module, which
generates the web page document. PHP version 5.3 was used in the project.

2.2.1.3 Javascript

JavaScript is a client-side scripting language. It is standardized as ECMAScript, and is a
prototype-based, object-oriented scripting language that is dynamic, weakly typed and
has first-class functions. It has been extensively used in the project in designing the
review tools and enhancing the user experience.

2.2.1.4 MySQL

The database for storing the details of student and teacher login entries, uploaded and
converted documents etc. is created and manipulated using MySQL server. MySQL is an open
source relational database management system that uses Structured Query Language (SQL).
A database is created for
maintaining information about teachers and students and their assignment submission
and tracking.

2.2.1.5 HTML5

HTML5, the latest revision of HTML, the standard used for rendering web pages, has
been used for the client-side rendering of the project. Its core aims have been to
improve the language with support for the latest multimedia while keeping
it easily readable by humans and consistently understood by computers and
devices. It has been developed by the WHATWG.

2.2.1.6 CSS

Cascading Style Sheets (CSS) is a style sheet language used to describe the presentation
semantics (the look and formatting) of a document written in a markup language. Its
most common application is to style web pages written in HTML and XHTML, but the
language can also be applied to any kind of XML document, including plain XML,
SVG and XUL. CSS is designed primarily to enable the separation of document content
(written in HTML or a similar markup language) from document presentation,
including elements such as the layout, colors, and fonts.

2.2.1.7 AJAX

Ajax (Asynchronous JavaScript and XML) is a group of interrelated web development methods
used on the client side to create interactive web applications. With Ajax, web
applications can retrieve data from the server asynchronously in the background without
interfering with the display and behavior of the existing page. Data is usually retrieved
using the XMLHttpRequest object. Despite the name, the use of XML is not needed,
and the requests need not be asynchronous.

2.2.1.8 pdftohtml

pdftohtml is a program that converts PDF documents into HTML. It generates its output in
the current working directory. It is by default installed in /usr/bin. The tool only
provides a command-line interface and works on UNIX and Linux based systems.
It is invoked from PHP with the ‘system’ call or the backtick operator ‘`’.

2.2.2 Hardware Specification

2.2.2.1 Server Machine

A minimal machine with the following specifications is sufficient for the project.

• Intel Core 2 Duo processor,

• 256 MB RAM,

• 40 GB storage

The storage capacity can vary with the usage and volume of documents handled in
actual deployment. It is up to the policy of the webmaster or system
administrator to decide what is to be stored and what is not.

2.2.2.2 Client Machine

Any device with internet connectivity and a graphical web browser can access the
services of the project.

For effective usage of the review tools and other Ajax-related services, the device must be
equipped with a JavaScript-enabled browser. A screen resolution of
at least 1024x768 pixels is required for an optimal experience. As stated
above, any device with internet connectivity and a graphical browser can still use the
services without any compromise in the functionality of document viewing.

3.0 SYSTEM DESIGN AND MODELLING

Figure 3.0.1: System overview. Server machine: Apache server, PHP + MySQL database,
pdftohtml. Client device: operating system, web browser, HTML and JavaScript.

The Apache server equipped with PHP and MySQL is preinstalled with pdftohtml, a PDF
conversion tool. When a student logs in from a client machine, he/she is presented with a
page for submitting assignments. The PDF file submitted is passed to the pdftohtml tool
for conversion. The student's name, subject and date of submission are entered into the database.

When teachers log in, they are presented with the submission list from the database and can view
each assignment. Teachers are provided with review tools powered by Ajax. A student
can view the reviews on his/her submission once the teacher finalizes the assignment
review.

Every page is generated dynamically using PHP after a lookup in the database for the entries.

3.1 Design Methodologies

Out of the different design methodologies listed in software engineering, such as the
Waterfall model, Spiral model and Prototyping model, the apt one for the project design
was the Waterfall model. So, the Waterfall model is the design methodology that I have
opted for.

3.1.0 Reasons behind the option

• Cost-effective

• Simple to implement for a small project


• Documentation after each level helps in identifying errors

3.1.1 Different phases

Basically, there are 5 different phases for this model. They are listed as:

3.1.1.0 Requirement analysis

The first step in doing our project according to the Waterfall model is to identify the
requirements and specify them. It also includes risk analysis and the documentation
known as the Software Requirements Specification (SRS).

This phase is also referred to as ‘analysis and planning’. Planning is a critical activity in
software development. A good plan is based on the requirements of the system and
should be done before all other phases begin. However, detailed requirements are not
necessary for planning. Planning usually overlaps with requirement analysis.

3.1.1.0.0 Software requirements specification

• The system must be able to provide logins for teachers and students.

• Students must be able to submit, view and track assignments for the corresponding
subjects.

• Teachers must be able to view all submissions and review them.

• Students should be able to view the review once a teacher finishes his/her
review.

• Teachers must be able to check for plagiarism in assignment contents.

• Teachers should be able to assign marks for each submission.

3.1.1.1 System design

This phase of the Waterfall model explains the division of the entire project into various
modules for the ease of doing the project.

The purpose of the design phase is to plan a solution to the problem specified by the SRS.
This phase is the first step in moving from the problem domain to the solution domain. It takes
us toward how to satisfy the needs. It is the most critical factor affecting the quality of
the software.

Our project is divided into four different modules. After each module, reviews were
held.

The four different modules are:

• Conversion of PDF documents to HTML – pdftohtml, a C++ program, was used. It
was necessary to modify the source code and change various parameters used in
the tool to match the requirements of the project.

• Database Design – The database design was another task in the project. Proper
file management of uploads and converted documents is important. The
student-teacher-subject relationships also need to be maintained.

• User Interface – User interface pages are generated dynamically
as per student and teacher login. The interface was made with HTML5 and
CSS3 and form styling elements from jQuery.

• Review tools – Teachers are equipped with review tools for quality evaluation of
assignments. Provision for annotations, a plagiarism checker and assigning marks
were implemented. AJAX and PHP were used in making the review tools.

3.1.1.2 Detailed design

This phase explains the details of various modules of the project which got divided in
the system design phase.

A high level design identifies the modules that should be built for developing the
systems and the specifications of these modules. It includes major data structures, file
formats, output formats, etc. In this phase, the internal logic of each of the modules is
specified. It deals with the issue of how the modules can be implemented in the
software. This phase is a systematic approach to create a design by application of a set
of techniques and guidelines.

3.1.1.2.1 First module

This module had the major task of converting PDF documents to HTML format, which forms
the heart of the project.

pdftohtml, a program written in C++, was used for the purpose. Modifications and tuning of the
parameters passed were required for the proper working of the tool. PDF documents
contain 4 layers to represent the pictures, text, body and vectors. This 4-layer structure
was reduced to a 2-layer structure while converting to HTML documents. All the
illustrations, vectors and pictures were converted to PNG image files. With OCR,
all the text contents were extracted with their styling preserved, though only for
web-standard fonts. Other fonts are substituted with “Times New Roman”. The project
was originally intended to convert only A4 portrait sheets but now successfully converts
any paper format and orientation.

3.1.1.2.2 Second module

MySQL was used as the database for the system. Separate tables are maintained for
teachers, students and documents. Tables are also maintained for the relations between
teachers and subjects, students, classes, subjects and reviews. Primary key – foreign key
relations are used between tables and referential integrity has been maintained.

Every teacher, student and document is identified by a unique id assigned to it.
These ids form the primary keys of the corresponding tables and
relations. An entity relationship model of the database is used throughout. Features of
MySQLi (MySQL Improved) have been exploited during database design.
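
The primary key and foreign key checks described above can be illustrated with a small in-memory sketch. It is written in JavaScript purely for illustration; in the project these constraints belong to the MySQL tables, and the names used here (students, documents, addStudent, addDocument and the field names) are hypothetical.

```javascript
// In-memory illustration of primary-key and foreign-key checks (sketch only).
// Table and field names are hypothetical; the real system relies on MySQL for this.

const students = new Map();   // primary key: student id
const documents = new Map();  // primary key: document id

// Reject a second row with the same primary key.
function addStudent(id, name) {
  if (students.has(id)) throw new Error(`duplicate student id ${id}`);
  students.set(id, { name });
}

// A document row references its owner through studentId (the foreign key).
function addDocument(id, studentId, subject) {
  if (documents.has(id)) throw new Error(`duplicate document id ${id}`);
  if (!students.has(studentId)) throw new Error(`unknown student id ${studentId}`);
  documents.set(id, { studentId, subject });
}

addStudent(1, "Student A");
addDocument(100, 1, "Networks");

let rejected = false;
try {
  addDocument(101, 42, "Networks"); // no student 42, so the insert is refused
} catch (err) {
  rejected = true;
}
console.log(documents.size, rejected); // 1 true
```

The point of the sketch is only that a document can never refer to a student who does not exist, which is what referential integrity guarantees.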

3.1.1.2.3 Third module


The user interface has been designed for minimum delay due to graphics overhead,
without any compromise in the aesthetics of look and feel. Special care has been taken to
exploit the inbuilt capabilities of CSS3 and jQuery form styling. PHP was used to generate
dynamic HTML for every login. Session and cookie variables were used to enhance
the user experience. Every user is shown only the details
applicable to the corresponding login; other unwanted or non-applicable features
do not show up. Also, pdftohtml outputs every page as a separate HTML file; these are
combined into a single file before being presented to the user. Ajax has been
extensively used to increase user interactivity.
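
The step of combining the per-page HTML files into a single file can be sketched as follows. This is a minimal illustration in JavaScript, whereas the project performs the merge in PHP on the server; the helper names extractBody and mergePages, the "page" class and the page ids are assumptions made for the example.

```javascript
// Merge per-page HTML documents into one document (illustrative sketch only).
// extractBody and mergePages are hypothetical helpers, not the project's code.

// Return the markup between <body ...> and </body>, or the whole input if no body tag is found.
function extractBody(html) {
  const match = html.match(/<body[^>]*>([\s\S]*?)<\/body>/i);
  return match ? match[1].trim() : html.trim();
}

// Wrap each page body in its own container and emit a single HTML document.
// The title is assumed to be trusted text; it is not escaped here.
function mergePages(pages, title) {
  const sections = pages.map(
    (html, i) => `<div class="page" id="page-${i + 1}">${extractBody(html)}</div>`
  );
  return [
    "<!DOCTYPE html>",
    `<html><head><meta charset="utf-8"><title>${title}</title></head>`,
    "<body>",
    ...sections,
    "</body></html>",
  ].join("\n");
}

const merged = mergePages(
  [
    "<html><body><p>Page one</p></body></html>",
    "<html><body><p>Page two</p></body></html>",
  ],
  "Assignment"
);
console.log(merged);
```

A real merge would also have to carry over the per-page styles and image paths, which this sketch ignores.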

3.1.1.2.4 Fourth module

Review tools are provided only to teacher logins. The feature is not available in
student logins. AJAX and PHP were used for building the review tools. Provision for
annotations empowers teachers to comment on any particular sentence or
paragraph for students. To improve the quality of assignment work done by
students, a plagiarism checker is also implemented. The plagiarism checker splits the
selected paragraph into independent sentences. Each sentence is then enclosed
within double quotes and
concatenated with “OR” in between, so that each sentence acts as a single unit while
performing a Google search for plagiarism checking. JSON has been used for retrieving
information from Google. The links to sites which have the same contents as the assignment are
shown to teachers.
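
The query construction used by the plagiarism checker can be sketched as below. This is a minimal illustration only: the function names splitSentences and buildPlagiarismQuery are hypothetical, the sentence splitting rule is a simplifying assumption, and the call to the search service itself is left out.

```javascript
// Build a search query from a selected paragraph (illustrative sketch only).
// splitSentences and buildPlagiarismQuery are hypothetical names, not the project's code.

// Split a paragraph into sentences on '.', '!' or '?' followed by whitespace.
function splitSentences(paragraph) {
  return paragraph
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Quote each sentence so it is searched as an exact phrase, then join with OR.
// join() places OR only between phrases, never before the first one.
function buildPlagiarismQuery(paragraph) {
  return splitSentences(paragraph)
    .map((s) => '"' + s.replace(/"/g, "") + '"')
    .join(" OR ");
}

const query = buildPlagiarismQuery("PDF has layers. HTML is simple.");
console.log(query); // "PDF has layers." OR "HTML is simple."
```

Joining the quoted sentences, rather than prefixing each one with the operator, keeps the query free of a leading OR.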

Teachers can also award marks for each submission. Once the teacher finalizes the
review of a submission, the student can view the reviews and make necessary
corrections.

3.1.1.3 Coding

This is the third phase in the design methodology. The goal of coding phase is to
translate the design of the system into code in a given programming language. This
phase includes implementation of the plans of the design phase. Coding phase affects
both testing and maintenance profoundly. Well written code can reduce the testing and
maintenance effort. During coding, the focus should be on developing programs that are
easy to read and understand. Simplicity and clarity should be also maintained.

3.1.1.3.0 PHP Server Code

The program resides on the server, and the pages are dynamically
created as per the client request. At first the user needs to log in to use the services. A database lookup is done
and the corresponding pages for students and teachers are loaded. The server code makes
a system call to invoke pdftohtml whenever a student submits a PDF. The tool converts the
PDF to HTML files and stores them on the server. The individual pages are then combined together.
The reviews are shown in tabular form.

3.1.1.3.1 Client side Javascript code

The JavaScript code provides interactivity for the review tools and enhances the look and
feel of the system. jQuery libraries are used for form styling. AJAX has also been used
to communicate with the server asynchronously. The DOM is used to show the popup
boxes for the review tools – annotations and plagiarism checker.

3.1.1.3.2 Cascading style sheets

CSS is used for enhancing the look and feel of the site. The use of CSS has greatly
helped in separating the content from the look and feel part of the code. Applying the
same design, and thereby preserving the look of the system for every dynamically
generated page, was also achieved through CSS.

3.1.1.4 Testing

The basic function of testing is to detect the defects in the software. The goal of testing
is to uncover requirement, design and coding errors in the program.

The first module was tested with various PDF documents passed for conversion. The
initial version of pdftohtml gave unsatisfactory results when it came to documents with
diagrams and pictures. Later a revised version was obtained.

The second module was tested with various table views. The aim was to minimize data
redundancy through normalization. The third module was tested for various error and
exception handling procedures in the user interface. The fourth module was tested by
using various converted documents for the plagiarism checker.

3.1.1.5 Implementation

It is designed and coded to be implemented in every campus that can provide
the minimal server requirements specified in the hardware requirements, along with internet
connectivity.

3.2 System Architecture

The whole project is considered to be the system here. It needs to take care of both the
client and server side. At the server side it needs to take care of the database and PDF
conversion, while at the client side the data and interface rendering are to be looked after.

3.2.1 Software architecture

Figure 3.2.1: Software architecture. Components: client application, PHP, MySQL
database, Apache server and pdftohtml. Labelled flows: query / pdf, processed query,
storage path, login check, submission lookup, teacher login, converted folder path,
all converted HTML pages, single HTML page.

3.3 Control Flow Diagram


Figure 3.3: Control flow. Recoverable labels: main index page, file retrieval from
server, log out of system.

3.4 Data Flow Diagram

Figure 3.4: Data flow. Recoverable labels: login details, get SQL query, database
lookup, DB entry, get response, display; submitted file, assignment file, server
storage, PDF to HTML conversion, converted file, update database, display HTML data.

3.5 Application Explanation Diagram

Figure 3.5: Application explanation. The main index page leads to login. Student
profile page: assignment submission (database entry and document conversion), track
submissions, view reviews. Teacher profile page: view submission list and documents,
review tools, finalize.

3.6 Flow Chart

Figure 3.6.1: Login flow. Start, then login; if the login is not correct, stop;
otherwise display the profile page. Further decision nodes: login is student, login
is teacher, clicked add, clicked view submission, clicked logout, clicked subject.
Outcomes: connector A, display submission list, logout of system.

Figure 3.6.2: Document submission flow. Get document details; if the document size is
not greater than 0, display an error and stop; otherwise call pdftohtml, save the
files in a folder named after the docid, get the individual converted files and merge
them into a single file for as long as more html files exist in the folder; then
display success.

3.7 Database Design

Figure 3.7

3.8 Technology Explanation

For the successful completion of the project, various technologies were used, such as
PHP, AJAX and pdftohtml.

3.8.0 PHP

PHP is a general-purpose scripting language originally designed for web development
to produce dynamic web pages. For this purpose, PHP code is embedded into the
HTML source document and interpreted by a web server with a PHP processor module,
which generates the web page document. It also has evolved to include a command-line
interface capability and can be used in standalone graphical applications. PHP can be
deployed on most web servers and as a standalone interpreter, on almost every
operating system and platform, free of charge.

3.8.1 AJAX

Ajax, shorthand for Asynchronous JavaScript and XML, is a group of interrelated web
development methods used on the client side to create interactive web applications.
With Ajax, web applications can retrieve data from the server asynchronously in the
background without interfering with the display and behavior of the existing page.

3.8.2 MySQL

The database for storing the details of student and teacher login entries, uploaded and
converted documents etc. is created and manipulated using MySQL server. MySQL is an open
source relational database management system that uses Structured Query Language (SQL).
A database is created for
maintaining information about teachers and students and their assignment submission
and tracking.

3.8.3 HTML5

HTML5, the latest revision of HTML, the standard used for rendering web pages, has
been used for the client-side rendering of the project. Its core aims have been to improve the
language with support for the latest multimedia while keeping it easily readable by
humans and consistently understood by computers and devices. It has been developed by
the WHATWG.

3.8.4 CSS

Cascading Style Sheets (CSS) is a style sheet language used to describe the presentation
semantics (the look and formatting) of a document written in a markup language. Its
most common application is to style web pages written in HTML and XHTML, but the
language can also be applied to any kind of XML document, including plain XML,
SVG and XUL. CSS is designed primarily to enable the separation of document content
(written in HTML or a similar markup language) from document presentation,
including elements such as the layout, colors, and fonts.

3.8.5 Javascript

JavaScript is a client-side scripting language. It is standardized as ECMAScript, and is a
prototype-based, object-oriented scripting language that is dynamic, weakly typed and
has first-class functions. It has been extensively used in the project in designing the
review tools and enhancing the user experience.

The jQuery library for JavaScript was used throughout the system. Both jQuery and
jQuery UI are JavaScript libraries.

3.8.6 pdftohtml

pdftohtml is a program that converts PDF documents into HTML. It generates its output in
the current working directory. It is by default installed in /usr/bin. The tool only
provides a command-line interface and works on UNIX and Linux based systems.
It is invoked from PHP with the ‘system’ call or the backtick operator ‘`’. pdftohtml can
be passed various parameters. Some of the options available are listed below.

Options
A summary of options is included below.
-f <int> : first page to print

-l <int> : last page to print

-q : don't print any messages or errors

-v : print copyright and version info

-p : exchange .pdf links with .html

-c : generate complex output

-i : ignore images

-noframes : generate no frames. Not supported in complex output mode.

-stdout : use standard output

-zoom <fp> : zoom the pdf document (default 1.5)

-opw <string> : owner password (for encrypted files)

-upw <string> : user password (for encrypted files)


Of these options, -c is used for converting documents with graphics content.
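
How the options are combined into one invocation can be sketched as follows. This is a minimal illustration in JavaScript, not the project's PHP code: the function buildPdfToHtmlArgs, its option names and the file name assignment.pdf are hypothetical, and only the flags themselves are taken from the list above.

```javascript
// Assemble a pdftohtml argument list (illustrative sketch only).
// buildPdfToHtmlArgs and its option names are hypothetical; only the flags come from the list above.

function buildPdfToHtmlArgs(inputPdf, options = {}) {
  const args = [];
  if (options.complex) args.push("-c"); // complex output, used for graphics content
  if (options.quiet) args.push("-q");   // suppress messages and errors
  if (Number.isInteger(options.firstPage)) args.push("-f", String(options.firstPage));
  if (Number.isInteger(options.lastPage)) args.push("-l", String(options.lastPage));
  if (typeof options.zoom === "number") args.push("-zoom", String(options.zoom));
  args.push(inputPdf); // the input file follows the options
  return args;
}

const args = buildPdfToHtmlArgs("assignment.pdf", { complex: true, firstPage: 1, lastPage: 3 });
console.log(["pdftohtml", ...args].join(" "));
// pdftohtml -c -f 1 -l 3 assignment.pdf
```

Keeping the arguments as a list avoids pasting an uploaded file name straight into a shell command. When the command is built as a string in PHP for ‘system’ or the backtick operator, passing the file name through escapeshellarg serves the same purpose.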

4.0 TESTING

4.1 Introduction

In a software development process, errors can be introduced at any stage during
development. Though errors are detected after each phase by techniques like
inspections, some errors remain undetected. Ultimately these errors will be reflected in
the code. Testing is the activity where the remaining errors from all phases (requirement,
design and coding) are detected. Testing is done to ensure quality.

Different terms associated with testing are:

Error

It refers to the discrepancy between a computed, observed or measured value
and the true, specified or theoretically correct value.

Fault

It is a condition that causes a system to fail in performing its required functions.

Failure

It is the inability of a system or a component to perform a required function
according to its specifications.

Testing a large system is a complex activity and, like any complex activity, it has to
be broken into smaller activities. For this reason, incremental testing is
generally performed for a project, in which components and subsystems of the system are tested
separately before integrating them to form the system for system testing. This form of
testing, though necessary to ensure quality for a large system, introduces new issues of
how to select components for testing and how to combine them to form subsystems and
systems. In other words, integration of the various components of the system is an
important issue that the testing phase has to deal with. For this reason this phase is also
called “Integration and testing”.

The project Document Sharing System has also undergone a series of tests during
each phase as well as after the coding. Different tests are applied to the server side and
client side. The tests applied can be categorized into four. They are:

• Unit testing
• Integration Testing
• System Testing
• Implementation

4.2 Test Cases

Having test cases that are good at revealing the presence of faults is central to
successful testing. The reason for this is that if there is a fault in a program, the program
can still provide expected behavior for many inputs. Only for the set of inputs that
exercise the fault in a program will the output of the program deviate from expected
behavior. Hence, it is fair to say that testing is as good as its test cases.

Ideally, I would like to determine a set of test cases such that successful execution
of all of them implies that there are no errors in the program. This ideal goal cannot
usually be attained due to practical and theoretical constraints. Two fundamental goals
of a practical testing activity are:

• Maximize the number of errors detected

• Minimize the number of test cases

There are two aspects of test case selection:

• Specifying a criterion for evaluating a set of test cases

• Generating a set of test cases that satisfy a given criterion.

There are two fundamental properties for a testing criterion:

• Reliability
• Validity

A criterion is reliable if all the sets of test cases that satisfy the criterion
detect the same errors. That is, it is insignificant which of the sets satisfying the
criterion is chosen; every set will detect exactly the same errors. A criterion is valid if
for any error in the program there is some set satisfying the criterion that will reveal the error. A
fundamental theorem of testing is that if a testing criterion is valid and reliable, and a
set satisfying the criterion succeeds, then the program
contains no errors. The test cases applied for the project are:

4.2.0 Unit testing

Each module was tested using test cases suited to it. The four modules of this project
were tested as follows.

4.2.0.0 First Module

The first module was tested with various PDF documents passed for conversion. The
initial version of pdftohtml gave unsatisfactory results for documents with diagrams
and pictures. A revised version was obtained later, which initially output pictures
rotated by 90 degrees. This was rectified by changing the source.

4.2.0.1 Second Module

The second module was tested with various table views. The aim was to minimize data
redundancy through normalization. At first a single large table was the only table
intended for the database. As testing progressed this approach became a bottleneck,
so several tables were created later with appropriate relations between them.

4.2.0.2 Third Module

For the user interface, the primary aim was to minimize overhead due to styling. An
initial elaborate interface was dropped because of overheads and delays in loading
pages. It was replaced with a minimal design that uses built-in features of CSS. Form
styling was added later. Exception handling for user interaction, such as wrong user
type selection and non-selection of mandatory options, was also tested, and various
exception handling routines were implemented after testing.
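
The kind of user-interaction check described above can be sketched as follows. This is only an illustrative sketch: the function validateForm, the field names userType, username and subject, and the error messages are assumptions made for the example and are not taken from the project code.

```javascript
// Hypothetical validation routine; field names and messages are assumed.
// Returns a list of error messages; an empty list means the form is valid.
function validateForm(fields) {
    var errors = [];
    var allowedTypes = ["student", "teacher"];

    // Wrong user type selection
    if (allowedTypes.indexOf(fields.userType) === -1) {
        errors.push("Select a valid user type");
    }

    // Non-selection of mandatory options
    if (!fields.username || fields.username.trim() === "") {
        errors.push("User name is mandatory");
    }
    if (!fields.subject) {
        errors.push("Select a subject");
    }

    return errors;
}

var validResult = validateForm({
    userType: "student",
    username: "student01",
    subject: "Compiler Design"
});

var invalidResult = validateForm({
    userType: "admin",
    username: "   ",
    subject: ""
});
```

Returning every error at once, rather than stopping at the first one, lets the interface show the user all the corrections needed in a single pass.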

4.2.0.3 Fourth Module

The fourth module was tested using various converted documents; the plagiarism
checker was realized with the Google search engine. Initially, faults occurred due to an
error in the string concatenation used to build the query for the plagiarism checker: an
extra OR prefix was the problem. This was rectified to get the desired results.
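
The fault can be reproduced with a short sketch. How the project splits an excerpt into phrases is not shown in this report, so the phrase list, the quoting and both function names below are assumptions made for illustration only.

```javascript
// Hypothetical reconstruction of the fault: prepending " OR " inside the
// loop leaves an extra OR prefix at the start of the query.
function buildQueryFaulty(phrases) {
    var query = "";
    for (var i = 0; i < phrases.length; i++) {
        query += " OR " + '"' + phrases[i] + '"';
    }
    return query;
}

// One way to avoid the prefix: collect the quoted phrases and join them,
// so that OR appears only between phrases.
function buildQuery(phrases) {
    var quoted = [];
    for (var i = 0; i < phrases.length; i++) {
        var phrase = phrases[i].trim();
        if (phrase !== "") {
            quoted.push('"' + phrase + '"');
        }
    }
    return quoted.join(" OR ");
}

var phrases = ["document sharing system", "review tools"];
var faultyQuery = buildQueryFaulty(phrases);
var fixedQuery = buildQuery(phrases);
```

With the two phrases above, the faulty version starts with " OR " while the joined version starts directly with the first quoted phrase.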

4.2.1 Integration Testing

The first three modules were integrated through PHP code. When PHP receives input,
it makes a system call to invoke pdftohtml. The database is then handled with mysqli
calls from PHP. The third module is covered as well, since PHP generates the dynamic
pages. The fourth module was integrated with the others using AJAX.

4.2.2 System Testing

In this project the system is already formed during the integration step described in the
previous section, as it is a small project containing only one subsystem.

4.2.3 Acceptance Testing

Acceptance testing is performed from the client's side, with tests of the client's
choice, to make sure that the requirements specified by the client are completely met.

5.0 Implementation

5.1 Introduction
The implementation procedure is an important phase in the project. The system is
designed to be implemented on the campuses of various educational institutions. It will
be a helpful tool for monitoring and evaluating the quality of assignments across a
campus, and thereby for adapting to the recent trends and paradigm shifts in the
education field. The tool can be used effectively to reap the benefits of collaborative
learning.

5.2 Installation Procedure

The server needs to have PHP5 with MySQLi and the pdftohtml tool installed for the
project to work properly. The database needs to be configured with the correct entries,
and any existing legacy data needs to be migrated.

5.3 Implementation Plans

The project, as already stated, is planned to be implemented in various educational
institutions that can provide the specified requirements.

By default no size limit has been set for uploaded documents, but one can be
configured in the server configuration file. Clients should use modern web browsers
that support HTML5 to get the full benefit of the project, though it remains fully
functional on other browsers and devices with a compromised user experience.

6.0 CONCLUSION
The implementation of the project will improve the quality of assignment reviews and
thereby, incrementally, the quality of work done by students. The project also
promises to be of value in the future by aiding collaborative learning environments and
increasing productivity.

6.1 Advantages And Disadvantages Of The System

6.1.0 Advantages

• The project helps in improving the quality of assignments.

• Plagiarism and plain copy-paste activity can be monitored well.

• It ensures transparency in submission dates and reviews.

• Students can track their submissions without fail.

6.1.1 Disadvantages

• The project can currently handle only standard web fonts.

• Text overlapping with pictures cannot be extracted.

• It does not provide semantic support for the generated HTML.

6.2 Future Scope

The project can be extended to overcome the disadvantages listed above. The web fonts
issue can be tackled with the CSS3 @font-face feature, while overlapping text requires
a more powerful converter. Semantics for generated HTML is an active research area,
and it still has a long way to go before the issue is resolved.

7.0 APPENDIX

7.1 Sample Code
7.1.0 PHP Code in the server

<?php
/* Source code for conversion of pdf files to html */

require_once('appvars.php');
require_once('connectvars.php'); // Database connection information

// Cast the id to an integer so it is safe to embed in the queries below
$sid = isset($_COOKIE['user_id']) ? (int) $_COOKIE['user_id'] : 0;

// Send the browser back to the index page with a status message.
// Nothing may be echoed before this call, or the header cannot be sent.
function redirect($message) {
    header('Location: ../index.php?suc=' . urlencode($message));
    exit;
}

if (isset($_POST['submit'])) {
    // A subject must be selected before the upload is processed
    if (!isset($_POST['subj'])) {
        redirect('Select Something');
    }
    $subname = $_POST['subj'];

    // Grab the uploaded file data from the POST;
    // basename() drops any directory part of the client-supplied name
    $ondoc = basename($_FILES['ondoc']['name']);
    $ondoc_type = $_FILES['ondoc']['type'];
    $ondoc_size = $_FILES['ondoc']['size'];

    if (!empty($ondoc)) {
        $message = 'Sorry wrong Filetype or Corrupted file';

        if ($ondoc_size > 0 && $ondoc_type == 'application/pdf'
                && $_FILES['ondoc']['error'] == 0) {
            $dbc = mysqli_connect(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME);
            $target = GW_UPLOADPATH . $ondoc;

            // The new document id is one more than the current maximum
            $iddata = mysqli_query($dbc, "SELECT max(docid) FROM document");
            $docidarr = mysqli_fetch_array($iddata);
            $docid = $docidarr['max(docid)'] + 1;

            // Look up the tid for the selected subject and the student's batch
            $safe_subname = mysqli_real_escape_string($dbc, $subname);
            $tidquery = "SELECT tid FROM subject where subject='$safe_subname' and batch in
                (select batch from Student where sid=$sid)";
            $tiddata = mysqli_query($dbc, $tidquery);
            $tidrow = mysqli_fetch_array($tiddata);
            $tid = (int) $tidrow['tid'];

            // Each converted document gets its own folder, named after its id
            mkdir(GW_CONVFOLDER . strval($docid), 0777);
            $conv = GW_CONVFOLDER . strval($docid) . '/' . strval($docid);

            // Move the file to the target upload folder
            if (move_uploaded_file($_FILES['ondoc']['tmp_name'], $target)) {
                // Write the data to the database
                $safe_ondoc = mysqli_real_escape_string($dbc, $ondoc);
                $query = "INSERT INTO document VALUES ($docid, $tid, $sid,
                    '$safe_ondoc', CURDATE(), -30, 'yet to be reviewed', 0)";
                mysqli_query($dbc, $query);

                // Convert the PDF; escapeshellarg() quotes the paths for the shell
                $target_arg = escapeshellarg($target);
                $conv_arg = escapeshellarg($conv);
                $output = `/usr/bin/pdftohtml $target_arg -c $conv_arg`;

                require_once('viewer.php');
                $message = 'Your file has been successfully uploaded';
            }
            mysqli_close($dbc);
        }

        // Try to delete the temporary file
        @unlink($_FILES['ondoc']['tmp_name']);
        redirect($message);
    }
    else {
        echo '<p class="error">Please select a file to upload.</p>';
    }
}
?>

7.1.1 Javascript code at the client side


/* Source code for review tools. */

var flag = 0;       // 1 while the details pane is open, 0 otherwise
var request = null; // The current AJAX request object

window.onload = initPage;

function initPage() {
    // Start watching for text selections once the page has loaded
    document.onmouseup = mouseup;
}

// Actions to be performed when mouse button is clicked and then released.
function mouseup(e) {
    if (flag == 0) {
        var selText = window.getSelection().toString();
        // Open the pane only when some text is actually selected
        if (selText != "") {
            flag = 1;
            getDetails(selText);
        }
    }
}

// Creating a request for AJAX, actions for various browsers are considered.
function createRequest() {
    var req;
    try {
        req = new XMLHttpRequest();
    } catch (tryMS) {
        try {
            req = new ActiveXObject("Msxml2.XMLHTTP");
        } catch (otherMS) {
            try {
                req = new ActiveXObject("Microsoft.XMLHTTP");
            } catch (failed) {
                req = null;
            }
        }
    }
    return req;
}

function butappear() {
    var myDiv = document.getElementById("tbox");
    myDiv.style.visibility = "visible";
}

// Function to show the hidden box and contents
function getDetails(itemName) {
    var myDiv = document.getElementById("detailsPane");
    myDiv.style.visibility = "visible";
    var detailDiv = document.getElementById("description");
    detailDiv.innerHTML = itemName;
}

// Callback for AJAX requests: show the server response in the box
function displayDetails() {
    if (request.readyState == 4) {
        if (request.status == 200) {
            var detailDiv = document.getElementById("description");
            detailDiv.innerHTML = request.responseText;
        }
    }
}

// Function to close the popup box.
function iClose() {
    var myDiv = document.getElementById("detailsPane");
    myDiv.style.visibility = "hidden";
    flag = 0;
}

// Function to submit a review comment for the selected excerpt
function comment_get(oForm) {
    var revData = oForm.comments.value;
    var detailDiv = document.getElementById("description");
    var itemName = detailDiv.innerHTML;
    request = createRequest();
    if (request == null) {
        alert("Unable to create request");
        return;
    }
    // encodeURIComponent replaces the deprecated escape()
    var url = "http://localhost/xampp/logn2/mismatch/onTheDocs/review.php?excerpt=" +
        encodeURIComponent(itemName) + "&rev=" + encodeURIComponent(revData) + "&fin=0";
    request.open("GET", url, true);
    request.onreadystatechange = displayDetails;
    request.send(null);
}

// Function to handle the Plagiarism checker
function plag_get(oForm) {
    var detailDiv = document.getElementById("description");
    var itemName = detailDiv.innerHTML;
    request = createRequest();
    if (request == null) {
        alert("Unable to create request");
        return;
    }
    var url = "http://localhost/xampp/logn2/mismatch/onTheDocs/plagcheck.php?plag=" +
        encodeURIComponent(itemName);
    request.open("GET", url, true);
    request.onreadystatechange = displayDetails;
    request.send(null);
}

function fnal() {
    var myDiv2 = document.getElementById("fnal");
    myDiv2.style.visibility = "visible";
}

7.1.2 CSS Code

/* CSS code for styling the whole system */

body {
    background: url(noise.png);
    margin-left: 50px;
}

/* The background image uses a relative path */
#detailsPane,
#fnal {
    background: #fff url('../images/bgDetailPane.png') 115px 91px no-repeat;
    border: 15px solid #003;
    left: 338px;
    padding: 10px 15px 0 15px;
    position: fixed;
    top: 50px;
    text-align: left;
    width: 617px;
    visibility: hidden;
    z-index: 10000;
    max-height: 400px;
}

/* CSS ids for child description with parent fnal only */
#fnal #description {
    overflow: visible; /* "none" is not a valid overflow value */
}

#itemDetail {
    left: 0;
    position: absolute;
    top: 0px;
}

#description {
    padding: 10px;
    margin: 20px 0px;
    border: 5px solid #003;
    max-height: 200px;
    overflow: scroll;
}

#footer {
    width: 100%;
    text-align: right;
    float: right;
    font-size: 12px;
    position: fixed;
}

/* CSS3 Specifications used below */
#footer a {
    padding: 2px 10px;
    margin: 0px 60px;
    color: #999;
    background: #fff;
    text-decoration: none;
    border-radius: 10px;
    -webkit-border-radius: 10px;
    -moz-border-radius: 10px;
    -khtml-border-radius: 10px;
}

#footer a:hover,
#footer a:focus {
    color: #fff;
    background: #333;
    background: rgba(0,0,0,.3);
}

#tbox {
    padding: 2px 10px;
    margin: 10px 60px;
    color: #999;
    background: #fff;
    text-decoration: none;
    border-radius: 10px;
    -webkit-border-radius: 10px;
    -moz-border-radius: 10px;
    -khtml-border-radius: 10px;
    float: right;
    text-align: right;
    position: fixed;
    bottom: 20px;
    right: 2px;
    visibility: hidden;
}

7.1.3 HTML

<!DOCTYPE html>
<html lang="en">
<head>
<!-- HTML5 Standards -->
</head>
<body>
<div id="sizer">
Contents are generated dynamically by PHP
</div>
</body>
</html>

7.2 Technology Explanation

Various technologies were used for the successful completion of the project, such as
PHP, AJAX and pdftohtml.

7.2.0 PHP

PHP is a general-purpose scripting language originally designed for web development
to produce dynamic web pages. For this purpose, PHP code is embedded into the
HTML source document and interpreted by a web server with a PHP processor module,
which generates the web page document. It has also evolved to include a command-line
interface capability and can be used in standalone graphical applications. PHP can be
deployed on most web servers and, as a standalone interpreter, on almost every
operating system and platform, free of charge.

7.2.1 AJAX

Ajax, shorthand for asynchronous JavaScript and XML, is a group of interrelated web
development methods used on the client-side to create interactive web applications.
With Ajax, web applications can retrieve data from the server asynchronously in the
background without interfering with the display and behavior of the existing page.
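
The asynchronous pattern can be sketched as below. This is a minimal sketch under stated assumptions: the helper names buildUrl and sendAsync and the injectable createRequest argument are introduced here for illustration and do not appear in the project code; the parameter names excerpt and fin are the ones used by the review tools listed in section 7.1.1.

```javascript
// Assemble a GET URL, encoding every parameter so that characters such
// as spaces and ampersands cannot break the query string.
function buildUrl(base, params) {
    var parts = [];
    for (var key in params) {
        if (Object.prototype.hasOwnProperty.call(params, key)) {
            parts.push(encodeURIComponent(key) + "=" +
                encodeURIComponent(params[key]));
        }
    }
    return parts.length ? base + "?" + parts.join("&") : base;
}

// Send the request without blocking the page. onDone is called later,
// when the response has arrived. createRequest is an optional factory
// (an assumption of this sketch) so the function can be exercised
// without a browser; by default it creates an XMLHttpRequest.
function sendAsync(url, onDone, createRequest) {
    createRequest = createRequest || function () {
        return new XMLHttpRequest();
    };
    var request = createRequest();
    request.open("GET", url, true); // true = asynchronous
    request.onreadystatechange = function () {
        if (request.readyState === 4 && request.status === 200) {
            onDone(request.responseText);
        }
    };
    request.send(null);
}

var exampleUrl = buildUrl("review.php", { excerpt: "a b&c", fin: 0 });
```

In a browser, a call such as sendAsync(exampleUrl, function (text) { ... }) returns immediately; the page stays responsive and the callback updates it once the server has answered.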

7.2.2 MySQL

The database storing student and teacher login entries, uploaded and converted
documents etc. is created and manipulated using the MySQL server. MySQL is an open
source relational database management system that uses Structured Query Language
(SQL). The database maintains information about teachers and students, and about
their assignment submissions and tracking.

7.2.3 HTML5

The latest revision of HTML, the standard used for rendering web pages, has been
used for the client-side rendering of the project. Its core aims have been to improve the
language with support for the latest multimedia while keeping it easily readable by
humans and consistently understood by computers and devices. It has been developed
by the WHATWG and the W3C.

7.2.4 CSS

Cascading Style Sheets (CSS) is a style sheet language used to describe the presentation
semantics (the look and formatting) of a document written in a markup language. Its
most common application is to style web pages written in HTML and XHTML, but the
language can also be applied to any kind of XML document, including plain XML,
SVG and XUL. CSS is designed primarily to enable the separation of document content
(written in HTML or a similar markup language) from document presentation,
including elements such as the layout, colors, and fonts.

7.2.5 Javascript

JavaScript is a client-side scripting language, standardized as ECMAScript. It is a
prototype-based, object-oriented scripting language that is dynamic, weakly typed and
has first-class functions. It has been used extensively in the project for designing the
review tools and enhancing the user experience.
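
The three language properties named above can be seen in a few lines. The objects and functions in this sketch (reviewTool, plagChecker, applyTo) are hypothetical names chosen for illustration and are not taken from the project code.

```javascript
// Prototype-based: an object inherits directly from another object,
// without a class in between.
var reviewTool = {
    describe: function () {
        return this.name + " tool";
    }
};
var plagChecker = Object.create(reviewTool);
plagChecker.name = "Plagiarism";
var description = plagChecker.describe(); // found on the prototype

// First-class functions: a function is a value that can be passed
// to another function like any other argument.
function applyTo(items, fn) {
    var results = [];
    for (var i = 0; i < items.length; i++) {
        results.push(fn(items[i]));
    }
    return results;
}
var lengths = applyTo(["pdf", "html"], function (s) {
    return s.length;
});

// Weakly typed: operands are converted implicitly, and the result
// depends on the operator.
var product = "5" * 2;      // the string is converted to a number
var concatenated = "5" + 2; // the number is converted to a string
```

The last two lines are the reason the review tools compare and concatenate values with care: the same operands give a number with * and a string with +.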

The jQuery library for JavaScript was used throughout the system. Both jQuery and
jQuery-UI are……………

7.2.6 pdftohtml

pdftohtml is a program that converts PDF documents into HTML. It generates its
output in the current working directory. It is installed in /usr/bin by default. The tool
provides only a command-line interface and works on UNIX and Linux based systems.
It is invoked from PHP with the ‘system’ call or the backtick operator ‘`’. Various
parameters can be passed to pdftohtml; some of the available options are listed below.

Options
A summary of the options is included below.

-f <int>       : first page to print
-l <int>       : last page to print
-q             : don't print any messages or errors
-v             : print copyright and version info
-p             : exchange .pdf links with .html
-c             : generate complex output
-i             : ignore images
-noframes      : generate no frames. Not supported in complex output mode.
-stdout        : use standard output
-zoom <fp>     : zoom the pdf document (default 1.5)
-opw <string>  : owner password (for encrypted files)
-upw <string>  : user password (for encrypted files)

Of these options, -c is used for converting documents with graphics content.
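
The project invokes pdftohtml from PHP. Purely as an illustration of how the options above combine into a command line, the sketch below builds an argument list in JavaScript. The function buildPdftohtmlArgs and its option names (firstPage, lastPage, complex, ignoreImages, quiet) are hypothetical; only the flags -f, -l, -c, -i and -q come from the list above.

```javascript
// Hypothetical helper: turn a small options object into the argument
// list for pdftohtml. Only flags from the summary above are used.
function buildPdftohtmlArgs(pdfPath, outputBase, options) {
    options = options || {};
    var args = [];
    if (options.firstPage !== undefined) {
        args.push("-f", String(options.firstPage));
    }
    if (options.lastPage !== undefined) {
        args.push("-l", String(options.lastPage));
    }
    if (options.complex) {
        args.push("-c"); // needed for documents with graphics content
    }
    if (options.ignoreImages) {
        args.push("-i");
    }
    if (options.quiet) {
        args.push("-q");
    }
    // The input file and the output base name come last
    args.push(pdfPath, outputBase);
    return args;
}

var complexArgs = buildPdftohtmlArgs("uploads/report.pdf", "converted/12/12",
    { complex: true });
var rangeArgs = buildPdftohtmlArgs("uploads/report.pdf", "converted/12/12",
    { firstPage: 2, lastPage: 5, quiet: true });
```

Keeping the arguments in an array rather than one concatenated string means each file name stays a single argument when the list is handed to a process-spawning call, so spaces or quotes in a file name cannot split or alter the command.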

7.3 GLOSSARY

Adsense

AdSense, a trademark of Google, is a premium product that shows advertisements on
web pages, related to the contents of those pages.

AJAX

Asynchronous JavaScript and XML, a client-side web development method.

HTML

Hypertext Markup Language: a set of tags and rules (conforming to SGML) for using
them in developing hypertext documents.

JQuery

jQuery is a standard framework for JavaScript that is used extensively for user
interface development, form enhancement and validation, and animations.

pdf

Portable Document Format (PDF) is a file format created by Adobe Systems in 1993 for
document exchange. PDF is used for representing two-dimensional documents in a
manner independent of the application software, hardware, and operating system.

pdftohtml

pdftohtml is a server-side program that converts PDF documents into HTML. It works
on UNIX and Linux operating systems.

png

PNG (Portable Network Graphics) is a standard image format mainly intended for use
on the web, designed to replace GIF. The name is also read, unofficially, as the
recursive acronym "PNG's Not GIF".

Vector

A type of image that is not made of pixels but is defined by mathematical formulas
and relations. This makes the image scalable to any size without loss of resolution.


SCREENSHOT

I Initial Page
II Login Screen
III Student Profile Page
IV Student Submission List
V Teacher Profile Page
VI Assignment List for Teacher
VII Converted Document in HTML Format
VIII Review Tools
IX Reviews Page
