
National University of Sciences & Technology (NUST)

School of Electrical Engineering and Computer Science (SEECS)


Department of Computing

Final Year Project

Software Requirements Specification

For

Collaborative Hadoop

Version 1.1

Arooj Sajid
Usama Bin Tariq

Advisor: Fahad Ahmed Satti


Co-advisor: Dr. Shahzad Saleem

9th February 2017


Table of Contents
Table of Figures
Revision History
1. Introduction
1.1 Purpose
1.2 Document Conventions
1.2.1 Formatting Conventions
1.2.2 Naming Conventions
1.3 Intended Audience and Reading Suggestions
1.4 Product Scope
1.5 References
2. Overall Description
2.1 Product Perspective
2.2 Product Functions
2.3 User Classes and Characteristics
2.3.1 Organizations
2.4 Operating Environment
2.5 Design and Implementation Constraints
2.6 User Documentation
2.7 Assumptions and Dependencies
3. External Interface Requirements
3.1 User Interfaces
3.2 Hardware Interfaces
3.3 Software Interfaces
3.3.1 Database
3.3.2 Operating Systems
3.3.3 Tools
3.4 Communications Interface
3.4.1 Web Browser
3.4.2 Internet Connection
3.4.3 Electronic Forms
3.4.4 HTTP
4. System Features
4.1 User Login
4.1.1 Description and Priority
4.1.2 Stimulus/Response Sequences
4.1.3 Functional Requirements
4.2 User Sign Up
4.2.1 Description and Priority
4.2.2 Stimulus/Response Sequences
4.2.3 Functional Requirements
4.3 Submit Map Reduce Job
4.3.1 Description and Priority
4.3.2 Stimulus/Response Sequences
4.3.3 Functional Requirements
4.4 View Results
4.4.1 Description and Priority
4.4.2 Stimulus/Response Sequences
4.4.3 Functional Requirements
4.5 Edit Details
4.5.1 Description and Priority
4.5.2 Stimulus/Response Sequences
4.5.3 Functional Requirements
4.6 User Logout
4.6.1 Description and Priority
4.6.2 Stimulus/Response Sequences
4.6.3 Functional Requirements
5. Other Nonfunctional Requirements
5.1 Performance Requirements
5.2 Safety Requirements
5.3 Security Requirements
5.4 Software Quality Attributes
5.4.1 Adaptability
5.4.2 Availability
5.4.3 Correctness
5.4.4 Flexibility
5.4.5 Interoperability
5.4.6 Maintainability
5.4.7 Portability
5.4.8 Reliability
5.4.9 Reusability
5.4.10 Robustness
5.4.11 Testability
5.4.12 Usability
5.5 Business Rules
6. Other Requirements
6.1 Database Requirements
6.2 Legal Requirements
6.3 Reuse Objectives
Appendix A: Glossary
Appendix B: Analysis Models and Other References
Use Case Diagram
Component Diagram
Package Diagram
Object Diagram
Sequence Diagram
Communication Diagram
Login
Editing User Details
Run MR Job
View Result
Deployment Diagram
Activity Diagrams
User Login
Editing User Details
Run MR Job
View Results
Architecture Diagram

Table of Figures
Figure 1 - Block Diagram
Figure 2 - System Environment Diagram
Figure 3-Class Diagram
Figure 4-Upload MR Page
Figure 5-MR Submission Page
Figure 6-Success Message
Figure 7-Execute MR Page
Figure 8-Success Message
Figure 9-Use Case Diagram
Figure 10-Component Diagram
Figure 11-Package Diagram
Figure 12-Object Diagram
Figure 13-Sequence Diagram
Figure 14-Communication Diagram 1
Figure 15-Communication Diagram 1
Figure 16-Communication Diagram 2
Figure 17-Communication Diagram 3
Figure 18-Deployment Diagram
Figure 19-Activity Diagram 1
Figure 20-Activity Diagram 2
Figure 21-Activity Diagram 3
Figure 22-Activity Diagram 4
Figure 23-Architecture Diagram

Revision History

Name                               Date                 Reason For Changes                                       Version
Arooj Sajid and Usama Bin Tariq    28th January, 2017   Initial version                                          Version 1.0
Arooj Sajid and Usama Bin Tariq    9th February, 2017   Functional/Nonfunctional Requirements and other errors   Version 1.1

1. Introduction
Hadoop is a highly scalable data storage and analytics platform for processing huge volumes of
structured and unstructured data. Petabytes of data can be spread across hundreds or thousands of
physical storage nodes. Organizations can hold a significant amount of Open Data, that is, data sets
that are available for public usage and/or analysis. A researcher who has a typical Hadoop cluster
and plans to run a Map Reduce job on these data sets faces some limitations. The organization's
clusters could be inaccessible to the external party because of constraints such as unreachability
and/or limited resources that are incapable of processing larger data sets. Even assuming that
sufficient resources are available, the data sets would have to be stored on the relevant cluster to
run the MR job. This could be done by copying the data sets either physically or digitally, but the
time taken to retrieve the data may not be feasible.
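
To give a rough sense of why copying may be infeasible, the transfer time can be estimated as data size divided by link speed. The sketch below is illustrative only: the one-petabyte data set and the one-gigabit-per-second link are assumed figures, not measurements from this project, and the calculation ignores protocol overhead and contention.

```python
def transfer_days(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealised time, in days, to copy size_bytes over a link of the given speed."""
    seconds = (size_bytes * 8) / bandwidth_bits_per_s
    return seconds / 86_400  # 86,400 seconds in a day


PETABYTE = 10**15       # bytes (decimal petabyte), assumed data set size
GIGABIT_PER_S = 10**9   # bits per second, assumed link speed

days = transfer_days(PETABYTE, GIGABIT_PER_S)
print(f"{days:.1f} days")  # prints "92.6 days"
```

Under these assumptions the copy alone takes about three months, which is the kind of delay that running the job where the data already resides is meant to avoid.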

The problem is the inability to analyze open data sets due to resource constraints, while the
alternatives for obtaining better resources are more time consuming and/or less cost effective. The
main goal of Collaborative Hadoop is to enable users to analyze open data sets within the providing
entity’s Hadoop cluster and to avoid copying the data sets to the user’s Hadoop cluster.
Collaborative Hadoop lets the researcher run the Map Reduce job on the idle slave nodes of the
clusters containing the data sets, instead of copying the data sets to the researcher's own cluster,
in order to save time and utilize resources efficiently.

1.1 Purpose
This Software Requirements Specification (SRS) document (version 1.1) describes in detail the
requirements and specifications of the software system being developed, i.e. Collaborative Hadoop,
which will be a web application that allows researchers to analyze data at its stored location and
retrieve only the results.

The domain of Collaborative Hadoop includes Education Commissions, Academics, Governments,
Research Institutes and Medical Industries in the country. These organizations tend to have multiple
branches at various locations, with data split amongst these locations. Each location could have its
own cluster deployed to meet its data analytics and processing needs. Collaborative Hadoop would
ease the process of using a cluster at another branch to retrieve results based on the data at that
branch. This document is therefore written for the software developers of the proposed system, the
organizations who will be using the application and the maintenance staff who might modify the
Collaborative Hadoop application in the future based on changing requirements. It explains in detail
the requirements of the system software, including the system requirements, user requirements, and
hardware and software requirements.

It also highlights the design plan (i.e. the database system for keeping track of the clusters set up
at the respective organizations and of where the respective data sets are stored, and the front-end
interface for user interaction with the web application), the basic tasks, the probable scenarios and
the constraints and limitations of the application.

The purpose of this document is to present a detailed description of Collaborative Hadoop. It will
explain the features and benefits of the system, the structure of the system, what the system will do,
the constraints under which it must operate and how the system will react to external stimuli. This
document is intended for both the stakeholders and the developers of the system.

1.2 Document Conventions
1.2.1 Formatting Conventions
The font used throughout this Software Requirements Specification (SRS) document for headings is
Times New Roman and font for normal text is Calibri. The font size for the normal text is 12, for the
section headings it is 18, for the second level headings it is 14 and for the third level headings it is 12.
All the headings, irrespective of their level, are written in bold. The line and paragraph spacing for
normal text is 1.15 and the line and paragraph spacing for the text in bullets is 1.5. Important
information has been written in bold. The whole text has been justified. In general, all the IEEE
requirements for formatting have been followed.

1.2.2 Naming Conventions


The headings and subheadings have been named according to the subject matter of their content, and
all heading titles have been adopted from the IEEE SRS template.

1.3 Intended Audience and Reading Suggestions


This Software Requirements Specification (SRS) document is intended for organizations such as
Education Commissions, Academics, Governments, Research Institutes and Medical Industries who
will use Collaborative Hadoop, so that they can verify what is written in this document.

Apart from the aforementioned audience, this document is also meant for the software developers
to learn how to develop the application and to understand the structure of the product.

Furthermore, this document is for the project managers to help them understand how to manage
the teams involved.

In addition, it is intended for the sales and marketing team to understand the functionality of the
product and develop marketing techniques accordingly.

This document is also written for the testing team to help them prepare test cases and allow them to
debug the software.

Lastly, this SRS document is written for the maintenance staff who might modify the Collaborative
Hadoop application in the future based on the changing requirements.

1.4 Product Scope


The software system being specified in this document, i.e. Collaborative Hadoop, is a web
application for organizations that wish to analyze data without needing to replicate it. The system
will be designed to make it easy for organizations to run data analytics without going through the
trouble of copying the data to their own clusters, either physically or digitally. The software
system will be easy to use and it will maximize the work efficiency of the organizations in our
country.

Organizations aiming to use open data sets would benefit from Collaborative Hadoop because they
can analyze data at its stored location and retrieve only the results. Education Commissions,
Academics, Governments, Research Institutes and Medical Industries in our country could all
benefit from Collaborative Hadoop. These organizations tend to have multiple branches at various
locations, with data split amongst these locations. Each location could have its own cluster
deployed to meet its data analytics and processing needs. Collaborative Hadoop would ease the
process of using a cluster at another branch to retrieve results based on the data at that branch.
The goal of Collaborative Hadoop is to enable organizations to analyze open data sets within the
providing entity’s Hadoop cluster and to avoid copying the data sets to their own Hadoop cluster.

More specifically, the software will facilitate communication between organizations and enable them
to run their respective Map Reduce jobs on the idle slave nodes of the clusters containing the data
sets without replicating the data to their own cluster, which lets them save time and utilize the
available resources efficiently. The system also contains a database for keeping track of where the
respective data sets are stored, the clusters set up at the organizations and the idle slave nodes
available in the respective clusters.
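
The tracking role of that database can be pictured with a minimal sketch. Everything named here (the `Cluster` record, its fields, the `clusters_for` helper and the sample entries) is hypothetical and chosen only for illustration; the actual schema is not defined in this section, and an in-memory list stands in for the database.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A Hadoop cluster registered by an organization (hypothetical record shape)."""
    name: str
    organization: str
    idle_slave_nodes: int
    datasets: set = field(default_factory=set)


def clusters_for(dataset: str, registry: list) -> list:
    """Return the clusters that store `dataset` and currently have idle slave nodes."""
    return [c for c in registry if dataset in c.datasets and c.idle_slave_nodes > 0]


# Hypothetical registry contents.
registry = [
    Cluster("cluster-a", "Org A", idle_slave_nodes=4, datasets={"census"}),
    Cluster("cluster-b", "Org B", idle_slave_nodes=0, datasets={"census", "health"}),
    Cluster("cluster-c", "Org C", idle_slave_nodes=2, datasets={"health"}),
]

print([c.name for c in clusters_for("census", registry)])  # prints ['cluster-a']
```

Here `cluster-b` also holds the census data but is skipped because it has no idle slave nodes, which mirrors the three things the database is said to track: data set locations, clusters, and idle nodes.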

1.5 References
This Software Requirements Specification (SRS) document has been written according to the following
IEEE standard: IEEE Std 830-1998, IEEE Recommended Practice for Software Requirements
Specifications. IEEE Computer Society, 1998.

2. Overall Description
2.1 Product Perspective
Collaborative Hadoop is a new and self-contained product being developed for individuals and/or
organizations to help them run data analytics. It will be designed and developed solely from the
user requirements and specifications.

Collaborative Hadoop will consist of four major modules: user login, MR Job submission, connection
to clusters and result retrieval. Users can log in to the application on their own device by
providing their credentials. On successful login, users can upload their Map Reduce jobs onto the
server. After the MR Job is submitted, the MR Job submission module will connect to the relevant
clusters and send the MR Job along with the path to the input file. The clusters will then run the
MR Job and send their results to the result retrieval module. The result retrieval module will
reduce the results obtained from the clusters to produce a final result and send it back to the
user.
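
The flow through the last two modules can be sketched as follows. This is a local simulation under stated assumptions, not the project's implementation: a word count stands in for the user's MR Job, plain functions stand in for the clusters, and the names `run_on_cluster`, `reduce_results` and `cluster_data` are hypothetical.

```python
from collections import Counter


def run_on_cluster(local_lines: list) -> Counter:
    """Stand-in for one cluster running a word-count MR Job on its own data."""
    counts = Counter()
    for line in local_lines:
        counts.update(line.split())
    return counts


def reduce_results(partials: list) -> Counter:
    """Result retrieval step: merge the per-cluster results into one final result."""
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts together
    return final


# Hypothetical data held by two clusters; only the counts leave each cluster.
cluster_data = {
    "cluster-a": ["open data open"],
    "cluster-b": ["data sets"],
}

partials = [run_on_cluster(lines) for lines in cluster_data.values()]
final_result = reduce_results(partials)
print(dict(final_result))  # prints {'open': 2, 'data': 2, 'sets': 1}
```

The point of the design is visible in what crosses the boundary: each cluster returns a small partial result rather than its input data, and the second reduction combines those partial results for the user.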

The overall system modules are shown in the block diagram below:

Figure 1 - Block Diagram

The overall system is shown in the system environment diagram below:

Figure 2 - System Environment Diagram

2.2 Product Functions


Collaborative Hadoop provides the following functions:

• Sign up
• Log in
• Submit Map Reduce jobs
• Run Map Reduce jobs on the clusters containing the required data
• Reduce the results obtained from the clusters
• Produce the final result
• View results
• Log out

Users can log in to the application by providing their credentials, i.e. user ID and password. A
user who is not already registered with the application can sign up by filling in a form containing
some basic information. The user can submit a Map Reduce job, along with the path to the input
file, to the application and view the result of the submitted Map Reduce job. The user can also log
out from the application.
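
A minimal sketch of the sign-up and login checks described above is given here. It rests on assumptions that this section does not state: an in-memory dictionary stands in for the database, passwords are stored as salted PBKDF2 hashes, and the function names `sign_up` and `login` are hypothetical.

```python
import hashlib
import hmac
import os

# user ID -> (salt, password hash); stand-in for the user collection in the database
users = {}


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 (assumed scheme)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def sign_up(user_id: str, password: str) -> bool:
    """Register a new user; return False if the user ID is already taken."""
    if user_id in users:
        return False
    salt = os.urandom(16)
    users[user_id] = (salt, _hash(password, salt))
    return True


def login(user_id: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    record = users.get(user_id)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, _hash(password, salt))
```

For example, after `sign_up("org1", "s3cret")`, the call `login("org1", "s3cret")` succeeds while `login("org1", "wrong")` and a second `sign_up("org1", ...)` are both rejected.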

The application also provides some functions which are hidden from the user. The application
connects to the clusters with relevant data to run the Map Reduce Job of the user. The application
also reduces the results obtained from the clusters to produce a final result that is viewed by the
user.

The product functions are shown in the class diagram below:

Figure 3-Class Diagram

2.3 User Classes and Characteristics


The users of Collaborative Hadoop should have a basic understanding of what a Map Reduce job is
and how to write an MR Job. To sign up to the application, a user must belong to a registered
organization. All registered users can use the same features of the application. There is only one
type of user for the Collaborative Hadoop application:

2.3.1 Organizations
The individuals in any organization can log in to use the application. If the organization is not already
registered with the application, the personnel can also sign up by filling in a form containing some
basic information about the organization like name, email address, city, address etc. They can also
view and update their information and credentials. They can submit their Map Reduce jobs along
with the input file name to the application. The application will then send the command to run the
MR Job, reduce the multiple results obtained from clusters and save the final result. The individuals
in any organization can view the result of the submitted Map Reduce job depending on their level of
clearance. They can also log out from the application.

2.4 Operating Environment


Collaborative Hadoop will be a web application and it will be able to operate on any web browser on
the user’s computer system or even a mobile phone. It will be essential for a user to have an active
Internet connection to use the Collaborative Hadoop application and to be able to run their Map
Reduce job.

2.5 Design and Implementation Constraints
The design and implementation of the Collaborative Hadoop application are subject to a number of
constraints that must be observed. Some of these constraints are as follows:

• The system must be developed in accordance with the policies of the organizations which will
be using the application.
• Only registered organizations are able to sign up to the Collaborative Hadoop application.
• There must be an active Internet connection to use the application.
• Hadoop clusters must be up and running before the application can access them.
• For the development of the application, the Spring MVC framework must be used.
• MongoDB must be used as the database management system.
• The application must be connected to the database when the user wants to add, view or
update their data.
• For the testing of the application, any web browser can be used.
• The user must be authenticated before being provided with any of the application’s
functionality.
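
The last constraint in the list above, authentication before any functionality, can be illustrated with a small guard. This is a language-neutral sketch written in Python for brevity; the names `requires_login`, `NotAuthenticated` and `submit_job`, and the dictionary used as a session, are hypothetical. In a Spring MVC application the same check would more likely live in an interceptor or a security filter.

```python
import functools


class NotAuthenticated(Exception):
    """Raised when an unauthenticated session requests application functionality."""


def requires_login(func):
    """Reject the call unless the session carries an authenticated user ID."""
    @functools.wraps(func)
    def wrapper(session: dict, *args, **kwargs):
        if not session.get("user_id"):
            raise NotAuthenticated(func.__name__)
        return func(session, *args, **kwargs)
    return wrapper


@requires_login
def submit_job(session: dict, job_name: str, input_path: str) -> str:
    # Placeholder body: a real handler would forward the job to the clusters.
    return f"{session['user_id']} submitted {job_name} on {input_path}"
```

Calling `submit_job({"user_id": "org1"}, "wordcount", "/data/in")` returns a confirmation string, while calling it with an empty session raises `NotAuthenticated` before the handler body runs.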

2.6 User Documentation


User documentation is as important as the development of the application itself, because it helps
users operate the system easily. To help people use the application efficiently, several forms of
user documentation will be available.

Users will be provided with a user manual in the help section of the application. It will contain
information about the various functionalities and how to use them. To aid understanding, pictures
and screenshots will be provided along with step-by-step details of the various components.

Along with the written user manual, video tutorials will also be available to the users in the help
section of the application. Users will also be able to contact the developing team in case of any
technical difficulties or queries. The developing team’s contact information will be available in the
“Contact Us” section of the application.

A few known user documentation standards are IEEE Std 12207-2008, MIL-STD-498 and the ISO
standards classified under ICS 01.110 (Technical product documentation).

2.7 Assumptions and Dependencies
The assumptions and dependencies for the development of the application are as follows:

• The user is a part of a registered organization
• The user has an active internet connection
• The user has basic knowledge of MR Jobs
• The user knows how to write an MR Job
• The user is well aware of how to use a web application
• The database is reachable so that the user can log in
• The application is connected to the database when the user wants to add, view or update
their data
• The server can handle any number of users concurrently
• Hadoop clusters are up and running before the application can access them
• The user is aware of the benefits of data analytics

3. External Interface Requirements

3.1 User Interfaces
The user interface of the application will be as easy to use as possible. The components of the
application should be self-explanatory; the aim is to help users recognize interface elements
rather than recall them. The interfaces will be implemented according to established
human-computer interaction (HCI) standards.

The user interface for the software shall be compatible with any web browser through which the
user can access the system, such as Internet Explorer, Mozilla Firefox, Google Chrome, Safari and
Netscape Navigator.

The sample screens from the initial prototype are shown below:

Figure 4-Upload MR Page

Figure 5-MR Submission Page

Figure 6-Success Message

Figure 7-Execute MR Page

Figure 8-Success Message

3.2 Hardware Interfaces
The hardware interfaces required are as follows:

 A computer system or a mobile phone with a web browser
 Functioning Hadoop clusters implemented on commodity hardware
 Communication protocols for internet connectivity
 Communication protocols for database connectivity to access and manipulate the data
stored in the database

3.3 Software Interfaces


Collaborative Hadoop will be connected with the following software components:

3.3.1 Database
Collaborative Hadoop will be linked with a database which will be created using MongoDB. The
application will communicate with the database to carry out tasks such as storing, updating and
retrieving users’ data and keeping track of the information related to the clusters.

3.3.2 Operating Systems


The Hadoop clusters will be implemented using either Ubuntu Linux, CentOS or any other operating
system on commodity hardware. The client interface can run on any operating system. The
development language of the product will be an operating system independent language.

3.3.3 Tools
The tools to be used for the development of the application are Spring MVC, JSP, HTML, CSS, and
Apache Hadoop.

When a user opens the application in their web browser, the application checks for an internet
connection and establishes a connection with the database. If the internet connection is inactive,
the application does not load and displays an error message. Once the connection is established, the
log in window of the application launches. The user then enters their login details, i.e. username and
password, and clicks on the log in button. When the button is pressed, the application contacts the
database to verify the user, and the database reports whether the user is authenticated or not. If the
login details are incorrect, an error message is displayed. Similarly, when a user registers to the
application, the entered details are taken from the user interface to the database, where they are
stored as a record using the specified query.

In case of incorrect login details or any other kind of violation, an error message is displayed which
helps the user to understand what was expected of them. The creation of these error messages is based
on the data stored in the database. The client can access the stored data only if the database
connection exists and the user has been authenticated.
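
The login check described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the in-memory `USERS` dictionary stands in for the MongoDB user records, the function names are hypothetical, and salted PBKDF2 hashing is an assumption, since this document does not name a password storage scheme.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory stand-in for the user records kept in the database.
USERS = {}


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 is an assumption; the SRS does not specify a hashing scheme.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(username: str, password: str) -> None:
    # Store a random salt and the derived hash, never the plain password.
    salt = os.urandom(16)
    USERS[username] = (salt, hash_password(password, salt))


def login(username: str, password: str) -> str:
    """Return the next screen name on success, or an error message."""
    if not username or not password:
        return "error: username and password are required"
    record = USERS.get(username)
    if record is None:
        return "error: incorrect username or password"
    salt, stored_hash = record
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(stored_hash, hash_password(password, salt)):
        return "home"
    return "error: incorrect username or password"
```

The same error message is returned for an unknown username and for a wrong password, so that the response does not reveal which usernames exist.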

When sending a Map Reduce job, the user also specifies the path to the input file. The application
looks for this input file in multiple clusters, and the MR job later runs on those clusters.
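
As a rough sketch of this lookup step, assume the server keeps a catalogue that maps each cluster to the file paths it stores. The catalogue, the cluster names and the function name are hypothetical; a real deployment would query each cluster's file system instead.

```python
# Hypothetical catalogue: cluster name -> set of file paths stored on it.
CLUSTER_FILES = {
    "cluster-a": {"/data/sales.csv", "/data/logs.txt"},
    "cluster-b": {"/data/sales.csv"},
    "cluster-c": {"/data/users.csv"},
}


def clusters_holding(path, catalogue):
    """Return the sorted names of the clusters that store the given input path."""
    return sorted(name for name, files in catalogue.items() if path in files)
```

An empty result means no cluster holds the input file, in which case the application would report an error to the user instead of running the job.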

3.4 Communications Interface


For the application to work properly, a few communication interfaces will be required which are as
follows:

3.4.1 Web Browser


Collaborative Hadoop is a web application which will require a web browser to run. The application
shall be compatible with any web browser, such as Internet Explorer, Mozilla Firefox, Google Chrome,
Safari, and Netscape Navigator.

3.4.2 Internet Connection


An active internet connection plays a vital role in operating the Collaborative Hadoop application.
The internet connection is the communication link which lets the users connect to the
application. If the internet connection is unavailable, the application will be unable to connect and
the user will not be able to use the application.

3.4.3 Electronic Forms


Electronic forms will be used to allow the user to log in and sign up to the application. The log in
form will contain the username and password, whereas the sign up form will contain the username,
password, name of organization, gender, postal address, city and email address.

3.4.4 HTTP
The Collaborative Hadoop application will use the HTTP protocol for the communication over the
internet.

4. System Features
Our application allows users to run data analytics and provides them with a mechanism to submit
their Map Reduce jobs. The application also lets users log in, sign up, view their data and edit
their data. For better understanding, we have divided these functionalities into
separate system features, giving a detailed explanation of each.

4.1 User Login


4.1.1 Description and Priority
This feature allows users to log in to the application and access the rest of the features that the
application provides. Since a user must log in before using any other functionality of the system,
the priority of this feature is high. The other priorities of this feature are as follows:

4.1.1.1 Benefit
This feature is very important because it separates authenticated users from unauthenticated
ones. It keeps malicious users away from the system and allows each user’s account to be
maintained uniquely and securely. On a relative scale of 1 to 9, the priority for benefit is 9.

4.1.1.2 Penalty
This feature is extremely important in terms of penalty: without it, any malicious user would be
able to use the application, which would compromise the security of the system. On a relative
scale of 1 to 9, the priority for penalty is 9.

4.1.1.3 Cost
This feature requires access to the database to validate the data provided by the user. The
cost to implement this feature is not very high, so the priority for cost is between 2 and 3.

4.1.1.4 Risk
The risk imposed on the system without this feature is huge, because the system would not be
able to keep malicious users away. The feature itself is not very hard to implement, so the
technical and other risks associated with it are low. On a relative scale of 1 to 9, the priority
for risk is between 2 and 3.

4.1.2 Stimulus/Response Sequences


A user can log in to the application by clicking on the “Login” button on the user interface of the
system. To access the functionalities provided by the application, the user would have to provide
his/her credentials and then click on the “Login” button available on the first screen that opens up.
The system will then verify the credentials entered by the user with the credentials stored in the
database. The sequence of user actions and system responses that stimulate the behavior defined
for this feature is as follows:

| S. No | Stimulus | Response |
|---|---|---|
| 1 | User enters the username and password | The entered data is displayed on the screen. The password is hidden using asterisks. |
| 2 | User clicks on the “Login” button | The data entered is verified using the stored data in the database. In case of correct credentials, a new screen appears to the user so that he/she can access other features. In case of incorrect credentials, an error message prompts and the user is requested to enter the data again. |

4.1.3 Functional Requirements


The functional requirements associated with this feature are as follows:

| S. No. | Requirement ID | Requirement Name | Description | Input Results |
|---|---|---|---|---|
| 1 | REQ-1 | Complete Data | The details entered by the user must be complete. | An empty field will generate an error message. |
| 2 | REQ-2 | Correct Data | The data entered must be correct. | If the data is correct, a new screen appears to the user so that he/she can access other features. If the data is incorrect, an error message prompts and the user is requested to enter the data again. |
| 3 | REQ-3 | Data Type | The username and password fields both accept only strings. | If any other data type is entered, an error message will be generated. |
| 4 | REQ-4 | Security | The browser should not cache the login page. | The page must be treated as sensitive data and the browser must be told not to cache the page. |
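
The login requirements above (complete data, string data types, and a page the browser must not cache) could be checked along the following lines. This is a sketch only: the function name and the error wording are hypothetical, and the header set is one common way of asking browsers not to cache a page rather than something this document prescribes.

```python
# Response headers that ask the browser not to cache a sensitive page (REQ-4).
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}


def validate_login_form(form):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    for field in ("username", "password"):
        value = form.get(field)
        if value is None:
            errors.append(f"{field} must not be empty")   # REQ-1: missing field
        elif not isinstance(value, str):
            errors.append(f"{field} must be a string")    # REQ-3: wrong data type
        elif value.strip() == "":
            errors.append(f"{field} must not be empty")   # REQ-1: blank field
    return errors
```

Whether the credentials are correct (REQ-2) is decided separately, by comparing them with the data stored in the database.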

4.2 User Sign Up
4.2.1 Description and Priority
This feature allows users to sign up to the application and access the rest of the features that the
application provides. To sign up, a user must be part of a registered organization. Since sign up
lets users register and use the functionality provided by the system, the priority of this feature
is high. The other priorities of this feature are as follows:

4.2.1.1 Benefit
This feature is very important because it lets users register to the application and use the
features of the system. It also plays a role in separating authenticated users from
unauthenticated ones. On a relative scale of 1 to 9, the priority for benefit is 9.

4.2.1.2 Penalty
This feature is extremely important in terms of penalty: without it, any user who does not belong
to a registered organization would be able to register to the application, which would affect the
reliability of the system. On a relative scale of 1 to 9, the priority for penalty is 9.

4.2.1.3 Cost
This feature requires access to the database to store the data provided by the user. It will also
check whether the organization specified by the user is a registered organization. The cost to
implement this feature is not very high, so the priority for cost is between 2 and 3.

4.2.1.4 Risk
The risk imposed on the system without this feature is huge, because the system would not be
able to keep out users who do not belong to a registered organization. The feature itself is not
very hard to implement, so the technical and other risks associated with it are low. On a relative
scale of 1 to 9, the priority for risk is between 2 and 3.

4.2.2 Stimulus/Response Sequences


A user can sign up to the application by clicking on the “Register” button available on the first screen
that opens up. To access the functionalities provided by the application, the user would have to
provide his/her basic details including username, password, name of organization etc. and then click
on the “Submit” button on the user interface of the system. The system will then store the
credentials entered by the user in the database and verify that the name of organization provided by
the user exists. The sequence of user actions and system responses that stimulate the behavior
defined for this feature is as follows:

| S. No | Stimulus | Response |
|---|---|---|
| 1 | User clicks on “Register” button | A new screen is opened up and a form is generated for the user to fill. |
| 2 | User fills in the required information | The entered data is displayed on the screen. |
| 3 | User clicks on the “Submit” button | The details of the user are stored in the database and the system will verify that the name of organization provided by the user exists. In case of correct credentials, a new screen appears to the user so that he/she can access other features. If the organization is not registered, an error message prompts and the user is requested to enter the data again. |

4.2.3 Functional Requirements


The functional requirements associated with this feature are as follows:

| S. No. | Requirement ID | Requirement Name | Description | Input Results |
|---|---|---|---|---|
| 1 | REQ-1 | Complete Data | The details entered by the user must be complete. | An empty field will generate an error message. |
| 2 | REQ-2 | Correct Data | The data entered must be correct. | If the data is correct, the previous screen appears to the user so that he/she can log in to the application. If the data is incorrect, an error message prompts and the user is requested to enter the data again. |
| 3 | REQ-3 | Data Type | The username, password, name of organization, city, email address and address fields accept only strings and the gender field accepts only characters. | If any other data type is entered, an error message will be generated. |
| 4 | REQ-4 | Security | The browser should not cache the sign up page. | The page must be treated as sensitive data and the browser must be told not to cache the page. |
| 5 | REQ-5 | Uniqueness | Each user name should be unique. | If a user name already exists, an error message will be prompted and the user will be requested to choose another user name. |
| 6 | REQ-6 | Password Length | Passwords should consist of a minimum of 8 characters including numbers and symbols. | If the requirements are not fulfilled, an error message will be generated and the user will be asked to set up a new password. |
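
The uniqueness and password rules in the table above can be expressed as two small checks. This is an illustrative sketch with hypothetical function names; treating "symbols" as ASCII punctuation and comparing user names case-insensitively are assumptions that this document does not state.

```python
import string


def password_errors(password):
    """Return the list of password rules (REQ-6) that the password violates."""
    errors = []
    if len(password) < 8:
        errors.append("password must have at least 8 characters")
    if not any(ch.isdigit() for ch in password):
        errors.append("password must contain a number")
    # Assumption: a "symbol" is any ASCII punctuation character.
    if not any(ch in string.punctuation for ch in password):
        errors.append("password must contain a symbol")
    return errors


def username_available(username, existing_usernames):
    """Check REQ-5; the case-insensitive comparison is an assumption."""
    taken = {name.lower() for name in existing_usernames}
    return username.lower() not in taken
```

Returning every violated rule at once lets the error message tell the user everything that needs fixing in a single attempt.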

4.3 Submit Map Reduce Job


4.3.1 Description and Priority
This feature allows users to submit their Map Reduce jobs to the application and retrieve the
results of those jobs. Since Map Reduce job submission is the basic purpose for which the
application is being designed, the priority of this feature is high. The other priorities of this
feature are as follows:

4.3.1.1 Benefit
This feature is very important because it lets users submit their Map Reduce jobs to the
application and get the results of those jobs. On a relative scale of 1 to 9, the priority for
benefit is 9.

4.3.1.2 Penalty
This feature is extremely important in terms of penalty: without it, the system would have no
purpose. On a relative scale of 1 to 9, the priority for penalty is 9.

4.3.1.3 Cost
This feature requires access to the server, which will check the availability of data in the
clusters, run the MR Job on the clusters and reduce the results obtained from the clusters. It will
produce the final result and save it. This feature requires access to the clusters as well. The cost
to implement this feature is high, so the priority for cost is between 8 and 9.

4.3.1.4 Risk
The risk imposed on the system without this feature is huge, because the system would fail to
serve its purpose. The feature is also hard to implement, so the technical and other risks
associated with it are high. On a relative scale of 1 to 9, the priority for risk is between 8 and 9.

4.3.2 Stimulus/Response Sequences


A user can submit their Map Reduce job by clicking on the “Submit MR Job” button available on the
screen that opens up after user log in. When the user clicks on the button, a form opens up where
the user uploads his/her Map Reduce Job and specifies the path to the input file. After filling in the
form, the user can then click on the “Submit” button on the user interface of the system to submit
the Map Reduce Job. The sequence of user actions and system responses that stimulate the
behavior defined for this feature is as follows:

| S. No | Stimulus | Response |
|---|---|---|
| 1 | User clicks on “Submit MR Job” button | A new screen is opened up and a form is generated for the user to fill. |
| 2 | User fills in the required information | The entered data is displayed on the screen. |
| 3 | User clicks on the “Submit” button | The MR Job and the name of the input file are sent to the server for further actions. In case of any missing data or incorrect data, an error message prompts and the user is requested to enter the data again. |

4.3.3 Functional Requirements


The functional requirements associated with this feature are as follows:

| S. No. | Requirement ID | Requirement Name | Description | Input Results |
|---|---|---|---|---|
| 1 | REQ-1 | Complete Data | The details entered by the user must be complete. | An empty field will generate an error message. |
| 2 | REQ-2 | Correct Data | The data entered must be correct. | If the data is correct, the user will be able to go to the View Result screen. If the data is incorrect, an error message prompts and the user is requested to enter the data again. |
| 3 | REQ-3 | Data Type | The MR Job field accepts only a file and the input file name field accepts only strings. | In case of any other data type, an error message will be generated. |
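
Section 4.3.1.3 describes the server running the MR Job on several clusters and then reducing the results obtained from them into one final result. A minimal sketch of that last step is shown below. It assumes each cluster returns key/count pairs that can simply be summed, as in a word-count job; other kinds of jobs would need their own combining logic, and the function name is hypothetical.

```python
from collections import Counter


def combine_cluster_results(partial_results):
    """Merge per-cluster key/count outputs into one final result by summing counts."""
    total = Counter()
    for result in partial_results:
        # Counter.update adds the counts of matching keys instead of replacing them.
        total.update(result)
    return dict(total)
```

For example, combining `{"a": 2, "b": 1}` from one cluster with `{"a": 3}` from another gives `{"a": 5, "b": 1}`.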

4.4 View Results


4.4.1 Description and Priority
This feature allows users to view the results of a submitted Map Reduce Job. The user must
have provided all the relevant data beforehand for the results to be prepared and displayed. Since
viewing results helps carry out the basic purpose for which the application is being designed, the
priority of this feature is high. The other priorities of this feature are as follows:

4.4.1.1 Benefit
This feature is very important because it lets users view the results of the Map Reduce jobs they
submitted to the application. On a relative scale of 1 to 9, the priority for benefit is 9.

4.4.1.2 Penalty
This feature is extremely important in terms of penalty: without it, users would not be able to
view the results of their submitted Map Reduce jobs and the system would not fulfill its basic
purpose. On a relative scale of 1 to 9, the priority for penalty is 9.

4.4.1.3 Cost
This feature requires access to the server to get the final result and display it on the screen. The
cost to implement this feature is not very high, so the priority for cost is between 4 and 5.

4.4.1.4 Risk
The risk imposed on the system without this feature is huge, because the system would fail to
serve its purpose. The feature itself is not very hard to implement, so the technical and other
risks associated with it are not very high. On a relative scale of 1 to 9, the priority for risk is
between 4 and 5.

4.4.2 Stimulus/Response Sequences


A user can view the results of their submitted Map Reduce jobs by clicking on the “View Result”
button available on the screen that opens up after user submits the Map reduce job. When the user
clicks on the button, a new screen opens up where the result of the user submitted Map Reduce Job
is displayed. The user can perform actions on the result e.g. the users can save the result by clicking
on the “Save Result” button and print the result by clicking on the “Print Result” button. The
sequence of user actions and system responses that stimulate the behavior defined for this feature
is as follows:

| S. No | Stimulus | Response |
|---|---|---|
| 1 | User clicks on “View result” button | A new screen is opened up and the result is viewed. |
| 2 | User clicks on “Save Result” button | The result of the Map Reduce Job is stored in the database so that the user can view it later. |
| 3 | User clicks on the “Print Result” button | The system is connected to the available printers so that the user is able to print the result. |

4.4.3 Functional Requirements


The functional requirements associated with this feature are as follows:

| S. No. | Requirement ID | Requirement Name | Description | Input Results |
|---|---|---|---|---|
| 1 | REQ-1 | Complete and Correct Entry | The data provided before this step should be complete and correct. | In case of complete and correct entry, data will be retrieved from the server and the result will be viewed on the screen. |

4.5 Edit Details

4.5.1 Description and Priority


This feature allows users to edit the details they provided at the time of sign up. It lets users
change their credentials, name of organization and other details. Since the edit details
feature is an important part of the system, the priority of this feature is high. The other
priorities of this feature are as follows:

4.5.1.1 Benefit
This feature is important because it lets users change the details they provided when they
registered to the application. It lets users keep their accounts safe by changing their passwords
and lets them update the name of the organization if the organization itself is renamed. On a
relative scale of 1 to 9, the priority for benefit is 8.

4.5.1.2 Penalty
This feature is important in terms of penalty: without it, a user would not be able to update the
name of the organization if the organization is renamed, or any other details about the
organization that have changed. On a relative scale of 1 to 9, the priority for penalty is 9.

4.5.1.3 Cost
This feature requires access to the database to update the data provided by the user. It will
also check whether the organization specified by the user is a registered organization. The
cost to implement this feature is not very high, so the priority for cost is between 2 and 3.

4.5.1.4 Risk
The risk imposed on the system without this feature is high, because the system would not be
able to provide a basic facility to the users. The feature itself is not very hard to implement, so
the technical and other risks associated with it are low. On a relative scale of 1 to 9, the priority
for risk is between 2 and 3.

4.5.2 Stimulus/Response Sequences


A user can edit the details he/she provided to the application by clicking on the “Settings” button
available on the screen that opens up after login. Different options will be available to the user:
to change the credentials, the user can click on the “Change Password” button, and to edit the
other details, he/she can click on the “Edit Details” button. The user can then update the required
information and click on the “Submit” button to save it. The system will then update the
information entered by the user in the database and verify that the name of organization provided
by the user exists. The sequence of user actions and system responses that stimulate the behavior
defined for this feature is as follows:

| S. No | Stimulus | Response |
|---|---|---|
| 1 | User clicks on “Settings” button | A new screen is opened up and different options are available. |
| 2 | User clicks on the “Change Password” button | A form opens up letting the user change his/her password. |
| 3 | User provides the old password and then sets up a new password. | The entered data is displayed on the screen. |
| 4 | User clicks on the “Submit” button | The credentials are updated in the database. |
| 5 | User clicks on the “Edit Details” button | A form opens up letting the user change his/her basic details. |
| 6 | User fills in the form by providing the details | The entered data is displayed on the screen. |
| 7 | User clicks on the “Submit” button | The details of the user are updated in the database. |

4.5.3 Functional Requirements


The functional requirements associated with this feature are as follows:

| S. No. | Requirement ID | Requirement Name | Description | Input Results |
|---|---|---|---|---|
| 1 | REQ-1 | Correct Data | The data entered must be correct. | If the data is incorrect, an error message prompts and the user is requested to enter the data again. |
| 2 | REQ-2 | Data Type | The username, password, name of organization, city, email address and address fields accept only strings and the gender field accepts only characters. | If any other data type is entered, an error message will be generated. |
| 3 | REQ-3 | Security | The browser should not cache the change password page. | The page must be treated as sensitive data and the browser must be told not to cache the page. |
| 4 | REQ-4 | Password Length | Passwords should consist of a minimum of 8 characters including numbers and symbols. | If the requirements are not fulfilled, an error message will be generated and the user will be asked to set up a new password. |

4.6 User Logout


4.6.1 Description and Priority
This feature allows users to log out of the application and protect their data until the next time
they log in. Since logging out is how users secure their data between sessions, the priority of
this feature is high. The other priorities of this feature are as follows:

4.6.1.1 Benefit
This feature is very important because it lets users secure their accounts from unauthenticated
users. On a relative scale of 1 to 9, the priority for benefit is 9.

4.6.1.2 Penalty
This feature is extremely important in terms of penalty: without it, any malicious user would be
able to use a person’s account by getting access to his/her computer system, which would affect
the security of the system. On a relative scale of 1 to 9, the priority for penalty is 9.

4.6.1.3 Cost
This feature does not require access to any other module. The cost to implement this feature is
not very high, so the priority for cost is between 2 and 3.

4.6.1.4 Risk
The risk imposed on the system without this feature is huge, because the system would not be
able to keep malicious users from using someone else’s account. The feature itself is not very
hard to implement, so the technical and other risks associated with it are low. On a relative
scale of 1 to 9, the priority for risk is between 2 and 3.

4.6.2 Stimulus/Response Sequences


A user can log out of the application by clicking on the “Logout” button on the user interface of the
system. When the user clicks on the button, the login screen will be shown to the user. The
sequence of user actions and system responses that stimulate the behavior defined for this feature
is as follows:

| S. No | Stimulus | Response |
|---|---|---|
| 1 | User clicks on the “Logout” button | If the user clicks on the button their session on the website will terminate. A login screen will be shown to the user. |

4.6.3 Functional Requirements
The functional requirements associated with this feature are as follows:

| S. No. | Requirement ID | Requirement Name | Description | Input Results |
|---|---|---|---|---|
| 1 | REQ-1 | Removal of session data | When logging out a user the session data will be removed. | Any data other than the one that is saved by the user on purpose will be deleted when user logs out. |
| 2 | REQ-2 | Available Logout facility | The system shall provide a mechanism for logged in users to log out of the system. | |
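
The logout behavior (session data removed, login screen shown) can be illustrated with a small in-memory session store. The class and method names are hypothetical, and a real deployment would rely on the web framework's own session handling rather than a dictionary.

```python
class SessionStore:
    """Hypothetical in-memory session store keyed by session ID."""

    def __init__(self):
        self._sessions = {}

    def start(self, session_id, username):
        # Create the session data for a user who has just logged in.
        self._sessions[session_id] = {"username": username}

    def is_logged_in(self, session_id):
        return session_id in self._sessions

    def logout(self, session_id):
        """Remove all session data (REQ-1) and return the screen to show next."""
        # pop with a default makes logging out twice harmless.
        self._sessions.pop(session_id, None)
        return "login"
```

Data that the user saved on purpose, such as stored results, lives in the database and is not touched by this step.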

5. Other Nonfunctional Requirements

5.1 Performance Requirements


The performance requirements for Collaborative Hadoop are as follows:

 The initial load time of the application shall depend on the strength of the internet connection,
which in turn depends on the medium over which the application is accessed.
 The performance of the system shall depend upon the user’s hardware.
 Collaborative Hadoop shall be a web-based application and has to be run from a web
browser.
 Collaborative Hadoop shall be available to the user round-the-clock.
 The application shall have an easy-to-use interface.
 The system shall be available in the English language.
 The application shall be able to validate user actions.
 The database connected to the application shall be updated in real time.
 The application shall allow the users to utilize their resources efficiently.
 The server shall be able to handle any number of users concurrently.
 The user shall be able to log out from any screen.
 If a user stays inactive for more than 30 minutes, the system shall automatically log them out
of the session.

The above mentioned performance requirements apply to all the features of the system, except for
the last mentioned requirement which only applies to the User logout feature.
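
The 30-minute inactivity rule in the list above comes down to a single comparison. The sketch below assumes timestamps in seconds (for example from `time.time()`); the constant and function names are hypothetical.

```python
SESSION_TIMEOUT_SECONDS = 30 * 60  # 30 minutes, as stated in the requirement


def is_session_expired(last_activity, now, timeout=SESSION_TIMEOUT_SECONDS):
    """Return True when the user has been inactive for more than the timeout."""
    return (now - last_activity) > timeout
```

The comparison is strict because the requirement says "more than 30 minutes", so a session that has been idle for exactly 30 minutes is still valid.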

5.2 Safety Requirements


The safety requirements for Collaborative Hadoop are as follows:

 The application shall ensure that the user is registered with the system before he/she logs
in.
 The application shall let the user set up a new password if he/she has forgotten the
password.
 A backup of the data shall be kept in case of data loss or damage.
 The system shall not cause any mishaps.
 The application shall make sure that the account details of a user stay between the user and
the application.
 The client and the server shall communicate over a secure channel.
 The application shall keep the database that stores the user’s data safe.
 Large files shall be split up into smaller chunks and then sent through a secure channel to
avoid bandwidth throttling.
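
The last requirement in the list above, splitting large files into smaller chunks before sending them, can be sketched as follows. The function name is hypothetical and the chunk size is left to the caller, since this document does not fix one. Sending each chunk over the secure channel is outside the scope of the sketch.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```

Joining the chunks in order reproduces the original data, which is what the receiving side would do after all the chunks have arrived.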

The above-mentioned requirements apply to the entire system. In Collaborative Hadoop, the following
security mechanisms can be used to satisfy the safety of the application:

 Hyper Text Transfer Protocol Secure (HTTPS), which indicates that the website is protected by
Secure Sockets Layer/Transport Layer Security.
 A third party called a Certificate Authority (CA), which verifies that our web application is
authentic.

5.3 Security Requirements


The security requirements for Collaborative Hadoop are as follows:

 If a user stays inactive for more than 30 minutes, the system shall automatically log them out.
 The system shall not leave any cookies on the user’s computer containing the user’s
password.
 The application shall generate an email to the registered user in case of three wrong
attempts of the password.
 The system shall not leave any cookies on the user’s computer containing any of the user’s
confidential information.
 The user’s web browser shall never display his/her password. It shall be displayed as
asterisks on the screen.
 The application’s back-end database shall be encrypted.
 Users shall be authenticated before having access to any data or feature.
 When a user logs out, the session data shall be removed.
 The application must maintain logs of the user activities.
 Any services that are not used by the web server or applications shall be disabled.
 The user login shall be implemented which will provide a mechanism for user identity
authentication.
 Map reduce jobs shall be uploaded using a secure channel.
 Session IDs shall be long, complicated and unpredictable.
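
Two of the requirements above lend themselves to a short sketch: unpredictable session IDs and the email alert after three wrong password attempts. The names below are hypothetical, the 32-byte length is an assumption, and actually sending the email is left out.

```python
import secrets


def new_session_id():
    # 32 random bytes from the operating system's secure generator,
    # encoded as a URL-safe string of about 43 characters.
    return secrets.token_urlsafe(32)


class FailedLoginTracker:
    """Count wrong password attempts per user (hypothetical helper)."""

    def __init__(self, limit=3):
        self.limit = limit
        self._counts = {}

    def record_failure(self, username):
        """Record a wrong attempt; return True when the alert email is due."""
        self._counts[username] = self._counts.get(username, 0) + 1
        return self._counts[username] == self.limit

    def reset(self, username):
        # Called after a successful login.
        self._counts.pop(username, None)
```

`record_failure` returns True only on the attempt that reaches the limit, so the alert email would be generated once rather than on every later failure.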

The above mentioned security requirements apply to the entire system. In Collaborative Hadoop, a
number of security certificates can be used to satisfy the security of the application:

 SSL certificates
 Hyper Text Transfer Protocol Secure (HTTPS), which indicates that the website is protected by
Secure Sockets Layer/Transport Layer Security.
 A third party called a Certificate Authority (CA), which verifies that our web application is
authentic.

5.4 Software Quality Attributes


The software quality attributes of Collaborative Hadoop are as follows:

5.4.1 Adaptability
The Collaborative Hadoop application shall be adaptable to the users’ needs, to business
requirements and to any future modifications and changes.

5.4.2 Availability
The application shall be available to the users round-the-clock and it should be able to handle
multiple users concurrently. The application will only be available when there is an active internet
connection.

5.4.3 Correctness
The system should be correct and fulfill all the requirements of the users. The application should not
have any defects or errors.

5.4.4 Flexibility
The application should be flexible and easy to modify as time passes and technology changes.
The procedure for making changes to the software should not be very hard to carry out.

5.4.5 Interoperability
The system will be operable using any web browser if the user has an active internet connection.
The application will connect to the database and the clusters to carry out the functionality.

5.4.6 Maintainability
Any software developer with at least a little experience shall be able to fix defects in the system.
The application shall be very easy to maintain and the maintenance team shall be able to maintain
the software effortlessly.

5.4.7 Portability
The application shall run on any web browser in the presence of active internet connection. The
application shall be portable because it is a web application which is platform or operating system
independent.

5.4.8 Reliability
The application shall be available to the user day-and-night. The system shall be reliable and will
never crash. The system shall also maintain back up of the database in case of the database failure
and data loss.

5.4.9 Reusability
The modules of the application shall be created in such a way that there is minimal coupling
between them. The modules shall be reusable in other applications with minimal adjustments.
The security of the system shall be upgradable.

5.4.10 Robustness
If the application cannot connect to the database or to any other module, the application
process shall not crash or hang in a perpetual loading state; it shall display an error message instead.
The system shall not crash or terminate when given bad or invalid input.
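The robustness requirement above can be illustrated with a minimal sketch. It assumes a hypothetical wrapper `guarded` around any call that reaches the database or a cluster, and a hypothetical input validator `parse_positive_int`; neither name comes from the project itself. Both return a message for the user instead of raising an exception.

```python
from typing import Callable, Optional, Tuple


def guarded(action: Callable[[], str], component: str) -> Tuple[bool, str]:
    """Run `action`; on a connection problem return an error message.

    `action` and `component` are illustrative names, not part of the SRS.
    """
    try:
        return True, action()
    except (ConnectionError, TimeoutError):
        # The failure is reported to the user; the process keeps running.
        return False, (
            f"Error: could not connect to the {component}. "
            "Please try again later."
        )


def parse_positive_int(raw: str, field: str) -> Tuple[Optional[int], str]:
    """Validate user input; bad input yields a message, not an exception."""
    try:
        value = int(raw.strip())
    except ValueError:
        return None, f"Error: {field} must be a whole number."
    if value <= 0:
        return None, f"Error: {field} must be greater than zero."
    return value, ""
```

In a real deployment the wrapped call would also need a bounded timeout so that a silent network failure surfaces as a `TimeoutError` rather than an endless wait.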

5.4.11 Testability
The application shall be designed in a way that each module of the system is testable. All
modules shall be tested individually and then integrated to create the final application.

5.4.12 Usability
This application shall be usable by anyone with a web browser and an active internet connection.
The system shall provide a uniform look and feel across all the web pages and shall make use of icons
and toolbars.

5.5 Business Rules


There is only one type of user of the Collaborative Hadoop application: the personnel of
registered organizations. Users need to register with the application by providing basic details
about their organizations. If a user’s information is verified and accepted, the user can
log in to the application and use its features. All users will be able to
perform the same functions.

The business rules for Collaborative Hadoop are as follows:

• The user must be a part of a registered organization.
• The user must sign up to the application.
• The user must have a valid and working email address.
• The user’s login credentials must be provided by the personnel of the organization.
• The details provided by the user must be accurate and verifiable.
• A user can update details or submit an MR job from his/her own account only.
• The results of the Map Reduce jobs submitted by a user can be viewed by that user only.
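The last two rules amount to an ownership check on every request. The following is a minimal sketch of such a check, assuming a hypothetical `Job` record that stores the ID of the submitting user; the class, field, and function names are illustrative and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical record of a submitted Map Reduce job."""
    job_id: str
    owner_id: str   # ID of the user who submitted the job
    result: str


class AccessDenied(Exception):
    """Raised when a user requests a job that belongs to someone else."""


def view_result(job: Job, requesting_user_id: str) -> str:
    """Return the job result only to the user who submitted the job."""
    if job.owner_id != requesting_user_id:
        raise AccessDenied(
            f"User {requesting_user_id} may not view job {job.job_id}."
        )
    return job.result
```

For example, `view_result(Job("j1", "alice", "done"), "alice")` returns the result, while the same call with `"bob"` raises `AccessDenied`.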

6. Other Requirements
The other requirements which were not mentioned in the above sections are as follows:

6.1 Database Requirements


The application must ensure that the user’s information is encrypted and safely stored in a
database. Collaborative Hadoop requires a server-side database that holds the
information of the users. The user information stored in the database is: user ID, user name and
password, name of organization, gender, address, city and email address.
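As an illustration of safe password storage, the sketch below stores a salted hash rather than the password itself. This is an assumption: the requirement above only says the information is encrypted, and does not name a scheme. The iteration count and function names are likewise illustrative. It uses PBKDF2 from the Python standard library.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor, not specified by the SRS


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the plain password."""
    salt = secrets.token_bytes(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)
```

With this approach the database holds only the salt and the digest, so a leaked table does not directly reveal passwords. The other fields (address, email and so on) would need separate protection, such as encryption at rest.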

6.2 Legal Requirements
The system must be developed in compliance with the applicable legal and technological market
standards. Copyright laws and license agreements must be respected for any third-party software
used in the creation of this application.

6.3 Reuse Objectives


The modules of the application shall be reusable in other applications with minimal
adjustments. The modules shall also be upgraded as time and technology
change.

Appendix A: Glossary

Terms, Acronyms and Abbreviations | Meaning
Authentication | The process or action of proving or showing something to be true, genuine, or valid.
Bandwidth | Total maximum transfer rate of a network cable or device. It is a measurement of how fast data can be sent over a wired or wireless connection, measured in bits per second.
CentOS | Community Enterprise Operating System - Linux distribution that attempts to provide a free, enterprise-class, community-supported computing platform functionally compatible with its upstream source, Red Hat Enterprise Linux.
Cluster | A cluster consists of a set of loosely or tightly connected computers that work together so that they can be viewed as a single system.
Copyright | The exclusive legal right, given to an originator or an assignee, to print, publish, perform, film, or record literary, artistic, or musical material, and to authorize others to do the same.
Database | A collection of information that is organized so that it can be easily accessed, managed and updated.
Data Analytics | The qualitative and quantitative techniques and processes used to enhance productivity and business gain. Data is extracted and categorized to identify and analyze behavioral data and patterns, and techniques vary according to organizational requirements.
Data sets | A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer.
Hadoop | Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment.
HCI | Human Computer Interaction - The study of how people interact with computers and to what extent computers are or are not developed for successful interaction with human beings.
HTTP | Hyper Text Transfer Protocol - The underlying protocol used by the World Wide Web. This protocol defines how messages are formatted and transmitted, and what actions Web servers and browsers should take in response to various commands.
IP | Internet Protocol - Protocol by which data is sent from one computer to another on the Internet.
Map Reduce job | A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks.
MongoDB | A free and open source, cross-platform, document-oriented database program which uses JSON-like documents with schemas.
MR | Map reduce - A programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
Open Data | Data freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
Prototype | The technique of developing a rough sketch of the interface to get an idea of what the output will look like. It also helps in making future design decisions.
Reduce | A function that can iterate through the values that are obtained from the systems and produce zero or more outputs.
Session | A session is a semi-permanent interactive information interchange, also known as a dialogue, a conversation or a meeting, between two or more communicating devices, or between a computer and user.
Slave Nodes | Slave nodes are where Hadoop data is stored and where data processing takes place.
Spring | An application framework and inversion of control container for the Java platform.
SRS | Software Requirements Specification - A document containing the description of a software system to be developed.
SSL | Secure Sockets Layer - A standard security technology for establishing an encrypted link between a server and a client, typically a web server (website) and a browser.
Test Case | A set of conditions or variables under which a tester will determine whether a system under test satisfies requirements or works correctly.
Ubuntu Linux | A Debian-based Linux operating system for personal computers, tablets and smartphones, which also runs on network servers, usually with the Ubuntu Server edition or with containers.
Web Browser | A software application for retrieving, presenting and traversing information resources on the World Wide Web.
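The Map Reduce job entry in the glossary above describes three steps: map tasks process independent chunks, the framework sorts and groups the map outputs, and reduce tasks consume the grouped values. The following single-process word count is a minimal sketch of that flow. It does not use the Hadoop API, and all function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map task: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(
    pairs: Iterable[Tuple[str, int]]
) -> Dict[str, List[int]]:
    """Framework step: group map outputs by key, with keys in sorted order."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce task: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(chunks: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle/sort, and reduce over the input chunks in order."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle_and_sort(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

In a real cluster each chunk would be handled by a separate map task on a slave node; here the chunks are processed one after another purely to show the data flow.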

Appendix B: Analysis Models and Other References

Use Case Diagram

Figure 9-Use Case Diagram

Component Diagram

Figure 10-Component Diagram

Package Diagram

Figure 11-Package Diagram

Object Diagram

Figure 12-Object Diagram

Sequence Diagram

Figure 13-Sequence Diagram

Communication Diagram
Login

Figure 14-Communication Diagram 1


Editing User Details

Figure 15-Communication Diagram 1

Run MR Job

Figure 16-Communication Diagram 2

View Result

Figure 17-Communication Diagram 3

Deployment Diagram

Figure 18-Deployment Diagram

Activity Diagrams
User Login

Figure 19-Activity Diagram 1

Editing User Details

Figure 20-Activity Diagram 2

Run MR Job

Figure 21-Activity Diagram 3

View Results

Figure 22-Activity Diagram 4

Architecture Diagram

Figure 23-Architecture Diagram

