For
Collaborative Hadoop
Version 1.1
Arooj Sajid
Usama Bin Tariq
4.1.3 Functional Requirements
4.2 User Sign Up
4.2.1 Description and Priority
4.2.2 Stimulus/Response Sequences
4.2.3 Functional Requirements
4.3 Submit Map Reduce Job
4.3.1 Description and Priority
4.3.2 Stimulus/Response Sequences
4.3.3 Functional Requirements
4.4 View Results
4.4.1 Description and Priority
4.4.2 Stimulus/Response Sequences
4.4.3 Functional Requirements
4.5 Edit Details
4.5.1 Description and Priority
4.5.2 Stimulus/Response Sequences
4.5.3 Functional Requirements
4.6 User Logout
4.6.1 Description and Priority
4.6.2 Stimulus/Response Sequences
4.6.3 Functional Requirements
5. Other Nonfunctional Requirements
5.1 Performance Requirements
5.2 Safety Requirements
5.3 Security Requirements
5.4 Software Quality Attributes
5.4.1 Adaptability
5.4.2 Availability
5.4.3 Correctness
5.4.4 Flexibility
5.4.5 Interoperability
5.4.6 Maintainability
5.4.7 Portability
5.4.8 Reliability
5.4.9 Reusability
5.4.10 Robustness
5.4.11 Testability
5.4.12 Usability
5.5 Business Rules
6. Other Requirements
6.1 Database Requirements
6.2 Legal Requirements
6.3 Reuse Objectives
Appendix A: Glossary
Appendix B: Analysis Models and Other References
Use Case Diagram
Component Diagram
Package Diagram
Object Diagram
Sequence Diagram
Communication Diagram
Login
Editing User Details
Run MR Job
View Result
Deployment Diagram
Activity Diagrams
User Login
Editing User Details
Run MR Job
View Results
Architecture Diagram
Table of Figures
Figure 1 - Block Diagram
Figure 2 - System Environment Diagram
Figure 3-Class Diagram
Figure 4-Upload MR Page
Figure 5-MR Submission Page
Figure 6-Success Message
Figure 7-Execute MR Page
Figure 8-Success Message
Figure 9-Use Case Diagram
Figure 10-Component Diagram
Figure 11-Package Diagram
Figure 12-Object Diagram
Figure 13-Sequence Diagram
Figure 14-Communication Diagram 1
Figure 15-Communication Diagram 1
Figure 16-Communication Diagram 2
Figure 17-Communication Diagram 3
Figure 18-Deployment Diagram
Figure 19-Activity Diagram 1
Figure 20-Activity Diagram 2
Figure 21-Activity Diagram 3
Figure 22-Activity Diagram 4
Figure 23-Architecture Diagram
Revision History
Arooj Sajid and Usama Bin Tariq | 28th January, 2017 | Initial version | Version 1.0
Arooj Sajid and Usama Bin Tariq | 9th February, 2017 | Functional/Nonfunctional Requirements and others errors | Version 1.1
1. Introduction
Hadoop is a highly scalable data storage and analytics platform for processing huge volumes of
structured and unstructured data. Petabytes of data are spread across hundreds or thousands of
physical storage nodes. Organizations can hold a significant amount of Open Data: data sets that
are available for public usage and/or analysis. A researcher with a typical Hadoop cluster who plans
to run a Map Reduce (MR) job on these data sets would face some limitations. The organization’s
clusters could be inaccessible to the external party because of certain constraints such as
unreachability and/or limited resources that are incapable of processing larger data sets. Even
assuming that sufficient resources are available, the data sets would have to be stored on the
relevant cluster to run the MR job. This could be done by copying the data sets either physically or
digitally, but the time taken to retrieve the data may not be feasible.
The problem is the inability to analyze open data sets due to resource constraints, while the
alternatives for obtaining better resources are more time consuming and/or less cost effective. The
main goal of Collaborative Hadoop is to enable users to analyze open data sets within the providing
entity’s Hadoop cluster and to avoid copying the data sets to the user’s Hadoop cluster.
Collaborative Hadoop lets the researcher run the Map Reduce job on the idle slave nodes of the
clusters containing the data sets instead of copying the data sets to the researcher’s own cluster,
which saves time and uses resources efficiently.
1.1 Purpose
This Software Requirements Specification (SRS) document (version 1.1) describes in detail the
requirements and specifications of the software system being developed, i.e. Collaborative Hadoop,
which will be a web application that allows researchers to analyze data at its stored location and
retrieve only the results.
It also highlights the design plan (i.e. the database system for keeping track of the clusters set up at
respective organizations and record of where the respective data sets are stored and the interface
front-end for user interaction with the web application), the basic tasks, the probable scenarios and
the constraints and limitations of the application.
The purpose of this document is to present a detailed description of Collaborative Hadoop. It will
explain the features and benefits of the system, the structure of the system, what the system will do,
the constraints under which it must operate and how the system will react to external stimuli. This
document is intended for both the stakeholders and the developers of the system.
1.2 Document Conventions
1.2.1 Formatting Conventions
The font used throughout this Software Requirements Specification (SRS) document for headings is
Times New Roman and font for normal text is Calibri. The font size for the normal text is 12, for the
section headings it is 18, for the second level headings it is 14 and for the third level headings it is 12.
All the headings, irrespective of their level, are written in bold. The line and paragraph spacing for
normal text is 1.15 and the line and paragraph spacing for the text in bullets is 1.5. Important
information has been written in bold. The whole text has been justified. In general, all the IEEE
requirements for formatting have been followed.
Apart from the aforementioned audience, this document is also meant for the software developers
to learn how to develop the application and to understand the structure of the product.
Furthermore, this document is for the project managers to help them understand how to manage
the teams involved.
In addition, it is intended for the sales and marketing team to understand the functionality of the
product and develop marketing techniques accordingly.
This document is also written for the testing team to help them prepare test cases and allow them to
debug the software.
Lastly, this SRS document is written for the maintenance staff who might modify the Collaborative
Hadoop application in the future based on the changing requirements.
Organizations aiming to use open data sets would benefit from Collaborative Hadoop because they
can analyze data at its stored location and retrieve only the results. Education Commissions,
Academics, Governments, Research Institutes and Medical Industries in our country could all
benefit from Collaborative Hadoop. These organizations tend to have multiple branches at various
locations, with data split amongst these locations. Each location could have its own cluster
deployed to meet its data analytics and processing needs. Collaborative Hadoop would ease the
process of using a cluster at another branch to retrieve results based on the data held at that
branch. The goal of Collaborative Hadoop is to enable organizations to analyze open data sets within
the providing entity’s Hadoop cluster and to avoid copying the data sets to their own Hadoop
cluster.
More specifically, the software will facilitate communication between organizations, enable them
to run their respective Map Reduce jobs on the idle slave nodes of the clusters containing the data
sets without replicating the data to their own cluster, and permit them to save time and utilize the
available resources efficiently. The system also contains a database for keeping track of where the
respective data sets are stored, the clusters set up at the organizations and the idle slave nodes
available in the respective clusters.
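The tracking database described above can be pictured as a small registry that maps each cluster to the data sets it stores and to its number of idle slave nodes. The following is a minimal sketch in Python, not the application's actual schema or implementation language; the field names (`cluster_id`, `organization`, `datasets`, `idle_slaves`), the helper `clusters_for_dataset` and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterRecord:
    """One registered cluster (hypothetical fields, for illustration only)."""
    cluster_id: str
    organization: str
    datasets: set = field(default_factory=set)
    idle_slaves: int = 0


def clusters_for_dataset(registry, dataset):
    """Return the clusters that store `dataset` and have at least one idle slave node."""
    return [c for c in registry if dataset in c.datasets and c.idle_slaves > 0]


# Made-up sample records standing in for the database contents.
registry = [
    ClusterRecord("c1", "Org A", {"census", "weather"}, idle_slaves=3),
    ClusterRecord("c2", "Org B", {"census"}, idle_slaves=0),
    ClusterRecord("c3", "Org C", {"weather"}, idle_slaves=2),
]
```

In this sketch a cluster with no idle slave nodes is skipped even if it holds the data set, which mirrors the stated goal of running jobs only on idle nodes.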
1.5 References
This Software Requirements Specification (SRS) document has been written according to the IEEE
standard 830-1998, IEEE Recommended Practice for Software Requirements Specifications, IEEE
Computer Society, 1998.
2. Overall Description
2.1 Product Perspective
Collaborative Hadoop is a new, self-contained product being developed for individuals and/or
organizations to help them run data analytics. It is to be designed and developed solely from the
user requirements and specifications.
Collaborative Hadoop will consist of four major modules: user login, MR Job submission, connection
to clusters and result retrieval. Users can log in to the application on their own device by providing
their credentials. On successful login, users can upload their Map Reduce jobs onto the server. After
the MR Job is submitted, the MR Job submission module will connect to the relevant clusters and
send the MR Job along with the path to the input file. Then, the clusters will run the respective MR
Job and send their results to the result retrieval module. The result retrieval module will reduce the
results obtained from the clusters to produce a final result and send it back to the user.
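The final reduce step performed by the result retrieval module can be illustrated with a word-count style job. This is a sketch only, written in Python for brevity, and it assumes that each cluster returns its partial result as key/count pairs; the function name `merge_cluster_results` and the sample data are hypothetical.

```python
from collections import Counter


def merge_cluster_results(partial_results):
    """Combine per-cluster key/count outputs into one final result by summing counts."""
    final = Counter()
    for partial in partial_results:
        final.update(partial)  # Counter.update adds the counts of matching keys
    return dict(final)


# Partial outputs as they might arrive from two clusters (made-up data).
cluster_a = {"hadoop": 4, "data": 2}
cluster_b = {"hadoop": 1, "cluster": 3}
final_result = merge_cluster_results([cluster_a, cluster_b])
```

Summing works here because counting is associative and commutative. A job that computes an average, for example, would need each cluster to return a sum and a count so that the final reduce can combine them correctly.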
The overall system modules are shown as a block diagram:
Figure 1 - Block Diagram
The overall system is shown as a system environment diagram:
Figure 2 - System Environment Diagram
Sign Up
Login
Submit Map Reduce Jobs
Run Map Reduce jobs on clusters containing the required data
Reduce the results obtained from the clusters
Produce the final result
View Results
Log out
The users can log in to the application by providing their credentials, i.e. user ID and password. If the
user is not already registered with the application, he/she can sign up by filling in a form containing
some basic information. The user can submit a Map Reduce job along with the path to the input file
to the application and view the result of the submitted Map Reduce job. The user can also log out
from the application.
The application also provides some functions which are hidden from the user. The application
connects to the clusters with relevant data to run the Map Reduce Job of the user. The application
also reduces the results obtained from the clusters to produce a final result that is viewed by the
user.
The product functions have been shown using a class diagram given below:
Figure 3-Class Diagram
2.3.1 Organizations
The individuals in any organization can log in to use the application. If the organization is not already
registered with the application, the personnel can also sign up by filling in a form containing some
basic information about the organization like name, email address, city, address etc. They can also
view and update their information and credentials. They can submit their Map Reduce jobs along
with the input file name to the application. The application will then send the command to run the
MR Job, reduce the multiple results obtained from clusters and save the final result. The individuals
in any organization can view the result of the submitted Map Reduce job depending on their level of
clearance. They can also log out from the application.
2.5 Design and Implementation Constraints
The design and implementation of the Collaborative Hadoop application is subject to a number of
constraints that must be observed. Some of these constraints are as follows:
The system must be developed in accordance with the policies of the organizations which will be using the application.
Only registered organizations are able to sign up to the Collaborative Hadoop application.
There must be an active Internet connection to use the application.
Hadoop clusters must be up and running before the application can access them.
For the development of the application, the Spring MVC framework must be used.
MongoDB must be used as the database management system.
The application must be connected to the database when the user wants to add, view or update their data.
For the testing of the application, any web browser can be used.
The user must be authenticated before being provided with any of the application’s functionality.
The users will be provided with a user manual in the help section of the application. This will contain
information about the various functionalities and how a user can use them. For users to have a
better understanding, pictures and screenshots will also be provided along with the step wise details
regarding various components.
Along with the written user manual, video tutorials will also be available to the users in the help
section of the application. Users will also be able to contact the developing team in case of any
technical difficulties or queries. The developing team’s contact information will be available in the
“Contact Us” section of the application.
A few known user documentation standards are IEEE standard 12207-2008, MIL-STD-498 and the
ISO standards classified under ICS 01.110 (Technical product documentation).
2.7 Assumptions and Dependencies
For the development of the application, the assumptions and the dependencies the system has are
as follows:
3.1 User Interfaces
The user interface of the application will be as easy to use as possible. The components of the
application should be self-explanatory; the aim is to help the users recognize the interface elements
rather than recall them. The interfaces will be implemented according to the standards stated by the
HCI. The user interface for the software shall be compatible with any web browser through which the
user can access the system, such as Internet Explorer, Mozilla, Google Chrome, Safari and Netscape
Navigator.
The sample screens from the initial prototype are shown below:
Figure 5-MR Submission Page
Figure 7-Execute MR Page
3.2 Hardware Interfaces
The hardware interfaces required are as follows:
3.3.1 Database
Collaborative Hadoop will be linked with a database which will be created using MongoDB. The
application will communicate with the database to carry out tasks such as storing, updating and
retrieving users’ data and keeping track of the information related to the clusters.
3.3.3 Tools
The tools to be used for the development of the application are Spring MVC, JSP, HTML, CSS, and
Apache Hadoop.
When a user opens the application in their web browser, the application checks for an internet
connection and establishes a connection with the database. If the internet connection is inactive,
the application does not load and displays an error message. When the connection is established, the
login window of the application launches. The user then enters their login details, i.e. username and
password, and clicks on the login button. When the button is pressed, the application contacts the
database to verify the user, and the database in return tells whether the user is authenticated or not.
If the login details are incorrect, an error message is displayed. Similarly, when a user registers with
the application, the entered details are taken from the user interface to the database, where they are
stored as a record using the specified query.
In case of incorrect login details or another kind of violation, an error message is displayed which
helps the user to understand what was expected of them. The creation of these error messages is
based on the data stored in the database. The client can access the stored data if the database
connection exists and the user has been authenticated.
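The login check described above can be sketched as follows. This is an illustration in Python rather than the Spring MVC implementation; the in-memory `USERS` dictionary stands in for the MongoDB user collection, the function names are hypothetical, and storing salted PBKDF2 hashes instead of plain passwords is an assumption that this document does not specify.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a salted hash with PBKDF2-HMAC-SHA256 (the iteration count is illustrative)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(users, username, password):
    """Store a new user record; `users` stands in for the database."""
    salt = os.urandom(16)
    users[username] = {"salt": salt, "hash": hash_password(password, salt)}


def authenticate(users, username, password):
    """Return True only if the user exists and the password matches the stored hash."""
    record = users.get(username)
    if record is None:
        return False
    candidate = hash_password(password, record["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, record["hash"])


USERS = {}
register(USERS, "researcher1", "s3cret")
```

A caller would show the next screen when `authenticate` returns True and display the error message otherwise.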
When sending a Map Reduce job, the user specifies the path to the input file as well. This input file is
looked up in multiple clusters, on which the MR job later runs.
3.4.4 HTTP
The Collaborative Hadoop application will use the HTTP protocol for the communication over the
internet.
4. System Features
Our application allows the users to run data analytics and provides them with a mechanism to submit
their Map Reduce jobs. The application also lets the users log in, sign up, view their data and edit
their data. For better understanding, we have divided these functionalities into separate system
features, giving a detailed explanation of each.
4.1.1.1 Benefit
This feature is very important as it would separate unauthenticated users from the authenticated
users. This feature will keep the malicious users away from the system and it would allow the
maintenance of each user uniquely and securely. On a relative scale of 1 to 9, this feature will be
rated 9 because the priority for benefit is 9.
4.1.1.2 Penalty
This feature is extremely important in terms of penalty because if this feature doesn’t exist any
malicious user would be able to use the application which will affect the security of the system. On a
relative scale of 1 to 9, this feature will be rated 9 because the priority for penalty is 9.
4.1.1.3 Cost
This feature will contain an access to the database to validate the data provided by the user. The
cost to implement this feature is not very high so the priority for cost is somewhere between 2 and
3.
4.1.1.4 Risk
The risk that would be imposed on the system without this feature is huge because without this
feature, the system would not be able to keep the malicious users away. Also, this feature is not very
hard to implement, and the technical and other risks associated with it are low. On a relative scale of
1 to 9, the priority for risk is somewhere between 2 and 3.
S. No | Stimulus | Response
1 | User enters the username and password | The entered data is displayed on the screen. The password is hidden using asterisks.
2 | User clicks on the “Login” button | The data entered is verified using the stored data in the database. In case of correct credentials, a new screen appears to the user so that he/she can access other features. In case of incorrect credentials, an error message prompts and the user is requested to enter the data again.
4.2 User Sign Up
4.2.1 Description and Priority
This feature allows the users to sign up to the application and access the rest of the features that the
application provides. To sign up to the application, a user must be part of a registered organization.
Since user sign up is a very important feature for the users, because it lets them register with the
application and use the functionalities provided by the system, the priority of this feature is high.
The other priority ratings of this feature are as follows:
4.2.1.1 Benefit
This feature is very important as it would let the users register to the application and enjoy the
features of the system. It also plays a role in separating unauthenticated users from the
authenticated users. On a relative scale of 1 to 9, this feature will be rated 9 because the priority for
benefit is 9.
4.2.1.2 Penalty
This feature is extremely important in terms of penalty because if this feature doesn’t exist any user
without a registered organization would be able to register to the application which will affect the
reliability of the system. On a relative scale of 1 to 9, this feature will be rated 9 because the priority
for penalty is 9.
4.2.1.3 Cost
This feature will contain an access to the database to store the data provided by the user. It will also
cross check whether the organization specified by the user is a registered organization. The cost to
implement this feature is not very high so the priority for cost is somewhere between 2 and 3.
4.2.1.4 Risk
The risk that would be imposed on the system without this feature is huge because without this
feature, the system would not be able to keep away the users who do not belong to a registered
organization. Also, this feature is not very hard to implement, and the technical and other risks
associated with it are low. On a relative scale of 1 to 9, the priority for risk is somewhere between 2
and 3.
S. No | Stimulus | Response
1 | User clicks on “Register” button | A new screen is opened up and a form is generated for the user to fill.
2 | User fills in the required information | The entered data is displayed on the screen.
3 | User clicks on the “Submit” button | The details of the user are stored in the database and the system will verify that the name of organization provided by the user exists. In case of correct credentials, a new screen appears to the user so that he/she can access other features. If the organization is not registered, an error message prompts and the user is requested to enter the data again.
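The organization check in step 3 above can be sketched as follows. This is a minimal illustration in Python, not the application's code; the set `REGISTERED_ORGANIZATIONS`, the required field names, the duplicate-username rule and the error messages are all hypothetical.

```python
REGISTERED_ORGANIZATIONS = {"Org A", "Org B"}  # hypothetical sample data


def sign_up(users, form):
    """Validate a sign-up form and store the user if the organization is registered.

    Returns (True, message) on success or (False, error message) otherwise.
    `users` stands in for the user collection in the database.
    """
    required = ("username", "password", "organization", "email")
    missing = [name for name in required if not form.get(name)]
    if missing:
        return False, "Missing fields: " + ", ".join(missing)
    if form["organization"] not in REGISTERED_ORGANIZATIONS:
        return False, "Organization is not registered"
    if form["username"] in users:
        return False, "Username already exists"
    users[form["username"]] = dict(form)
    return True, "Registration successful"
```

The sketch verifies the organization before storing the record, so that a rejected sign-up leaves nothing behind in the database; this ordering is a design choice of the sketch.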
4.3.1.1 Benefit
This feature is very important as it would let the users submit their respective Map Reduce jobs to
the application and get the result of their MR Job. On a relative scale of 1 to 9, this feature will be
rated 9 because the priority for benefit is 9.
4.3.1.2 Penalty
This feature is extremely important in terms of penalty because if this feature doesn’t exist there
would be no purpose of the system. On a relative scale of 1 to 9, this feature will be rated 9 because
the priority for penalty is 9.
4.3.1.3 Cost
This feature will contain an access to the server which will check the availability of data in clusters,
run the MR Job on the clusters and reduce the results obtained from the clusters. It will produce the
final result and save it. This feature contains an access to the clusters as well. The cost to implement
this feature is high so the priority for cost is somewhere between 8 and 9.
4.3.1.4 Risk
The risk that would be imposed on the system without this feature is huge because without this
feature, the system will fail to serve its purpose. Also, this feature is hard to implement. The
technical and other risk associated with this feature is high. On a relative scale of 1 to 9, the priority
for risk is somewhere between 8 and 9.
S. No | Stimulus | Response
1 | User clicks on “Submit MR Job” button | A new screen is opened up and a form is generated for the user to fill.
2 | User fills in the required information | The entered data is displayed on the screen.
3 | User clicks on the “Submit” button | The MR Job and the name of the input file are sent to the server for further actions. In case of any missing data or incorrect data, an error message prompts and the user is requested to enter the data again.
3 | REQ-3 | Data Type | The MR Job field accepts only a file and the input file name field accepts only strings. | In case of any other data type, an error message will be generated
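The data type rule in REQ-3 can be sketched as a small validation function. This is an illustration in Python, not the application's code; the function name, the error messages and the check for a `.jar` extension are assumptions (Hadoop Map Reduce jobs are commonly packaged as JAR files, but this document does not state the accepted file type).

```python
def validate_submission(job_file_name, job_file_bytes, input_file_name):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    # The MR Job field must contain an uploaded, non-empty file.
    if not isinstance(job_file_bytes, (bytes, bytearray)) or len(job_file_bytes) == 0:
        errors.append("MR Job must be an uploaded file")
    elif not str(job_file_name).lower().endswith(".jar"):
        errors.append("MR Job must be a .jar file")  # assumed file type
    # The input file name field must contain a non-empty string.
    if not isinstance(input_file_name, str) or not input_file_name.strip():
        errors.append("Input file name must be a non-empty string")
    return errors
```

Returning all errors at once, rather than stopping at the first, lets the form show the user everything that needs correcting in a single round trip.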
4.4.1.1 Benefit
This feature is very important as it would let the users view the results of the Map Reduce jobs they
submitted to the application. On a relative scale of 1 to 9, this feature will be rated 9 because the
priority for benefit is 9.
4.4.1.2 Penalty
This feature is extremely important in terms of penalty because if this feature doesn’t exist the user
wouldn’t be able to view the result of their submitted Map Reduce jobs and the system would not
fulfill its basic purpose. On a relative scale of 1 to 9, this feature will be rated 9 because the priority
for penalty is 9.
4.4.1.3 Cost
This feature requires access to the server to fetch the final result and display it on the screen. The
cost to implement this feature is not very high, so the priority for cost is somewhere between 4 and
5.
4.4.1.4 Risk
The risk that would be imposed on the system without this feature is huge because without this
feature, the system will fail to serve its purpose. Also, this feature is not very hard to implement. The
technical and other risk associated with this feature is not very high. On a relative scale of 1 to 9, the
priority for risk is somewhere between 4 and 5.
S. No | Stimulus | Response
1 | User clicks on “View result” button | A new screen is opened up and the result is viewed.
2 | User clicks on “Save Result” button | The result of the Map Reduce Job is stored in the database so that the user can view it later.
3 | User clicks on the “Print Result” button | The system is connected to the available printers so that the user is able to print the result.
4.5.1.1 Benefit
This feature is important as it lets users change the details they provided when they
registered with the application. It lets users keep their accounts safe by changing their passwords
and lets them update the name of the organization if the organization’s name
changes. On a relative scale of 1 to 9, this feature will be rated 8 because the priority for
benefit is 8.
4.5.1.2 Penalty
This feature is important in terms of penalty because without it users would not be able to
update the name of the organization, or any other organization details, when these change.
On a relative scale of 1 to 9, this
feature will be rated 9 because the priority for penalty is 9.
4.5.1.3 Cost
This feature requires access to the database to update the data provided by the user. It will
also check whether the organization specified by the user is a registered organization. The
cost to implement this feature is not very high, so the priority for cost is somewhere between 2 and
3.
4.5.1.4 Risk
The risk that would be imposed on the system without this feature is high because without this
feature, the system would not be able to provide a basic facility to the users. Also, this feature is not
very hard to implement. The technical and other risk associated with this feature is low.
On a relative scale of 1 to 9, the priority for risk is somewhere between 2 and 3.
S. No | Stimulus | Response
1 | User clicks on “Settings” button | A new screen is opened up and different options are available.
2 | User clicks on the “Change Password” button | A form opens up letting the user change his/her password.
3 | User provides the old password and then sets up a new password. | The entered data is displayed on the screen.
4 | User clicks on the “Submit” button | The credentials are updated in the database.
5 | User clicks on the “Edit Details” button | A form opens up letting the user change his/her basic details.
6 | User fills in the form by providing the details | The entered data is displayed on the screen.
7 | User clicks on the “Submit” button | The details of the user are updated in the database.
1 | REQ-1 | Correct Data | The data entered must be correct. | If the data is incorrect, an error message is shown and the user is requested to enter the data again.
2 | REQ-2 | Data Type | The username, password, name of organization, city, email address and address fields accept only strings and the gender field accepts only characters. | If any other data type is entered, an error message will be generated.
3 | REQ-3 | Security | The browser should not cache the change password page. | The page must be treated as sensitive data and the browser must be told not to cache the page.
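The security requirement above (REQ-3) is commonly met by sending cache-control headers with the change password page. The following is a minimal sketch, assuming the application can set response headers as a plain mapping; the function names `no_cache_headers` and `apply_headers` are hypothetical and not part of this specification.

```python
def no_cache_headers() -> dict:
    """Response headers asking browsers and intermediaries not to store the page."""
    return {
        "Cache-Control": "no-store",  # do not store the response in any cache
        "Pragma": "no-cache",         # hint for legacy HTTP/1.0 caches
        "Expires": "0",               # treat the response as already expired
    }


def apply_headers(response_headers: dict) -> dict:
    """Return a copy of the given headers with the no-cache headers merged in."""
    merged = dict(response_headers)
    merged.update(no_cache_headers())
    return merged
```

The original mapping is left unchanged, so the same helper can be applied to any page that is treated as sensitive data.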
4.6.1.1 Benefit
This feature is very important as it would let the users secure their respective accounts from
unauthenticated users. On a relative scale of 1 to 9, this feature will be rated 9 because the priority
for benefit is 9.
4.6.1.2 Penalty
This feature is extremely important in terms of penalty because if this feature doesn’t exist any
malicious user would be able to use a person’s account by getting access to his/her computer system
which will affect the security of the system. On a relative scale of 1 to 9, this feature will be rated 9
because the priority for penalty is 9.
4.6.1.3 Cost
This feature does not require access to any other module. The cost to implement this feature is not
very high, so the priority for cost is somewhere between 2 and 3.
4.6.1.4 Risk
The risk that would be imposed on the system without this feature is huge because without this
feature, the system would not be able to keep malicious users from using someone else’s
account. Also, this feature is not very hard to implement. The technical and other risk associated
with this feature is low. On a relative scale of 1 to 9, the priority for risk is somewhere
between 2 and 3.
S. No | Stimulus | Response
1 | User clicks on the “Logout” button | If the user clicks on the button their session on the website will terminate. A login screen will be shown to the user.
4.6.3 Functional Requirements
The functional requirements associated with this feature are as follows:
5. Other Nonfunctional Requirements
The application’s initial load time shall depend on the strength of the internet connection
and on the medium from which the application is run.
The performance of the system shall depend upon the hardware components of the user.
Collaborative Hadoop shall be a web-based application and shall be run from a web
browser.
Collaborative Hadoop shall be available to the user round-the-clock.
The application shall have an easy to use interface.
The system shall be available in English language.
The application shall be able to validate user actions.
The database connected to the application shall be updated in real-time.
The application shall allow the users to utilize their resources efficiently.
The server shall be able to handle any number of users concurrently.
The user shall be able to log out from any screen.
If a user stays inactive for more than 30 minutes, the system shall automatically log the user out
of the session.
The above mentioned performance requirements apply to all the features of the system, except for
the last mentioned requirement which only applies to the User logout feature.
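The 30-minute inactivity rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Session` record and the `handle_request` function are hypothetical names, sessions are kept in memory, and any request counts as user activity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

INACTIVITY_LIMIT = timedelta(minutes=30)  # limit stated in the requirement


@dataclass
class Session:
    """Hypothetical in-memory session record."""
    user: str
    last_activity: datetime
    active: bool = True


def handle_request(session: Session, now: datetime) -> bool:
    """Return True if the request may proceed, or False if the user is logged out."""
    # Log the user out once more than 30 minutes have passed without activity.
    if not session.active or now - session.last_activity > INACTIVITY_LIMIT:
        session.active = False
        return False
    # Otherwise the request counts as activity and restarts the 30-minute window.
    session.last_activity = now
    return True
```

A request that returns False would lead to the login screen, in the same way as the User logout feature.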
The application shall ensure that the user is registered with the system before he/she logs
in.
The application shall let the user set up a new password if he/she has forgotten the
password.
A backup of the data shall be kept in case of data loss or damage.
The system shall not cause any mishaps.
The application shall make sure that the account details of a user stay between the user and
the application.
The client and the server shall communicate over a secure channel.
The application should keep the database that stores the user’s data safe.
Large files shall be split into smaller chunks and then sent through a secure channel to
avoid bandwidth throttling.
The above mentioned requirements apply to the entire system. In Collaborative Hadoop, a number
of security certificates can be used to satisfy the safety of the application:
Hyper Text Transfer Protocol Secure (HTTPS) indicates that the website is protected by
Secure Sockets Layer/Transport Layer Security.
A third party called a Certificate Authority (CA) verifies that our web application is
authentic.
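The requirement above that large files be split into smaller chunks before transfer can be sketched as follows. The chunk size of 4 MiB is an assumed value that this specification does not state, and the names `iter_chunks` and `chunk_bytes` are hypothetical. Sending each chunk over the secure channel is outside the scope of the sketch.

```python
import io
from typing import BinaryIO, Iterator, List

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; an assumed value, not taken from the SRS


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until the stream ends."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def chunk_bytes(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Convenience wrapper that chunks data already held in memory."""
    return list(iter_chunks(io.BytesIO(data), chunk_size))
```

Reading from a stream keeps memory use bounded by the chunk size, which matters for the large input files that Map Reduce jobs typically process.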
If a user stays inactive for more than 30 minutes, the system shall automatically log the user out.
The system shall not leave any cookies on the user’s computer containing the user’s
password.
The application shall send an email to the registered user after three incorrect
password attempts.
The system shall not leave any cookies on the user’s computer containing any of the user’s
confidential information.
The user’s web browser shall never display his/her password. It shall be displayed as
asterisks on the screen.
The application’s back-end database shall be encrypted.
The above mentioned security requirements apply to the entire system. In Collaborative Hadoop, a
number of security certificates can be used to satisfy the security of the application:
SSL certificates
Hyper Text Transfer Protocol Secure (HTTPS) indicates that the website is protected by
Secure Sockets Layer/Transport Layer Security.
A third party called a Certificate Authority (CA) verifies that our web application is
authentic.
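The security requirement above concerning three wrong password attempts can be sketched as follows. The sketch assumes that the attempts are counted consecutively and that the counter is reset after a successful login; the specification does not state either point. The class `LoginAttemptTracker` and the `send_alert` callback, which stands in for the email service, are hypothetical names.

```python
from typing import Callable, Dict

MAX_FAILED_ATTEMPTS = 3  # threshold stated in the requirement


class LoginAttemptTracker:
    """Counts consecutive failed logins per user and triggers an alert at the threshold."""

    def __init__(self, send_alert: Callable[[str], None]):
        self._failed: Dict[str, int] = {}
        self._send_alert = send_alert  # stand-in for the email service

    def record_failure(self, username: str) -> int:
        """Register a wrong password and return the current failure count."""
        count = self._failed.get(username, 0) + 1
        self._failed[username] = count
        if count == MAX_FAILED_ATTEMPTS:
            self._send_alert(username)  # alert once, on the third failure
        return count

    def record_success(self, username: str) -> None:
        """Reset the counter after a successful login (an assumption)."""
        self._failed.pop(username, None)
```

Passing the alert function in as a parameter keeps the tracker independent of any particular mail library and makes it straightforward to test.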
5.4.1 Adaptability
The Collaborative Hadoop application shall be adaptable to the needs of its users and of the business,
and to any future modifications and changes.
5.4.2 Availability
The application shall be available to the users round-the-clock and it should be able to handle
multiple users concurrently. The application will only be available when there is an active internet
connection.
5.4.3 Correctness
The system should be correct and fulfill all the requirements of the users. The application should not
have any defects or errors.
5.4.4 Flexibility
The application should be flexible and should be easy to modify as time and technology change.
The procedure to make changes in the software should not be very hard to implement.
5.4.5 Interoperability
The system will be operable using any web browser if the user has an active internet connection.
The application will connect to the database and the clusters to carry out the functionality.
5.4.6 Maintainability
Any software developer with some experience shall be able to fix defects in the system.
The application shall be easy to maintain, and the maintenance team shall be able to maintain the
software with little effort.
5.4.7 Portability
The application shall run on any web browser in the presence of active internet connection. The
application shall be portable because it is a web application which is platform or operating system
independent.
5.4.8 Reliability
The application shall be available to the user day and night. The system shall be reliable and shall
not crash. The system shall also maintain a backup of the database in case of database failure
or data loss.
5.4.9 Reusability
The modules of the application shall be created in such a way that there is minimal coupling
between them. The modules of the application shall be able to be reused in some other application
with minimum adjustments. The security of the system shall be upgradable.
5.4.10 Robustness
If the application cannot connect to the database or any other module, the application
process shall not crash or hang in a perpetual loading state; instead, it shall display an error message.
The system shall not crash or terminate in case of bad or invalid input.
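The robustness requirement above can be illustrated with a small wrapper that turns a connection failure into an error message instead of a crash. This is a sketch only: the function name `call_with_fallback` is hypothetical, and the wrapped operation is assumed to enforce its own timeout and to raise `TimeoutError` or `ConnectionError` on failure, so that the application does not wait indefinitely.

```python
from typing import Callable, Tuple


def call_with_fallback(operation: Callable[[], object], error_message: str) -> Tuple[bool, object]:
    """Run an operation that talks to the database or another module.

    Returns (True, result) on success, or (False, message) if the
    connection fails or times out, so the caller can display the message.
    """
    try:
        return True, operation()
    except (ConnectionError, TimeoutError) as exc:
        # Report the failure to the user instead of letting the process crash.
        return False, f"{error_message} ({exc.__class__.__name__})"
```

The caller would display the returned message on the screen and leave the rest of the application usable.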
5.4.11 Testability
The application shall be designed in a way that each of the modules of the system is testable. All
these modules shall be tested individually and then integrated to create the final application.
5.4.12 Usability
This application shall be usable by anyone with a web browser and an active internet connection.
The system shall provide a uniform look and feel between all the web pages and provide use of icons
and toolbars.
The results of the Map Reduce jobs submitted by a user can be viewed by that user only.
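The rule above can be expressed as an ownership check performed before a result is shown. The following is a minimal sketch; `JobResult`, `AccessDenied` and `view_result` are hypothetical names, and the user is assumed to be identified by the username of the logged-in session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    """Hypothetical record of a finished Map Reduce job."""
    job_id: str
    owner: str    # username of the user who submitted the job
    output: str


class AccessDenied(Exception):
    """Raised when a user requests a result that belongs to someone else."""


def view_result(result: JobResult, requesting_user: str) -> str:
    """Return the job output only to the user who submitted the job."""
    if requesting_user != result.owner:
        raise AccessDenied(f"Result {result.job_id} is not available to this user.")
    return result.output
```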
6. Other Requirements
The other requirements which were not mentioned in the above sections are as follows:
6.2 Legal Requirements
The system must be developed in accordance with the legal and technological standards of the market.
Copyright laws and license agreements must be respected for any third party software used in the
creation of this application.
Appendix A: Glossary
Bandwidth | or device. It is a measurement of how fast data can be sent over a wired or wireless connection, measured in bits per second.
CentOS | Community Enterprise Operating System - Linux distribution that attempts to provide a free, enterprise-class, community-supported computing platform functionally compatible with its upstream source, Red Hat Enterprise Linux.
Cluster | A cluster consists of a set of loosely or tightly connected computers that work together so that they can be viewed as a single system.
Copyright | The exclusive legal right, given to an originator or an assignee to print, publish, perform, film, or record literary, artistic, or musical material, and to authorize others to do the same.
Database | A collection of information that is organized so that it can be easily accessed, managed and updated.
Data Analytics | The qualitative and quantitative techniques and processes used to enhance productivity and business gain. Data is extracted and categorized to identify and analyze behavioral data and patterns, and techniques vary according to organizational requirements.
Data sets | A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer.
Hadoop | Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment.
HCI | Human Computer Interaction - The study of how people interact with computers and to what extent computers are or are not developed for successful interaction with human beings.
HTTP | Hyper Text Transfer Protocol - The underlying protocol used by the World Wide Web and this protocol defines how messages are formatted and transmitted, and what actions Web servers and browsers should take in response to various commands.
IP | Internet Protocol - Protocol by which data is sent from one computer to another on the Internet.
Map Reduce job | A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks.
MongoDB | A free and open source cross platform document oriented database program which uses JSON-like documents with schemas.
MR | Map reduce - A programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
Open Data | The data freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
Prototype | The technique of developing a rough sketch of the interface to get an idea about what the output will look like. It also helps in making future design decisions.
Reduce | A function that can iterate through the values that are obtained from the systems and produce zero or more outputs.
Session | A session is a semi-permanent interactive information interchange, also known as a dialogue, a conversation or a meeting, between two or more communicating devices, or between a computer and user.
Slave Nodes | Slave nodes are where Hadoop data is stored and where data processing takes place.
Spring | An application framework and inversion of control container for the Java platform.
SRS | Software Requirements Specification - A document containing the description of a software system to be developed.
SSL | Secure Sockets Layer - A standard security technology for establishing an encrypted link between a server and a client - typically a web server (website) and a browser.
Test Case | A set of conditions or variables under which a tester will determine whether a system under test satisfies requirements or works correctly.
Ubuntu Linux | A Debian-based Linux operating system for personal computers, tablets and smartphones, which also runs on network servers, usually with the Ubuntu Server edition or with containers.
Web Browser | A software application for retrieving, presenting and traversing information resources on the World Wide Web.
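The glossary entries for Map Reduce job, MR and Reduce can be illustrated with a small word count example. This is a sketch that runs sequentially in a single process; Hadoop distributes the same map, sort and reduce steps across the nodes of a cluster. The function names are hypothetical and the example does not use the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map task: emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the map outputs by key and sort the keys, as the framework does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce task: iterate over the values of one key and produce one output."""
    return key, sum(values)


def run_job(chunks: List[str]) -> Dict[str, int]:
    """Run the map, shuffle and reduce steps over independent input chunks."""
    pairs = (pair for chunk in chunks for pair in map_words(chunk))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())
```

Each chunk is mapped independently of the others, which is what allows the map tasks to run in parallel on different nodes.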
Appendix B: Analysis Models and Other References
Figure 9-Use Case Diagram
Component Diagram
Figure 10-Component Diagram
Package Diagram
Object Diagram
Figure 12-Object Diagram
Sequence Diagram
Figure 13-Sequence Diagram
Communication Diagram
Login
Figure 15-Communication Diagram 1
Run MR Job
View Result
Deployment Diagram
Activity Diagrams
User Login
Editing User Details
Run MR Job
View Results
Architecture Diagram