
University of Manouba

National School of Computer Sciences

Report of
the Design and Development Project

Subject: Interfacing between SQL DB and NoSQL DB

Authors :
Asma Mansour, Zeineb Saadi
Supervisors :
Dr. Raoudha Khcherif
Miss. Hassiba Laifa

Academic Year : 2019/2020


Abstract

Summary: Our project consists in developing a graphical interface that makes it possible to query SQL and NoSQL databases through SQL queries, and in proposing an approach for migrating data to NoSQL databases.
We used ODBC drivers, which convert SQL queries into commands that a DBMS can interpret. In this project, we focus on querying MongoDB and MySQL databases, and on extracting, loading and transforming data from CSV files into MongoDB.
Keywords: NoSQL, MongoDB, ODBC.

Abstract: Our project is to develop a GUI that allows users to run SQL queries against multiple databases and to migrate data from different data sources to NoSQL. We used ODBC drivers that convert SQL queries into commands that can be interpreted by a DBMS. Our work essentially consisted in finding a suitable way to retrieve data from MongoDB and MySQL databases using the same SQL statements, and in extracting, transforming and loading data from CSV files into MongoDB.
Keywords: NoSQL, MongoDB, ODBC

Acknowledgement

The success and final outcome of this project required a lot of guidance and assistance from many people, and we are extremely privileged to have received it throughout the completion of our project. All that we have done is due to such supervision and assistance, and we will not forget to thank them.

First and foremost, we would like to extend our deepest gratitude to our supervisors, Raoudha Khcherif and Hassiba Laifa, who gave us their complete support and helped us deliver our best.

We would also like to thank all our teachers at the National School of Computer Sciences
for their constant encouragement, support and guidance which helped us in successfully
completing our project work.

Finally, special thanks to the jury members who honored us by examining and evaluating
this modest contribution.

Contents

Introduction

1 Preliminary Study
1.1 Academic study
1.1.1 Big Data
1.1.2 NoSQL
1.1.3 MongoDB
1.2 Study Of The existing
1.2.1 the SOS platform
1.2.2 Studio 3T
1.2.3 Criticism and proposed Solutions
1.3 State of the art technologies
1.3.1 ODBC
1.3.2 ODBC driver
1.4 Proposed solution
1.5 Conclusion

2 Requirements Analysis and Specification
2.1 Requirements Analysis
2.1.1 Actors
2.1.2 Functional Requirements
2.1.3 Non-Functional Requirements
2.2 Requirements Specification
2.2.1 Use Case Diagram
2.2.2 Global Sequence Diagram
2.3 Conclusion

3 Design
3.1 Global Design
3.1.1 Physical Architecture
3.1.2 Logical Architecture
3.1.3 Combined component / deployment diagram
3.2 Detailed design
3.2.1 Class Diagram
3.2.2 Sequence Diagram
3.3 Conclusion

4 Achievement
4.1 Developing Environment
4.1.1 Hardware Environment
4.1.2 Software Environment
4.1.3 Technological choices
4.2 Achieved Work
4.2.1 The authentication interface
4.2.2 The query interface
4.2.3 The Data interface
4.2.4 Load CSV file interface
4.3 Conclusion

Evaluation and Recommendation

List of Figures

1.1 Big Data definition [SKM15]
1.2 NoSQL types [Qut19]
1.3 CAP triangle
1.4 Architecture of SOS [PR]
1.5 ODBC model components

2.1 Use Case Diagram
2.2 Global Sequence Diagram

3.1 3-tier Architecture Diagram [Wik17]
3.2 MVVM Diagram
3.3 Combined component / deployment diagram
3.4 Class Diagram
3.5 Connection Sequence Diagram
3.6 UpdateDB Sequence Diagram

4.1 The authentication interface
4.2 The users table in the SQLite DB
4.3 The home interface
4.4 The queries table in the SQLite DB
4.5 Employees collection view after INSERT query statement
4.6 Employees collection view after UPDATE query statement
4.7 Employees collection view after DELETE query statement
4.8 The collections view in the New York’s MongoDB database
4.9 The employees collection view
4.10 The DataFrame resulting from the SELECT query
4.11 Statistics and histogram chart
4.12 The DataFrame resulting from the INNER JOIN clause
4.13 The DataFrame resulting from the FULL JOIN clause
4.14 File dialog to load a CSV file into MongoDB
4.15 CSV file rows that were added to the collection
4.16 The employees collection before loading the CSV file
4.17 The employees collection after loading the CSV file

List of Tables

1.1 Complementary Terms
1.2 Criticism of the existing solutions

4.1 Characteristics of the Used Computers

Acronym

MVVM : Model-View-ViewModel
CSV : Comma-Separated Values
NoSQL : Not only SQL
ORM : Object-Relational Mapping
DB : Database
GUI : Graphical User Interface
UML : Unified Modeling Language
Pyro4 : Python Remote Objects
ODBC : Open Database Connectivity
DSN : Data Source Name
DBMS : Database Management System
RDBMS : Relational Database Management System
JSON : JavaScript Object Notation

Introduction

For the past thirty years, relational databases have been the default choice for most traditional data-intensive storage and retrieval applications [Cha10], with Structured Query Language (SQL) as the standard language for performing the basic data operations.

However, with the emergence of big data, characterized by massive volumes and a wide variety of data types and structures, strictly structured, schema-based relational databases lose efficiency. Furthermore, in terms of scalability, relational databases are vertically scalable: to handle an increasing load, more resources such as RAM, SSDs or CPUs must be added to a single server, which is a difficult and costly task. [Lak]

NoSQL (Not Only SQL) databases emerged with a set of new data management features, as alternatives that overcome these limitations of relational databases. NoSQL databases are horizontally scalable, which means that they can handle massive amounts of data by adding more servers that share the workload. They are also well suited to big data, as their schema-less architecture lets them adapt to rapidly growing and changing structured, semi-structured and unstructured data. [Gre18]

However, since most developers are used to writing SQL statements to retrieve data, working with NoSQL databases, whose query languages differ from SQL, is not straightforward.

In this context, our project consists in developing a graphical user interface (GUI) that allows users to query relational and non-relational databases with the same SQL statements, to extract, transform and load data from any data source into NoSQL databases, and to display query results in tables, as if they were dealing with relational databases. Since the whole project is too large for the time and workload of the Design and Development Project, we took charge of one part: using SQL to query MongoDB and MySQL, and extracting, transforming and loading data from CSV files into MongoDB (data warehouse).

In the first part, we present the project context by giving its scope, a summary of the problem and, finally, the expected result. In the second, we go deeper: we review the existing solutions and their gaps and weaknesses, then explain the major concepts on which our project is built, so that the upcoming sections can be understood. The third section is dedicated to the requirements analysis and specification. In the fourth, we give diagrams that provide an overview of each part of the design.

The last section presents what we achieved over the previous four months, with screenshots of the code, statistics and the views of the interface to illustrate our work.

Finally, we conclude by evaluating our work and suggesting prospects for the application.

Chapter 1

Preliminary Study

This chapter contains an academic study that introduces the most significant concepts
discussed in our project. We then study some of the existing solutions and critique their
effectiveness and their limitations. Next, we will specify our proposed solution and our
work methodology.

1.1 Academic study


We devote this section to introducing some useful concepts and theoretical background that ease the understanding of the upcoming work.

1.1.1 Big Data


Definition
It is important to have some historical context to better understand big data. Here is Gartner’s definition, circa 2001, which is still the go-to definition: big data is data that contains greater variety, arriving in increasing volumes and with ever-higher velocity. This is known as the three Vs. [BL12]

Big data Characteristics


The three Vs (volume, velocity and variety), a characterization of big data originally proposed by Doug Laney in 2001, have long been used to describe big data. They basically describe big data as large amounts of scattered data that must be processed quickly for proper analysis and interpretation. [BL12]

In August 2013, the definition was extended to include veracity, variability, visualization and value, as Figure 1.1 shows, giving a broader view of big data. [Rij13]

1.1. ACADEMIC STUDY

Figure 1.1: Big Data definition [SKM15]

Big Data mainly involves six aspects, as per the definition above :

• Volume : the quantity of Big Data, which ranges from terabytes and petabytes to exabytes.

• Variety : the types of Big Data, which include structured and unstructured data such as audio, video, text, posts, log files and many more.

• Velocity : the speed at which the data is created, stored, analysed and visualised.

• Veracity : the accuracy and trustworthiness of the data.

• Variability : the same data can have different meanings depending on the context.

• Visualization : presenting data in a way that makes it easy to read and understand.

1.1.2 NoSQL
Definition
Since the 1970s, the relational database has been the essential reference for managing the data of an information system. However, faced with the 3Vs (Volume, Velocity, Variety), relational databases can hardly keep up with this wave of data. NoSQL has naturally established itself in this context by proposing a new way to manage data that does not rely on the relational paradigm, hence the name "Not Only SQL". This approach relaxes some heavy constraints of the relational model (data structure, query language or consistency) to favor distribution. [BT19]

CHAPTER 1. PRELIMINARY STUDY 12



NoSQL DB types
Storage and processing requirements vary and depend mainly on the application to be integrated. For this reason, different families of NoSQL databases exist, as Figure 1.2 represents :

• Document DB : stores data in documents similar to JSON objects. Each document contains pairs of fields and values.

• Key-value DB : stores each item as a unique key associated with a value, which can be any sort of byte array, data structure, or even a binary large object.

• Wide-column DB : stores data in tables, rows, and dynamic columns. Rows are not required to have the same columns.

• Graph DB : stores data in nodes and edges. Nodes store information about objects, while edges store information about the relationships between these nodes.

Figure 1.2: NoSQL types[Qut19]
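To make the four families more concrete, the following sketch shapes one hypothetical employee record for each data model using plain Python structures. It only illustrates the data models: no real database, driver or product API is involved, and all names and values are invented for the example.

```python
"""One hypothetical employee record shaped for each NoSQL family."""

# Document DB: a JSON-like document of field/value pairs; values may be
# arrays or nested documents.
document = {"_id": 1, "name": "Alice", "skills": ["SQL", "Python"],
            "address": {"city": "Tunis"}}

# Key-value DB: the store only sees a unique key and an opaque value
# (here raw bytes), so it cannot query inside the value.
key_value = {"employee:1": b'{"name": "Alice", "city": "Tunis"}'}

# Wide-column DB: rows are identified by a key and each row may have its
# own set of columns.
wide_column = {
    "employee:1": {"name": "Alice", "city": "Tunis"},
    "employee:2": {"name": "Bob"},  # no "city" column for this row
}

# Graph DB: nodes store objects, edges store relationships between nodes.
nodes = {1: {"label": "Employee", "name": "Alice"},
         2: {"label": "Project", "title": "SQL/NoSQL interface"}}
edges = [(1, "WORKS_ON", 2)]

# Example traversal: which projects is node 1 working on?
projects = [nodes[dst]["title"] for src, rel, dst in edges
            if src == 1 and rel == "WORKS_ON"]
print(projects)  # ['SQL/NoSQL interface']
```

The same information is present in every case; what changes is how much structure the store can see, from none (key-value) to explicit relationships (graph).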

ACID vs BASE
Relational (SQL) databases guarantee the ACID properties for transactions:

• Atomicity : A transaction takes place entirely or not at all.

• Consistency : The DB must be consistent at the beginning and end of a transaction.


• Isolation : Changes made by a transaction become visible to, or modifiable by, other transactions only once it has completed.

• Durability : Once a transaction has been committed, its changes to the DB are permanent. [Sas18]
However, these properties are difficult to guarantee at scale in a distributed context such as NoSQL. As a result, BASE properties have been proposed to characterize NoSQL DBs :
• Basically Available : the DB appears to work most of the time.

• Soft-state : the DB does not need to be consistent at all times.

• Eventually consistent : eventually, the DB will reach a consistent state.
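Atomicity is easy to observe with Python's built-in sqlite3 module. In this minimal sketch (the accounts table and the transfer function are hypothetical), a failed check inside the transaction rolls back the debit that was already executed, so the database never exposes a half-finished transfer.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from account `src` to `dst` as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # undoes the debit above
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

print(transfer(conn, 1, 2, 30))   # True
print(transfer(conn, 1, 2, 500))  # False: the debit was rolled back
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 70), (2, 30)]
```

A BASE system, by contrast, may temporarily expose intermediate states and only converge to a consistent one later.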

1.1.3 MongoDB
Definition
A record in MongoDB is a document, which is a data structure composed of field and
value pairs. MongoDB documents are similar to JSON objects. The values of fields may
include other documents, arrays, and arrays of documents [ope].

Table 1.1 shows the correspondence between SQL and MongoDB terms :

SQL terms | MongoDB terms
Database | Database
Table | Collection
Entity/Row | Document
Column | Key/Field
Table join | Embedded documents
Primary key | Primary key (default _id key provided by MongoDB itself)

Table 1.1: Complementary Terms
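The "Table join / Embedded documents" row is the least obvious correspondence. The sketch below, using two hypothetical tables, turns a one-to-many join into MongoDB-style documents in which each department embeds its employees. It is plain Python that only builds the documents; inserting them into MongoDB (for example with a driver such as PyMongo) is outside its scope.

```python
import json

# Two hypothetical relational tables: rows as tuples, linked by dept_id.
departments = [(10, "Finance"), (20, "IT")]
employees = [(1, "Alice", 10), (2, "Bob", 20), (3, "Carol", 20)]


def embed(departments, employees):
    """Turn a one-to-many join into documents where each department
    embeds the list of its employee sub-documents."""
    docs = {dept_id: {"_id": dept_id, "name": name, "employees": []}
            for dept_id, name in departments}
    for emp_id, emp_name, dept_id in employees:
        # The foreign key disappears: nesting now expresses the relationship.
        docs[dept_id]["employees"].append({"_id": emp_id, "name": emp_name})
    return list(docs.values())


collection = embed(departments, employees)
print(json.dumps(collection[1]))
# {"_id": 20, "name": "IT", "employees": [{"_id": 2, "name": "Bob"}, {"_id": 3, "name": "Carol"}]}
```

Embedding avoids a join at read time, at the cost of duplicating data when the same sub-document belongs to several parents.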

MongoDB features
As a NoSQL database, MongoDB has become one of the most popular databases thanks to its key features, among which its query language is a major strength :
• High Performance

• Rich Query Language

• High Availability


• Replication

• Duplication of data

• Supports map reduce and aggregation tools

CAP theorem
In 2000, Eric A. Brewer [Bre00] stated what became known as the CAP theorem, based on three fundamental properties that characterize distributed databases (relational, NoSQL and others) :

• Consistency : a piece of data has only one visible state, regardless of the number of replicas.

• Availability : as long as the system is running, the data must be available.

• Partition Tolerance (distribution) : regardless of the number of servers, and even if communication between them is interrupted, every query must return a correct result.

Theorem (CAP theorem). In any distributed database, at most two of the three properties (consistency, availability and partition tolerance) can be guaranteed at the same time.

The CAP theorem thus makes it possible to classify databases by placing them on the "CAP triangle" of Figure 1.3.

Figure 1.3: CAP triangle



1.2. STUDY OF THE EXISTING

1.2 Study Of The existing


This section introduces existing applications whose purpose is similar to ours; by comparing them, we point out what they lack and justify the features of our project.

1.2.1 the SOS platform


To the best of our knowledge, SOS is the first platform proposed to support the management of heterogeneous NoSQL databases.

The SOS approach described here uses a meta-layer as the principal means of supporting the heterogeneity of different data models. This meta-layer, a pivot model, exposes a common interface. The idea of a pivot model is based on the MIDST and MIDST-RT tools. [PR]

Figure 1.4 represents the logical architecture of the SOS platform:

Figure 1.4: Architecture of SOS [PR]
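To make the idea of a meta-layer exposing a common interface more concrete, the following sketch shows client code written once against an abstract interface, with one adapter per data model. It illustrates the principle under our own assumptions only: the class and method names are hypothetical and do not reproduce the actual SOS API, and in-memory dictionaries stand in for real key-value and document stores.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class DataHandler(ABC):
    """Hypothetical common interface: clients call get() without knowing
    which NoSQL data model sits behind it."""

    @abstractmethod
    def get(self, collection: str, key: str) -> Optional[dict]:
        """Return the object stored under `key`, or None."""


class KeyValueHandler(DataHandler):
    """Adapter for a key-value store holding 'collection:key' entries."""

    def __init__(self, store: Dict[str, dict]) -> None:
        self.store = store

    def get(self, collection: str, key: str) -> Optional[dict]:
        return self.store.get(f"{collection}:{key}")


class DocumentHandler(DataHandler):
    """Adapter for a document store holding a list of documents per collection."""

    def __init__(self, collections: Dict[str, List[dict]]) -> None:
        self.collections = collections

    def get(self, collection: str, key: str) -> Optional[dict]:
        for doc in self.collections.get(collection, []):
            if doc.get("_id") == key:
                return doc
        return None


def find_everywhere(handlers: List[DataHandler], collection: str, key: str) -> List[dict]:
    """Client code written once against the common interface."""
    results = []
    for handler in handlers:
        obj = handler.get(collection, key)
        if obj is not None:
            results.append(obj)
    return results


kv = KeyValueHandler({"employees:e1": {"name": "Alice"}})
docs = DocumentHandler({"employees": [{"_id": "e1", "city": "Tunis"}]})
print(find_everywhere([kv, docs], "employees", "e1"))
# [{'name': 'Alice'}, {'_id': 'e1', 'city': 'Tunis'}]
```

Every new data model only needs a new adapter; the price is the extra translation layer, which is the performance trade-off discussed in the criticism below.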


1.2.2 Studio 3T
Studio 3T (formerly MongoChef) is a GUI and integrated development environment
(IDE) for developing and managing data on the MongoDB NoSQL platform. It is developed
by 3T Software Labs (acquired by Redgate Software in 2016).

Queries can be built by dragging and dropping fields and operators into a visual query builder, by using code completion and an aggregation pipeline builder, with map-reduce, or by writing standard SQL queries that are automatically translated into mongo shell (JSON) scripts.
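To illustrate the kind of translation such a tool performs, here is a deliberately tiny sketch that turns a simple SELECT ... FROM ... WHERE ... AND ... statement into the (collection, filter, projection) triple that a MongoDB find call expects. It is not Studio 3T's translator and not our project's final approach: it only supports comparisons joined by AND, does not handle OR, joins, or quoted strings containing AND, and the name sql_to_find is hypothetical. With the PyMongo driver, the result could then be passed to db[collection].find(filter, projection).

```python
import re

# SQL comparison operators and their MongoDB query-operator equivalents.
_OPS = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<coll>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE,
)
_COND = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def _literal(token: str):
    """Convert a SQL literal ('text', 42 or 3.5) into a Python value."""
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1]
    try:
        return int(token)
    except ValueError:
        return float(token)  # raises ValueError for unsupported literals


def sql_to_find(sql: str):
    """Translate a simple SELECT into (collection, filter, projection)."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    columns = [c.strip() for c in match["cols"].split(",")]
    projection = None if columns == ["*"] else {c: 1 for c in columns}
    query = {}
    if match["where"]:
        for condition in re.split(r"\s+AND\s+", match["where"], flags=re.IGNORECASE):
            cond = _COND.match(condition)
            if cond is None:
                raise ValueError(f"unsupported condition: {condition!r}")
            field, op, value = cond.groups()
            query.setdefault(field, {})[_OPS[op]] = _literal(value)
    return match["coll"], query, projection


print(sql_to_find("SELECT name, salary FROM employees WHERE salary >= 3000 AND dept = 'IT'"))
# ('employees', {'salary': {'$gte': 3000}, 'dept': {'$eq': 'IT'}}, {'name': 1, 'salary': 1})
```

Each SQL table becomes a collection, each WHERE comparison becomes a query operator on a field, and the selected columns become a projection, following the correspondence of Table 1.1.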

1.2.3 Criticism and proposed Solutions


Table 1.2 summarizes the comparison between the SOS platform and Studio 3T along three axes, chosen for the following reasons:

• High performance
Performance is essential in any project because of the importance of response time. One of the most obvious issues of the SOS platform is its low performance: the article [PR] states that homogenization presents performance trade-offs that need to be carefully studied.

• Connection to different DBs
Homogenization essentially requires connecting to different DBs. In the SOS platform, this characteristic is provided by the DB handlers.

• Support for all query types
Being able to run many types of queries is a strong advantage over competitors. According to our study, Studio 3T only supports SELECT queries.

Solution | High performance | Connection to different DBs | Support for all query types
SOS platform | No | Yes | Yes
Studio 3T | Yes | No | No

Table 1.2: Criticism of the existing solutions

The low performance of the SOS platform can be explained by its complex approach, based on model-independent schema translation: the schema of every database must be translated, even when it is not needed, which can increase response time.



1.3. STATE OF THE ART TECHNOLOGIES

On the other hand, Studio 3T cannot connect to databases other than MongoDB, and it cannot run SQL UPDATE queries.

Our project addresses these gaps with a new approach that requires neither a pivot model nor any translation from each database to such a model; for this reason, we expect it to perform better.

1.3 State of the art technologies


Here we present and define the technologies on which our project is built.

1.3.1 ODBC
Open Database Connectivity (ODBC) is Microsoft’s standard application programming interface (API) for accessing database management systems (DBMS), using SQL as the data access language. ODBC provides interoperability: a single application can access multiple DBMSs, and users add the ODBC driver of each DBMS they want the application to reach. [Mic]
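In Python, this interoperability is mirrored by the DB-API 2.0 standard (PEP 249), which ODBC bindings such as pyodbc implement. As a sketch, assuming pyodbc as the ODBC binding (an assumption, not a detail from this report), the function below uses only the standard cursor API, so it runs unchanged on a pyodbc connection or, as demonstrated here, on the standard-library sqlite3 driver. Both drivers use "?" placeholders; the run_query name is hypothetical.

```python
import sqlite3


def run_query(conn, sql, params=()):
    """Execute `sql` on any PEP 249 (DB-API 2.0) connection and return
    (column_names, rows). Only the standard cursor API is used, so the same
    function works with sqlite3 or with an ODBC connection from pyodbc."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        if cur.description is None:  # statement produced no result set
            return [], []
        columns = [col[0] for col in cur.description]
        return columns, [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()


# Demonstration with the standard-library sqlite3 driver.
conn = sqlite3.connect(":memory:")
run_query(conn, "CREATE TABLE employees (id INTEGER, name TEXT)")
run_query(conn, "INSERT INTO employees VALUES (?, ?)", (1, "Alice"))
print(run_query(conn, "SELECT id, name FROM employees"))
# (['id', 'name'], [(1, 'Alice')])
```

Returning column names together with the rows is what allows a GUI to display any result set as a table without knowing which DBMS produced it.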

1.3.2 ODBC driver


An ODBC driver implements Microsoft’s Open Database Connectivity (ODBC) interface for a particular DBMS, allowing applications to access its data using SQL as a standard access language.

The ODBC driver interface defines:

• A library of ODBC function calls.

• A standard SQL syntax.

• A standard way to connect and log on to a DBMS.

• A standard set of error codes.
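Because error codes are standardized as five-character SQLSTATE values, the GUI can turn a driver error into a message for the user without knowing which DBMS produced it. The sketch below classifies an SQLSTATE by its two-character class; the table is deliberately partial and the describe_sqlstate name is hypothetical. We assume, as is usual with pyodbc, that the SQLSTATE is the first element of the exception's args.

```python
# The first two characters of an SQLSTATE give its class. Partial table of
# standard classes; "HY" and "IM" are ODBC-specific classes.
SQLSTATE_CLASSES = {
    "00": "success",
    "01": "warning",
    "02": "no data",
    "08": "connection exception",
    "23": "integrity constraint violation",
    "28": "invalid authorization (wrong user name or password)",
    "42": "syntax error or access violation",
    "HY": "general or driver-level error",
    "IM": "driver manager error (e.g. unknown DSN)",
}


def describe_sqlstate(sqlstate: str) -> str:
    """Map a five-character SQLSTATE such as '28000' to a readable message."""
    if len(sqlstate) != 5:
        raise ValueError(f"SQLSTATE must have 5 characters, got {sqlstate!r}")
    return SQLSTATE_CLASSES.get(sqlstate[:2].upper(), "other error")


print(describe_sqlstate("28000"))
# invalid authorization (wrong user name or password)
```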

1.4 Proposed solution


In our project, we propose a comprehensive method using the ODBC API with the appropriate ODBC driver for each DBMS.



1.5. CONCLUSION

Figure 1.5: ODBC model components

Our first step was to install the ODBC driver and configure the DSN; we then integrated the connection URL, with the corresponding DSN information, into the code to bind the GUI to the DBMS.
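As an illustration of this step, the sketch below builds the ODBC connection string (keyword=value pairs separated by ";") from a DSN and credentials. Values containing spaces, ";" or braces are wrapped in braces with any "}" doubled, following the brace-quoting convention commonly accepted by ODBC driver managers. The DSN name, the driver name and the helper itself are hypothetical; with pyodbc, the resulting string would be passed to pyodbc.connect.

```python
def odbc_connection_string(dsn=None, driver=None, **attributes):
    """Build an ODBC connection string such as 'DSN=...;UID=...;PWD=...'.

    Values containing spaces, ';', '{' or '}' are wrapped in braces and any
    '}' inside them is doubled."""
    pairs = []
    if dsn is not None:
        pairs.append(("DSN", dsn))
    if driver is not None:
        pairs.append(("DRIVER", driver))
    pairs.extend(attributes.items())

    def quote(value):
        text = str(value)
        if any(ch in text for ch in " ;{}"):
            return "{" + text.replace("}", "}}") + "}"
        return text

    return ";".join(f"{key}={quote(value)}" for key, value in pairs)


print(odbc_connection_string(dsn="employees_mongo", UID="alice", PWD="p;ss"))
# DSN=employees_mongo;UID=alice;PWD={p;ss}
```

Keeping the DSN in the ODBC configuration means the code only carries the DSN name and the user's credentials, while server addresses stay in the driver manager's settings.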

Figure 1.5 shows an ODBC application with access to two different databases. Through the ODBC API, the application calls the Driver Manager, which may be either the Microsoft Driver Manager or the unixODBC Driver Manager. The Driver Manager in turn calls the appropriate ODBC driver, which accesses the database over a network connection.

1.5 Conclusion
In this chapter, we gave a brief study of the most significant terms (Big Data, NoSQL, MongoDB) by presenting their definitions and some of their features. We then studied the existing solutions and presented our proposed solution. In the next chapter, we explain the functionalities that our GUI offers and present the main possible scenarios.



Chapter 2

Requirements Analysis and Specification

This chapter sets out a detailed requirements analysis and specification. The first section identifies the principal actors in direct contact with the program and describes the key functionalities, through the functional and non-functional requirements. Finally, the core of this chapter describes the main use cases with accompanying UML diagrams.

2.1 Requirements Analysis


Throughout this section, we define the actors and present the functional and non-functional requirements that our application should satisfy.

2.1.1 Actors
Our application has a single actor, the user, who first logs in and can then query the databases or add a CSV file to MongoDB.

2.1.2 Functional Requirements


Functional requirements relate to the primary functionalities that our interface must fulfill once created. Our GUI should provide the user with the following functionalities:

• Authenticate

• Add CSV files to MongoDB

• Query both SQL DB and MongoDB

• View the data returned by a SELECT query

2.2. REQUIREMENTS SPECIFICATION

2.1.3 Non-Functional Requirements


Our system must satisfy the following requirements :

• Usability : the GUI must be clear and easy to understand.

• Availability : our system must always be available to the user.

• Portability : the GUI must run on the main operating systems (Windows, Linux, ...).

• Robust security : all passwords and queries stored in the SQLite DB must be encrypted.
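For passwords, a common way to meet this requirement is to store a salted one-way hash rather than a reversibly encrypted value, since the application only needs to check a password, never to read it back. The sketch below assumes that choice (hashing with PBKDF2 from Python's standard library); the users table layout, the iteration count and the function names are hypothetical. Stored queries, which must be read back, would instead need reversible encryption, which this sketch does not cover.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # PBKDF2 work factor (an assumption; tune for the hardware)


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a one-way hash of `password` with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def add_user(conn: sqlite3.Connection, username: str, password: str) -> None:
    salt = os.urandom(16)  # a fresh random salt per user
    with conn:
        conn.execute(
            "INSERT INTO users (username, salt, pwd_hash) VALUES (?, ?, ?)",
            (username, salt, hash_password(password, salt)),
        )


def check_password(conn: sqlite3.Connection, username: str, password: str):
    """Return None for an unknown user, otherwise True or False."""
    row = conn.execute(
        "SELECT salt, pwd_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return None
    salt, stored = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(hash_password(password, salt), stored)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, pwd_hash BLOB)")
add_user(conn, "alice", "s3cret")
print(check_password(conn, "alice", "s3cret"), check_password(conn, "alice", "nope"))
# True False
```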

2.2 Requirements Specification


In this section, we refine the requirements above by expressing them semi-formally, using a use case diagram and a global sequence diagram.

2.2.1 Use Case Diagram


A use case diagram (UCD) is a simple representation of the interactions between the actors (in our case, a single user) and the system; it gives a simple view of the system’s behavior and of its cooperation with the user.

Figure 2.1 represents the use case diagram of our GUI.
First, the user fills in the connection form and submits it. Then, if authentication succeeds, he/she can either add a CSV file to MongoDB or query the DBs. When the query is a SELECT, the GUI lets the user view the data as a DataFrame, as a histogram or as statistics.

CHAPTER 2. REQUIREMENTS ANALYSIS AND SPECIFICATION 21



Figure 2.1: Use Case Diagram

2.2.2 Global Sequence Diagram


A sequence diagram represents the interactions between the user and the system in chronological order. These interactions can be method calls or actions performed by the user.

Figure 2.2 presents the interactions between the user, the system (GUI), the SQL DB and MongoDB.


Figure 2.2: Global Sequence Diagram



2.3. CONCLUSION

The user first fills in the user name and password fields and clicks the connection button. The GUI then sends a connection request to both the SQL DB and MongoDB, and the response is either true or an error. If the connection is established correctly, the home view appears, from which the user can query the databases or add a CSV file, and the user’s information is stored in the interface DB. If a connection error is returned to the GUI, an error message is shown to the user.

2.3 Conclusion
In this chapter, we defined and evaluated the functional and non-functional requirements that our project must follow, and discussed our system’s key use cases and scenarios. In the next chapter, we go one step further in the development of our GUI by presenting its design.



Chapter 3

Design

The design phase is one of the most important steps in any project’s development and execution, as it guides the implementation process. We therefore devote this chapter to detailing the patterns and designs to be followed in the next step of the application’s development. We first define the architecture in physical and logical terms, then detail and justify the design choices using the necessary UML diagrams.

3.1 Global Design


Throughout this section, we briefly review the proposed architecture to ensure that it is in line with our application’s objectives.

3.1.1 Physical Architecture


The physical architecture describes how the system’s components are distributed over the hardware that supports them.
We selected the 3-tier physical architecture as the most suitable for our project.

As Figure 3.1 shows, this multi-tier model is divided into three layers :

• The Presentation tier : the layer where the GUI views are displayed. It is accessible from the user’s computer, where the interface is installed. It is the highest layer and is directly linked to the application tier just below it.

• The Application tier : the intermediate layer that mediates between the other two layers. It is the engine of the whole application: all the processing required to produce a result takes place in this layer.

3.1. GLOBAL DESIGN

Figure 3.1: 3-tier Architecture Diagram [Wik17]

• The Database tier : the third layer, which corresponds to the database server. On this tier, a DBMS (Database Management System) is installed, in our case Microsoft SQL Server and MongoDB Server, and the application tier queries these servers to retrieve the data it needs.

3.1.2 Logical Architecture


MVVM is an architectural pattern designed to separate the source code into modules.

The Model-View-ViewModel (MVVM) architectural design pattern splits the application into three layers, as Figure 3.2 shows :
• Model : defines the data and business logic.
• View : Specifies the User interface including all visual elements (buttons, icons,
editors, etc.) that are connected to the ViewModel properties and commands.
• ViewModel : Contains the logic that ties Model and View.

We chose MVVM because of the issues caused by the strong coupling between the view and the model in the MVC architecture.
In MVVM, there is no direct association between the view and the model: the ViewModel plays the principal role of intermediary linking these two layers.
The data flow between the ViewModel and the View is handled by a binder, which reintroduces the Observer pattern of MVC, but at the level of individual attributes.
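The attribute-level binding can be sketched without any GUI toolkit. In the example below, the View subscribes to one property of the ViewModel, the ViewModel calls the Model and notifies its observers whenever that property changes, and the View never touches the Model. All class names are hypothetical and do not reproduce the report's actual classes; in a real GUI, the callback would update a widget instead of a string attribute.

```python
from typing import Callable, Dict, List


class Observable:
    """Attribute-level Observer: listeners subscribe to one property name."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[object], None]]] = {}

    def bind(self, name: str, callback: Callable[[object], None]) -> None:
        self._listeners.setdefault(name, []).append(callback)

    def _notify(self, name: str, value: object) -> None:
        for callback in self._listeners.get(name, []):
            callback(value)


class QueryModel:
    """Model: data and business logic (a stand-in for real query execution)."""

    def run(self, sql: str) -> str:
        return f"{sql.split()[0].upper()} executed"


class QueryViewModel(Observable):
    """ViewModel: exposes bindable properties and commands, knows no widgets."""

    def __init__(self, model: QueryModel) -> None:
        super().__init__()
        self._model = model
        self._status = ""

    @property
    def status(self) -> str:
        return self._status

    @status.setter
    def status(self, value: str) -> None:
        self._status = value
        self._notify("status", value)  # the binder pushes the change to views

    def submit(self, sql: str) -> None:
        """Command invoked by the View (e.g. a 'Run' button)."""
        self.status = self._model.run(sql)


class ConsoleView:
    """View: binds to ViewModel properties, never touches the Model."""

    def __init__(self, view_model: QueryViewModel) -> None:
        self.label = ""
        view_model.bind("status", self.on_status)

    def on_status(self, value: str) -> None:
        self.label = value  # a real GUI would update a widget here


vm = QueryViewModel(QueryModel())
view = ConsoleView(vm)
vm.submit("select * from employees")
print(view.label)  # SELECT executed
```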

CHAPTER 3. DESIGN 26
3.2. DETAILED DESIGN

Figure 3.2: MVVM Diagram

3.1.3 Combined component / deployment diagram


Component diagrams are conceptual analogs of class diagrams at the level of components, whereas deployment diagrams display the actual software and hardware configuration.
The combined component/deployment diagram illustrates the physical architecture of the system and sheds light on the interactions between its components, as Figure 3.3 shows.

Figure 3.3: Combined component / deployment diagram

3.2 Detailed design


In this section, we move on to the detailed design of our project, presenting the class diagram and then the sequence diagrams.


3.2.1 Class Diagram


The class diagram is a static diagram: it reflects the static view of an application. It is used not only to represent, explain and document various aspects of a program, but also to build executable application code.

Figure 3.4 represents the class diagram organized according to the three layers of the MVVM architecture.

• The View Layer


In this layer, we have three views. The first view is the Connection View where the
user writes the user name and the password. The second view is the home view where
the user can write the query and submit it. The file view is the interface where we
can add a CSV file into the MongoDB.

• The ViewModel Layer


The ViewModel package contains two classes. The first, Connection, links the GUI to the databases, which are managed by the second class, DataManagement. This separation makes the application more correct, better organized, and more secure.

• The Model Layer


At this level, our project has two models, User and Query, which are mapped to tables of the SQLite DB through the ORM. The User Id is a foreign key in the Query table that links each query to the user who issued it.
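To illustrate the User/Query relationship, here is a minimal SQLAlchemy sketch of the two models. The table names follow the users and queries tables shown in chapter 4. The column names (`name`, `password`, `text`, `user_id`) are assumptions, and an in-memory SQLite database replaces the project's file.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)
    password = Column(String, nullable=False)  # stored encrypted in the real GUI
    queries = relationship("Query", back_populates="user")


class Query(Base):
    __tablename__ = "queries"
    id = Column(Integer, primary_key=True)
    text = Column(Text, nullable=False)
    # Foreign key linking each query to the user who issued it.
    user_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    user = relationship("User", back_populates="queries")


# In-memory SQLite database for demonstration purposes only.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    alice = User(name="alice", password="<encrypted>")
    alice.queries.append(Query(text="SELECT * FROM employees"))
    session.add(alice)  # the related Query is saved through the relationship
    session.commit()
    stored = [(q.user.name, q.text) for q in session.query(Query).all()]
```

With the ORM, the GUI manipulates `User` and `Query` objects and SQLAlchemy fills in `user_id` automatically.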


Figure 3.4: Class Diagram


3.2.2 Sequence Diagram


The sequence diagram is a dynamic diagram: it displays the interactions between objects in chronological order.

Connection Sequence Diagram

Figure 3.5: Connection Sequence Diagram


As figure 3.5 shows, the user first fills in the connection form and clicks the connection button. This event creates a new Connection object, which acts as the middleware between the GUI and the DataManagement controller.

If the username/password pair exists in the SQLite DB, the connection is established through the driver. If the username exists in the SQLite DB but the password is wrong, an error message is displayed on the GUI.

The remaining case is when the username does not exist in the interface's SQLite DB. This happens either when the user connects through this interface for the first time, or when the password was changed and the user's entry was therefore deleted from the SQLite DB. In this case, the Connection object sends a connection request to the driver.

If the driver returns True, the credentials entered by the user are correct. Otherwise, an error message is displayed on the GUI.
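The three cases of the connection sequence can be summarized as a small decision function. This is a sketch under simplifying assumptions: `local_users` is a plain dict standing in for the users table of the SQLite DB, `driver_connect` is a hypothetical callable standing in for a login attempt through the ODBC driver, and passwords are compared in clear text only to keep the example short. Caching the credentials after a successful first login is also an assumption.

```python
from typing import Callable, Dict


def authenticate(
    username: str,
    password: str,
    local_users: Dict[str, str],
    driver_connect: Callable[[str, str], bool],
) -> str:
    """Follow the connection flow of figure 3.5 and return the message for the GUI."""
    if username in local_users:
        if local_users[username] != password:
            # Known user but wrong password: report the error without calling the driver.
            return "error: wrong password"
        # Known user with the right password: connect through the driver.
        return "connected" if driver_connect(username, password) else "error: connection failed"
    # Unknown user (first login, or entry deleted after a password change):
    # ask the driver directly.
    if driver_connect(username, password):
        local_users[username] = password  # assumption: remember the credentials
        return "connected"
    return "error: invalid credentials"
```

Passing the driver call in as a function keeps the decision logic testable without a running database.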


UpdateDB Sequence Diagram

Figure 3.6: UpdateDB Sequence Diagram

Figure 3.6 describes how the SQLite DB is regularly updated. The update runs in a continuous cycle driven by a while True loop: every record in the User or Query table is sent as a query to MongoDB and to the SQL DB.
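The update cycle can be sketched as follows. The function names are hypothetical. Each target database is represented by an "execute" callable, for example a wrapper around a pyodbc cursor. The pause between cycles and the `max_cycles` stop condition are assumptions added so that the loop does not spin at full speed and can be exercised in a test. The diagram itself only shows a plain `while True`.

```python
import time
from typing import Callable, Iterable, List, Optional


def sync_once(statements: Iterable[str], targets: List[Callable[[str], None]]) -> int:
    """Send every stored statement to every target database; return how many were sent."""
    sent = 0
    for statement in statements:
        for execute in targets:  # e.g. one executor for MongoDB, one for the SQL DB
            execute(statement)
            sent += 1
    return sent


def run_update_cycle(
    load_statements: Callable[[], Iterable[str]],
    targets: List[Callable[[str], None]],
    interval: float = 5.0,
    max_cycles: Optional[int] = None,
) -> None:
    """Repeat sync_once continuously, as in the `while True` loop of figure 3.6."""
    cycles = 0
    while True:
        sync_once(load_statements(), targets)
        cycles += 1
        if max_cycles is not None and cycles >= max_cycles:
            break  # only used to stop the loop in tests
        time.sleep(interval)  # assumption: pause between cycles to avoid busy-looping
```

In a GUI, such a loop would normally run in a background thread so that it does not freeze the interface.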


3.3 Conclusion
In this chapter, we presented the general architecture of our GUI and explained our choices. For the detailed design, we presented the class diagram as well as the sequence diagrams. In the next chapter, we focus on the implementation phase and present the technologies used to carry out the project.

Chapter 4

Achievement

This chapter presents what we have accomplished. We begin by describing the hardware and software environment used to implement the software. Then we present the technologies used to build our program and the reasons for choosing them. Finally, we describe the different phases of our project and include some screenshots of the work accomplished.

4.1 Developing Environment


This section presents the hardware and software environment used to build this GUI.

4.1.1 Hardware Environment


We developed this application on our own machines, which have no special requirements. Table 4.1 lists their characteristics:

Laptop       HP        DELL
Processor    1.8 GHz   2.9 GHz
RAM          8 GB      8 GB
Hard Drive   1 TB      1 TB

Table 4.1: Characteristics of the Used Computers


4.1.2 Software Environment


In this part, we list the different software products we used throughout the development
of our GUI.

Jupyter Notebook
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text [Jup].
We chose it because it makes it easy to test algorithms before integrating them into our project.

Spyder
Spyder is a powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers, and data analysts. It is an IDE that combines the editing, analysis, debugging, and profiling functionality of a comprehensive development tool with data exploration.

Qt Designer
Qt Designer is the Qt tool for designing and building graphical user interfaces (GUIs) with Qt Widgets. The interfaces it produces are converted into Python code, so they can be used as ordinary Python classes.

4.1.3 Technological choices


In this part, we present the programming language and the libraries used to implement our GUI, and the reasons for these choices.

Programming language
Python is an interpreted, high-level programming language. It fully supports object-oriented and structured programming, and many of its features support functional and aspect-oriented programming. We chose Python to implement our application for the following reasons:
• Productivity
Python provides advanced support for processing unstructured and unconventional data such as images and voice. This is a common need in big data, for example when analyzing social media data. Python also has all the capabilities needed to develop complex multi-protocol network applications.
• Powerful Scientific Packages
Python packages such as pyodbc and Pandas fulfill our analytical and data processing needs.


• Scalability
Scalability is the ability of a process, network, software, or organisation to grow and handle increased demand. According to [Ver18], Python is much faster than other data science languages such as R, MATLAB, or Stata. This makes it an appropriate choice for working with NoSQL databases.

Libraries
• Pandas
Pandas is a data analysis library. It provides the data structures and operations needed to manipulate numerical tables and time series. We use it to present data in several forms, such as tables, statistical summaries, and histograms.

• SQLAlchemy
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper (ORM) that gives application developers the full power and flexibility of SQL [SQL].
We use it to map our SQLite DB to the User and Query models.

• Pyodbc
pyodbc is an open-source Python module that simplifies access to ODBC databases. In our GUI, it gives access to both the SQL DB and MongoDB through ODBC, using the driver appropriate for each database.

• CSV
The csv module implements classes to read and write tabular data in CSV format. We use it to extract data from any CSV file and insert it into MongoDB.

• JSON
Python comes with a built-in package to encode and decode JSON data, called json.
We use this module to transform the data in the CSV file to the JSON format.

• Cryptography
Cryptography is a package that provides Python developers with cryptographic recipes and primitives. We use it to encrypt and decrypt the data stored in the SQLite DB.


4.2 Achieved Work


In this section, we present the final state of our GUI. We chose to work with a database describing some restaurants in New York City, stored in both MongoDB and MySQL, with collections and tables named restaurants and employees. We present the views of the interface one at a time.

4.2.1 The authentication interface


The authentication interface, shown in figure 4.1, lets the user enter a username and a password to open a session on both the MySQL DB and MongoDB.
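Opening a session on both databases with the same credentials can be sketched with pyodbc as shown below. The driver names, host, ports, and database name are placeholders: "MongoDB ODBC Driver" is hypothetical, and port 3307 assumes MongoDB is exposed to ODBC through the BI Connector's default port. `pyodbc.drivers()` lists the driver names that are actually installed on a machine.

```python
from typing import Dict


def build_connection_string(driver: str, server: str, database: str,
                            user: str, password: str, port: int) -> str:
    """Assemble an ODBC connection string from key=value pairs."""
    parts: Dict[str, str] = {
        "DRIVER": "{" + driver + "}",
        "SERVER": server,
        "PORT": str(port),
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


def open_sessions(user: str, password: str):
    """Open one pyodbc connection per database, reusing the same credentials."""
    import pyodbc  # imported here so the string helper works without pyodbc installed

    # Placeholder drivers, hosts, ports and database name: adapt to the real setup.
    mysql = build_connection_string("MySQL ODBC 8.0 Unicode Driver", "localhost",
                                    "newyork", user, password, 3306)
    mongo = build_connection_string("MongoDB ODBC Driver", "localhost",
                                    "newyork", user, password, 3307)
    return pyodbc.connect(mysql), pyodbc.connect(mongo)
```

Both connections expose the same DB-API interface (`cursor()`, `execute()`, `fetchall()`). This is what allows the rest of the GUI to send the same SQL text to either database.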

Figure 4.1: The authentication interface

Figure 4.2 shows the state of the users table after each connection. The name and the password are encrypted with the Python cryptography package to make the application more secure.
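The report does not say which primitive of the cryptography package was used. The sketch below assumes Fernet, the package's high-level symmetric recipe, and uses hypothetical helper names. In a real deployment, the key would be generated once and stored outside the SQLite file.

```python
from cryptography.fernet import Fernet

# Assumption: a single symmetric key, generated once and kept outside the SQLite DB.
# Anyone holding this key can decrypt the stored values.
key = Fernet.generate_key()
cipher = Fernet(key)


def encrypt_value(plain: str) -> bytes:
    """Encrypt a value (user name, password, query text) before storing it."""
    return cipher.encrypt(plain.encode("utf-8"))


def decrypt_value(token: bytes) -> str:
    """Decrypt a value read back from the SQLite DB."""
    return cipher.decrypt(token).decode("utf-8")


token = encrypt_value("alice")
```

Fernet tokens include a random IV, so encrypting the same name twice yields different tokens. A lookup such as `WHERE name = ?` on encrypted values therefore does not work directly. The application must decrypt the rows, or store a separate hash of the name, to find a user.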


Figure 4.2: The users table in the SQLite DB

4.2.2 The query interface


When the connection succeeds, the home interface (figure 4.3) is displayed. It allows the user to query the databases connected to the system. To insert a CSV file into MongoDB, the user only needs to click the add button and choose the appropriate file.
To strengthen security, the data stored in the database must be encrypted, as figure 4.4 shows.


Figure 4.3: The home interface

Update queries
Through the GUI, we can use data-modification statements such as INSERT, UPDATE, and DELETE to modify the data of existing MongoDB collections. Figure 4.5 shows the changes made to the employees collection after executing an INSERT statement.

We can also use the UPDATE statement to modify data values. Figure 4.6 shows the effect of an UPDATE statement on the employees collection.


Figure 4.4: The queries table in the SQLite DB

Figure 4.5: Employees collection view after INSERT query statement

We can even remove data from a collection or delete an entire collection. Figures 4.7 and 4.8 show, respectively, the employees collection after removing the selected data, and the collections of the New York MongoDB database after deleting a collection.
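Below is a minimal sketch of how an INSERT, UPDATE, or DELETE typed in the GUI can be executed through a DB-API connection and committed. The helper name `run_update` and the sample rows are hypothetical. Python's built-in sqlite3 stands in for the pyodbc connection to MongoDB or MySQL, because both follow the DB-API and both use `?` as the parameter marker.

```python
import sqlite3


def run_update(connection, statement: str, params: tuple = ()) -> int:
    """Execute one INSERT/UPDATE/DELETE statement and commit it."""
    cursor = connection.cursor()
    cursor.execute(statement, params)  # parameters avoid building SQL by string concatenation
    connection.commit()
    return cursor.rowcount  # rows/documents affected (some drivers may report -1)


# sqlite3 stands in here for the pyodbc connection used by the GUI.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
inserted = run_update(conn, "INSERT INTO employees (name, salary) VALUES (?, ?)", ("Sara", 1200))
updated = run_update(conn, "UPDATE employees SET salary = ? WHERE name = ?", (1500, "Sara"))
deleted = run_update(conn, "DELETE FROM employees WHERE salary > ?", (1000,))
```

Whether a statement such as `DROP TABLE` maps to dropping a MongoDB collection depends on the ODBC driver in use. The sketch does not assume either behaviour.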


Figure 4.6: Employees collection view after UPDATE query statement

Figure 4.7: Employees collection view after DELETE query statement

4.2.3 The Data interface


For SELECT queries, the GUI visualizes the data in several forms using Pandas. It shows the database names with their tables and collections, and displays the query results for both MongoDB and MySQL as DataFrames (rows and columns). It also provides statistics and histogram charts.
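Turning the result of a SELECT into a DataFrame only requires the column names from the DB-API `cursor.description` and the fetched rows. The sketch below shows this with hypothetical sample data. sqlite3 stands in for the pyodbc connection, whose cursor exposes the same `description` attribute.

```python
import sqlite3

import pandas as pd


def select_to_dataframe(connection, query: str) -> pd.DataFrame:
    """Run a SELECT through a DB-API cursor and return the rows as a DataFrame."""
    cursor = connection.cursor()
    cursor.execute(query)
    columns = [description[0] for description in cursor.description]
    # pyodbc returns Row objects; converting each one to a tuple keeps pandas happy.
    return pd.DataFrame([tuple(row) for row in cursor.fetchall()], columns=columns)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Sara", 1200), ("Ali", 900), ("Mona", 1500)])

df = select_to_dataframe(conn, "SELECT * FROM employees")
stats = df.describe()  # count, mean, std, min, quartiles and max of numeric columns
# df["salary"].plot.hist() would draw the histogram view (requires matplotlib).
```

`pandas.read_sql` is an alternative, but with a raw DB-API connection pandas only officially supports sqlite3. Building the DataFrame from the cursor works the same way for any pyodbc driver.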

The simple SELECT query


We used the simple query "SELECT * FROM employees" to view the data stored in the employees table/collection. Figure 4.9 shows the employees collection in MongoDB, while figures 4.10 and 4.11 show, respectively, the DataFrame resulting from this query and the corresponding statistics and histogram chart.

Figure 4.8: The collections view in the New York MongoDB database

Figure 4.9: the employees collection view


Figure 4.10: the DataFrame resulting from SELECT query

Figure 4.11: Statistics and histogram chart


The SELECT query with the JOIN clause


MongoDB, like many other NoSQL databases, does not support SQL-style joins directly. Its closest equivalent, the $lookup aggregation stage, performs a left outer join. In this subsection, we show how the GUI handles JOIN clauses. Figure 4.12 shows the DataFrame resulting from an INNER JOIN clause, and figure 4.13 shows the DataFrame resulting from a FULL JOIN clause.
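The report does not detail the mechanism behind the JOIN support. One plausible client-side approach is to fetch each collection separately and join the resulting DataFrames with pandas, which supports both join types. The sketch below follows that approach, not necessarily the GUI's actual one. The `restaurant_id` link between employees and restaurants and all sample values are hypothetical.

```python
import pandas as pd

# Hypothetical results of two single-collection SELECTs, already loaded as DataFrames.
employees = pd.DataFrame({"name": ["Sara", "Ali", "Mona"], "restaurant_id": [1, 2, 4]})
restaurants = pd.DataFrame({"restaurant_id": [1, 2, 3],
                            "restaurant": ["Dine", "Grill", "Bistro"]})

# INNER JOIN: only rows whose restaurant_id exists on both sides.
inner = employees.merge(restaurants, on="restaurant_id", how="inner")

# FULL (OUTER) JOIN: all rows from both sides, with NaN where there is no match.
full = employees.merge(restaurants, on="restaurant_id", how="outer")
```

Joining on the client keeps MongoDB queries simple, but both collections must fit in memory. For large collections, pushing the join to MongoDB with `$lookup` would be the more scalable choice.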

Figure 4.12: the DataFrame resulting from the INNER JOIN clause

Figure 4.13: the DataFrame resulting from the FULL JOIN clause


4.2.4 Load CSV file interface


The GUI can create a new collection in MongoDB, or add several records at once to an existing one, simply by loading a CSV file into MongoDB. Figures 4.14 and 4.15 show, respectively, the dialog used to load a CSV file into a MongoDB collection and the CSV rows that were added to the collection.
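The following sketch shows the load step using the csv and json modules listed among our libraries. Each CSV row becomes a JSON-compatible document, and all documents are inserted in one batch. The helper names are hypothetical. `collection` is assumed to behave like a pymongo `Collection`, whose `insert_many` returns a result with `inserted_ids`. The pymongo connection shown in the comments uses placeholder names.

```python
import csv
import io
import json
from typing import Dict, List


def read_csv_documents(csv_file) -> List[Dict[str, str]]:
    """Turn each CSV row into a JSON-compatible dict keyed by the header line."""
    return [dict(row) for row in csv.DictReader(csv_file)]


def load_csv_into_collection(collection, csv_file) -> int:
    """Insert all rows of a CSV file into a MongoDB collection in one batch."""
    documents = read_csv_documents(csv_file)
    if not documents:
        return 0  # pymongo's insert_many rejects an empty list
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


# Usage with pymongo (connection details are placeholders):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["newyork"]["employees"]
#   with open("employees.csv", newline="", encoding="utf-8") as f:
#       load_csv_into_collection(collection, f)
sample = io.StringIO("name,salary\nSara,1200\nAli,900\n")
preview = json.dumps(read_csv_documents(sample))  # JSON view of the documents
```

csv.DictReader yields every value as a string. Numeric fields such as salary would need an explicit conversion before insertion if they are to be queried as numbers in MongoDB.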

Figure 4.14: Dialog file to load CSV file into MongoDB

Figure 4.15: CSV file rows that were added to the collection.


Figures 4.16 and 4.17 show the employees collection before and after adding the CSV file.

Figure 4.16: The employees collection before Loading CSV file

Figure 4.17: The employees collection after Loading CSV file

4.3 Conclusion
In this chapter, we presented the technologies chosen to implement our project and the reasons for these choices. We then showed the main interfaces of our application and the related tables of the SQLite DB.

Evaluation and Recommendation

Evaluation
In this project, our task was to develop a graphical user interface (GUI) that allows users to query multiple SQL and NoSQL databases, whatever their structure and type, with the same SQL query language.

Given the time allocated to this project, we focused on querying MongoDB and MySQL databases and displaying the results of SQL SELECT queries as rows and columns. We also focused on editing MongoDB databases, either with update queries or by loading CSV files.

Firstly, the preliminary study presented initial research on various theoretical concepts. It began by introducing the most significant terms used in our project. It then used these techniques to move from the idea of a simple GUI that queries only relational databases with SQL statements to a more sophisticated one, capable of communicating simultaneously with relational and non-relational databases using the same language.

Secondly, we presented a static and dynamic overview of the different aspects of the application.

Then, we experimented with multiple SQL queries on our interface to determine how MongoDB and MySQL respond to the same standardized query language.

Finally, we presented the result of our research: a GUI capable of querying both MongoDB and MySQL databases and displaying the data in a standard form (DataFrames).

Perspective
Because the Big Data domain keeps evolving, our GUI can be extended with other functionalities. The overall goal is a GUI capable of querying not just MongoDB and MySQL, but any relational or non-relational database. Reaching this goal requires more experience with different database technologies.

Furthermore, we could extract and transform structured and unstructured data from different data sources and load it into a common data repository (a data warehouse). This would require a complex ETL process.

Once completed, the project can be extended to meet Business Intelligence requirements and to serve other application fields.

Bibliography

[BL12] Mark Beyer and Douglas Laney. “The Importance of Big Data”. In: (2012).
[Bre00] Dr. Eric A. Brewer. “Towards Robust Distributed Systems”. In: (2000).
[PR] Paolo Atzeni, Francesca Bugiotti, and Luca Rossi. “Uniform access to non-relational database systems: the SOS platform”. In: Dipartimento di informatica e automazione, Università Roma Tre.
[SKM15] Soumya Shukla, Vaishnavi Kukade, and Sofiya Mujawar. “Big Data: Concept,
Handling and Challenges: An Overview”. In: International Journal of Computer
Applications (0975 – 8887) (Mar. 2015), p. 114.

Netography

[BT19] Régis Behmo and Nicolas Travers. Maîtrisez le théorème de CAP. https://openclassrooms.com/fr/courses/4462426-maitrisez-les-bases-de-donnees-nosql/4462471-maitrisez-le-theoreme-de-cap. 2019.
[Cha10] Chad Vicknair, Michael Macias, and Zhendong Zhao. A Comparison of a Graph Database and a Relational Database. https://www.researchgate.net/publication/220996559_A_comparison_of_a_graph_database_and_a_relational_database_A_data_provenance_perspective. 2010.
[Gre18] Gregorius Ongo and Gede Putra Kusuma. Hybrid Database System of MySQL and MongoDB in Web Application Development. https://ieeexplore.ieee.org/document/7905665. 3-5 Sept. 2018.
[Jup] Jupyter. Project Jupyter. https://jupyter.org.
[Lak] Gayan Liyanaarachchi Lakshan Kasun Malki Nimesha. MigDB – Relational to NoSQL mapper. https://ieeexplore.ieee.org/document/7946576.
[Mic] Microsoft. Microsoft Open Database Connectivity (ODBC). https://docs.microsoft.com/en-us/sql/odbc/microsoft-open-database-connectivity-odbc?view=sql-server-ver15.
[ope] OpenClassrooms. Maîtrisez les bases de données NoSQL. https://openclassrooms.com/fr/courses/4462426-maitrisez-les-bases-de-donnees-nosql/4474601-decouvrez-le-fonctionnement-de-mongodb.
[Qut19] Mahmoud H. Qutqut. Types of NoSQL Database. https://www.researchgate.net/figure/Types-of-NoSQL-Database_fig1_336746925. 2019.
[Rij13] Mark van Rijmenam. Why The 3V’s Are Not Sufficient To Describe Big Data. https://datafloq.com/read/3vs-sufficient-describe-big-data/166, accessed 20/04/2020. 2013.
[Sas18] Bryce Merkl Sasaki. Graph Databases for Beginners: ACID vs. BASE Explained.
https://neo4j.com/blog/acid-vs-base-consistency-models-explained/.
2018.
[SQL] SQLAlchemy. SQLAlchemy. https://www.sqlalchemy.org.


[Ver18] Amit Verma. 5 Reasons Why You Should Choose Python for Big Data. https://www.whizlabs.com/blog/python-and-big-data/?fbclid=IwAR16pi7uuSzcnJ3JC-Ly5cjSTE1qpQTlBaROGijpTkX45BpYBGeD1XzSQls. 2018.
[Wik17] Wikimedia. 3-tier client/server model architecture. https://commons.wikimedia.org/wiki/File:Client-Server_3-tier_architecture_-_en.png. 2017.
