2015 8th International Conference on Database Theory and Application

A Study on Data Input and Output Performance Comparison of
MongoDB and PostgreSQL in the Big Data Environment

Min-Gyue Jung, Seon-A Youn, Jayon Bae, Yong-Lak Choi


Graduate School of Information Science, Soongsil University, South Korea
Graduate School of Software, Soongsil University, South Korea
Graduate School of Software, Soongsil University, South Korea
Graduate School of Software, Soongsil University, South Korea
369, Sangdo-ro, Dongjak-gu, Seoul, South Korea
eprkswu@gmail.com, raonseona@gmail.com, spy_neo@naver.com,
ylchoi58@ssu.ac.kr

Abstract
With the growth of social networks and the spread of mobile devices, processing massive
data in existing relational database management systems (RDBMS) has become an issue.
NoSQL is a class of database management systems that makes processing massive and/or
unstructured data easier, and many companies today start new projects with NoSQL.
Moreover, converting the RDBMS of existing systems to NoSQL has become a trend.
This study assesses the performance differences between RDBMS and NoSQL and describes
a design that improves performance when using NoSQL. PostgreSQL and MongoDB were
selected to represent RDBMS and NoSQL, respectively, for comparative analysis.

1. Introduction
In its early days, big data received attention mainly for its volume. However, with the
development of various electronic devices, not only text data but also other kinds of data,
such as SNS data and streaming sensor data that do not fit existing report formats, are now
generated in real time. The relational database management systems (RDBMS) in use so far
have difficulty structuring unstructured data and face performance and cost problems when
processing massive data. Because it uses memory mapping, NoSQL performs reads and writes
quickly, which makes it suitable for processing big data. In addition, unlike RDBMS, which
mainly processes structured data, NoSQL handles unstructured data more easily. Thus, many
companies building big data systems that include unstructured and sensor data tend to take
advantage of NoSQL’s features. This study assesses the performance differences between
RDBMS and NoSQL, and provides a design that improves performance when using NoSQL.

2. Related Research
2.1 PostgreSQL and MongoDB
2.1.1 PostgreSQL
PostgreSQL is an open source RDBMS and, unlike Oracle, requires no license fee. Other
open source RDBMS that are free of charge include MariaDB and CUBRID. PostgreSQL was
used in this study for its ease of installation.
PostgreSQL implements the client/server model and supports most of the standard database
language, ANSI SQL:2011, including transactions. Because client and server are separated
in PostgreSQL, the client library is lighter, and changes in the database engine do not
affect the client [1].

978-1-4673-9849-7/15 $31.00 © 2015 IEEE
DOI 10.1109/DTA.2015.14

2.1.2 MongoDB
Despite its many advantages, RDBMS has a few disadvantages that make it unsuitable for
big data environments. The main one is the object-relational mismatch: the relational
model differs from the data structures held in memory. To save complex in-memory data
structures to an RDBMS, the data has to be converted into the relational structure.
In addition, the variety of data formats and the growth in data volume have increased
traffic, and clusters of inexpensive hardware have been built to process the increasing
data [2].
NoSQL does not implement the relational model and operates well in a cluster.
Therefore, NoSQL handles varied data formats and massive data more easily.

3. Performance comparison of PostgreSQL and MongoDB


3.1 Data model of performance comparison
A relational data model was created for use in both PostgreSQL and MongoDB.

Fig. 1. Logical and physical ERD of relational data model

[Fig. 1] illustrates the relational data model: a table design for a user, the user’s
cards, and the mileage. The relationship between user and card is 1:N, since one user can
own multiple cards.
For MongoDB, an unstructured data model was designed separately. [Fig. 2] shows the
logical/physical model of the unstructured data. It is designed to store user, card
information, and mileage points in a single collection, MongoDB’s counterpart of a table.

Fig. 2. Logical/physical ERD of unstructured data model
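
The difference between the two designs can be illustrated with a minimal Python sketch. All field names and sample values here are hypothetical, since the ERDs themselves are not reproduced in the text, and the assumption that each card holds several mileage entries is ours. The sketch folds rows from three relational tables into one nested document per user, which is the shape the unstructured model stores.

```python
from collections import defaultdict

# Hypothetical rows for the three relational tables (user, card, mileage).
# Column names are illustrative only; they are not taken from the paper's ERDs.
users = [{"user_id": 1, "name": "Kim"}]
cards = [
    {"card_id": 10, "user_id": 1, "card_type": "credit"},
    {"card_id": 11, "user_id": 1, "card_type": "debit"},
]
mileages = [
    {"card_id": 10, "points": 500},
    {"card_id": 10, "points": 120},
    {"card_id": 11, "points": 75},
]


def to_documents(users, cards, mileages):
    """Fold user -> card -> mileage rows into one nested document per user."""
    # Group mileage points by the card they belong to.
    points_by_card = defaultdict(list)
    for m in mileages:
        points_by_card[m["card_id"]].append(m["points"])

    # Group cards (with their embedded mileage lists) by owning user.
    cards_by_user = defaultdict(list)
    for c in cards:
        cards_by_user[c["user_id"]].append(
            {
                "card_id": c["card_id"],
                "card_type": c["card_type"],
                "mileage": points_by_card[c["card_id"]],
            }
        )

    return [
        {"_id": u["user_id"], "name": u["name"], "cards": cards_by_user[u["user_id"]]}
        for u in users
    ]


documents = to_documents(users, cards, mileages)
```

With the nested form, reading a user together with all cards and mileage points is a single lookup, whereas the relational form needs joins across three tables.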

3.2 Performance comparison
To compare the performance of PostgreSQL and MongoDB, a program that performs insert,
select, update, and delete operations was developed. Each operation was executed on data
sets of 30,000, 90,000, 150,000, 210,000, and 300,000 records of card mileage information.
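
The measurement procedure described above might be structured roughly as follows. This is a sketch, not the authors' program: `DictStore` is a hypothetical in-memory stand-in for a database client, and `timed` and `benchmark` are illustrative helper names. Only the record counts come from the text.

```python
import time

# Record counts taken from the text.
SIZES = [30000, 90000, 150000, 210000, 300000]


class DictStore:
    """In-memory stand-in for a database client (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, value):
        self.rows[key] = value

    def select(self, key):
        return self.rows[key]

    def update(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        del self.rows[key]


def timed(operation, n):
    """Run operation(i) for i in range(n) and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        operation(i)
    return time.perf_counter() - start


def benchmark(store, n):
    """Time n inserts, selects, updates, and deletes, in that order."""
    return {
        "insert": timed(lambda i: store.insert(i, {"points": i}), n),
        "select": timed(lambda i: store.select(i), n),
        "update": timed(lambda i: store.update(i, {"points": i + 1}), n),
        "delete": timed(lambda i: store.delete(i), n),
    }


# One fresh store per data size, so earlier runs do not affect later ones.
results = {n: benchmark(DictStore(), n) for n in SIZES}
```

To compare real systems, `DictStore` would be replaced by thin wrappers around the PostgreSQL and MongoDB clients exposing the same four methods, so that the same `benchmark` function drives every configuration.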

The insert operation was executed first in PostgreSQL, then in MongoDB with a design
similar to the relational model, and finally in MongoDB with the unstructured data model.
The elapsed time of each insert operation was then compared. [Fig. 3] shows the chart of
insert operation speed.
The speed of the select operation was measured as the time from issuing the query until
the previously inserted data was retrieved and available for processing. [Fig. 4] shows
the chart of select operation speed.

Fig. 3. Chart of insert operation speed
Fig. 4. Chart of select operation speed

Fig. 5. Chart of update operation speed
Fig. 6. Chart of delete operation speed

The elapsed time of the update operation was measured on the card mileage table, which
stores the largest number of records. [Fig. 5] shows the chart of update operation speed.
The elapsed time for deleting all data in each DBMS was measured and compared. [Fig. 6]
shows the chart of delete operation speed.

4. Conclusion
In this study, the insert, select, update, and delete performance of PostgreSQL (RDBMS)
and MongoDB (NoSQL) was compared. In general, MongoDB was faster than PostgreSQL for the
insert, select, update, and delete operations. For MongoDB, designing with the
unstructured data model appears to give better performance than designing with the
relational data model. Moreover, we observed that the select operation of PostgreSQL
could be improved by using an index.
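
The effect of an index on a select can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, so that it runs without a server; this is an assumption made for illustration, and the table and column names are hypothetical. The query planner's output is inspected before and after the index is created.

```python
import sqlite3

# SQLite is used here only as a self-contained stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mileage (id INTEGER PRIMARY KEY, card_id INTEGER, points INTEGER)"
)
conn.executemany(
    "INSERT INTO mileage (card_id, points) VALUES (?, ?)",
    [(i % 1000, i) for i in range(30000)],
)

QUERY = "SELECT points FROM mileage WHERE card_id = ?"


def plan(connection, sql, params):
    """Return the planner's description of how it would run sql."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in rows)


# Without an index on card_id, the planner has to scan the whole table.
plan_before = plan(conn, QUERY, (42,))

# After the index exists, the planner can search it instead of scanning.
conn.execute("CREATE INDEX idx_mileage_card ON mileage (card_id)")
plan_after = plan(conn, QUERY, (42,))
```

In PostgreSQL the same idea applies with `CREATE INDEX` and `EXPLAIN`, although the plan output format differs.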
In conclusion, using MongoDB with an unstructured data model provides an overall
performance improvement. However, when the environment requires a precise and structured
data model, an RDBMS such as PostgreSQL will deliver higher-quality performance.
By applying the comparison results of this study to DBMS design when implementing NoSQL,
we can expect better performance to be maintained. We also expect that further studies on
the distributed processing performance of RDBMS and NoSQL will contribute to the
advancement of database technology in big data environments.

5. References
[1] Ishii Tatsuo, “PostgreSQL”, youngjin.com, 2001.
[2] Pramod J. Sadalage, Martin Fowler, “NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot
Persistence”, insight, 2013.
[3] Kyeong-Suk Kim, Hong-Won Yoon, “Database of Theory”, ehan media, 2004.
[4] Kristina Chodorow, “MongoDB: The Definitive Guide”, hanbit media, 2011.
