You are on page 1of 5

The relational model is dead, SQL is dead,

and I don’t feel so good myself

Paolo Atzeni Christian S. Jensen Giorgio Orsi Sudha Ram


Letizia Tanca Riccardo Torlone

ABSTRACT ing data using the relational model. The debate on


We report the opinions expressed by well-known SQL vs. NoSQL is as much a debate on SQL, the
database researchers on the future of the relational language, as on the relational model and its various
model and SQL during a panel at the International implementations.
Workshop on Non-Conventional Data Access (NoCoDa Relational database management systems have
2012), held in Florence, Italy in October 2012 in con- been around for more than thirty years. During
junction with the 31st International Conference on Con- this time, several revolutions (such as the Object
ceptual Modeling. The panelists include: Paolo Atzeni Oriented database movement) have erupted, many
(Università Roma Tre, Italy), Umeshwar Dayal (HP of which threatened to doom SQL and relational
Labs, USA), Christian S. Jensen (Aarhus University, databases. These revolutions eventually fizzled out,
Denmark), and Sudha Ram (University of Arizona, and none made even a small dent in the domi-
USA). Quotations from movies are used as a playful nance of relational databases. The latest revolu-
though effective way to convey the dramatic changes tion appears to be from NoSQL databases that are
that database technology and research are currently un- touted to be non-relational, horizontally scalable,
dergoing. distributed and, for the most part, open source.
The big interest of academia and industry in the
NoSQL movement gives birth, once more, to a num-
1. INTRODUCTION ber of challenging questions on the future of SQL
As more and more information becomes available and of the relational approach to the management
to a growing multitude of people, the ways to man- of data. We discussed some of them during a lively
age and access data are rapidly evolving as they panel at the NoCoDa Workshop, an event held in
must take into consideration, on one front, the kind Florence, Italy in October 2012 organized by Gior-
and volume of data available today and, on the gio Orsi (Oxford University), Letizia Tanca (Po-
other front, a new and larger population of prospec- litecnico di Milano) and Riccardo Torlone (Univer-
tive users. This need on two opposite fronts has sità Roma Tre). We have used a provocative title
originated a steadily growing set of proposals for (paraphrasing a quote often attributed to Woody
non-conventional ways to manage and access data, Allen) and quotations from movies to elaborate on
which fundamentally rethink the concepts, tech- three main issues:
niques, and tools conceived and developed in the
• the possible decline of the relational model and
database field during the last forty years. Recently,
of SQL as a consequence of the rise of the non-
these proposals have produced a new generation of
relational technology,
data management systems, mostly non-relational,
proposed as effective solutions to the needs of an • the need for logical data models and theoreti-
increasing number of large-scale applications for cal studies in the NoSQL world, and
which traditional database technology is unsatisfac-
• the possible consequences of sacrificing the
tory.
ACID properties in favor of system perfor-
Today, it is common to include all the non-
mance and data availability.
relational technologies for data management under
the umbrella term of “NoSQL” databases. Still, it In the following sections we discuss these issues in
is appropriate to point out that SQL and relational turn and close the paper with a final discussion.
DBMSs are not synonymous. The former is a lan- Since a consensus was reached on most of the is-
guage, while the latter is a mechanism for manag- sues addressed in the panel, we synthesize shared

64 SIGMOD Record, June 2013 (Vol. 42, No. 2)


opinions, rather than report contributions to the rates and availability requirements increase and as
discussion by single individuals. data stores move into the cloud. The new breed
of NoSQL systems are designed so they can eas-
2. THE END OF AN ERA? ily scale up using low cost commodity processors to
yield economic advantages.
2.1 Relational databases Next, the data management applications have not
just grown to concern more diverse kinds and uses
“The ship will sink.” “You’re certain?” of data. They have also become more complex. A
“Yes. In an hour or so, all of this will be single application may involve diverse kinds of data.
at the bottom of the Atlantic.” This means that it is generally not possible for an
(Titanic. 1997) application to use the single model and query lan-
guage that is best for a single kind of data.
According to Stonebraker et al., RDBMS are 25- There are indeed two different issues here, related
year-old legacy code lines that should be retired in to the model level and to the implementation. In
favor of a collection of from-scratch specialized en- terms of implementation, it is clear (and it has been
gines [9]. Are we really attending the sinking of the clear for more than a decade) that different appli-
relational ship? cations have different requirements, especially when
One needs to distinguish between the relational performance is a concern. This has led for exam-
model and its dominant query language, SQL, on ple to separating OLTP and OLAP applications,
the one hand and relational database management even when the latter makes use of data produced
systems on the other. by the former. Further, different engines with dif-
The relational model and SQL were invented at ferent capabilities have been developed for the two
a time when data management targeted primarily worlds, with specific support, the ones with more
administrative applications. The goal was to sup- support for throughput of transactions and the oth-
port applications exemplified well by banking. The ers with support for very complex queries. With re-
data is well structured: accounts, customers, loans, spect to models, the point is that most applications
etc. And typical transactions include withdrawals do need mainly simple operations over models that
and deposits that alter account balances. The rela- are somehow more complex than the relational one.
tional model and SQL are well suited for managing NoSQL systems try to respond to these needs: im-
this kind of data and supporting workloads made plementations are new and specialized, operations
up from these kinds of transactions. are very simple, and diverse models (see the dis-
However, the data management landscape has cussion on heterogeneity below) share the idea of
evolved, and today’s landscape of data management being flexible (semistructured and with little or no
applications is much more diverse than it was when schema).
the relational model and SQL were born. Exam-
ples of this diversity abound: semi-structured data, 2.2 SQL
unstructured data, continuous data, sensor data,
streaming data, uncertain data, graph data, and “Whoa, lady, I only speak two languages,
complexly structured data. Similar diversities can English and bad English.”
be found in the workloads to be supported today. (The Fifth Element. 1997)
Thus, while relational database systems were first
proposed as a way to store and manage struc- A variety of data models and access methods
tured data, burgeoning NoSQL databases, such as are emerging and SQL is not suitable for any of
CouchDB, MongoDB, Cassandra, and Hbase, have them. Are we building the Babel Tower of query
emerged as a way to store unstructured data and languages?
other complex objects such as documents, data SQL has several advantages — it is a simple yet
streams, and graphs. With the rise of the real-time powerful declarative language for set-oriented oper-
web, NoSQL databases were designed to deal with ations. SQL captures the essential patterns of data
very large volumes of data. manipulation, including intersections/joins, filters,
Moreover, while relational database systems are and aggregations or reductions. Programmers who
usually scaled up (i.e., moved to larger and more profess a dislike for SQL appear to have been de-
powerful servers), NoSQL database systems are de- ceived by its simplicity. The existence of languages
signed to scale out, i.e, the database is distributed such as SQLDF [4], which allows SQL queries on
across multiple hosts as load increases. This is more R data frames, add SQL functionality for analyt-
in line with real time web traffic as transaction ics on Big Data. SQL’s declarative expressions are

SIGMOD Record, June 2013 (Vol. 42, No. 2) 65


frequently more readable and compact than their R 3. MODEL, THEORY AND DESIGN
programmatic equivalents. Powerful extensions to
SQL, based on window functions, provide a ”split- 3.1 Logical data models
apply” functionality otherwise known as map func-
tion. Combining these with SQL’s GROUP BY op- “Underneath, it’s a hyper-alloy combat
eration, which is in reality a reduce function, essen- chassis, microprocessor-controlled. But
tially provides the equivalent of operations such as outside, it’s living human tissue: flesh,
those in the Map Reduce framework. skin, hair, blood.” (Terminator. 1984)
However, in spite of the research and develop-
ment, the relational model and SQL may not be Aren’t NoSQL database models too close to the
the best foundation for managing every new kind physical data structures? What about physical data
of data and workload. The SQL-86 standard was independence?
a small and simple document. Then came SQL- The ANSI SPARC architecture for database sys-
89, SQL-92, SQL:1999, SQL:2003, SQL:2006, and tems was defined in 1975 with the fundamental
SQL:2008. The current standard, SQL:2011, is very goal of setting a standard for data independence
complex, and most data management professionals for DBMS vendor implementations. It appears that
will find it challenging to understand. How many current NoSQL systems make no distinction be-
people have read and understood the entire SQL tween the logical and physical schema. Thus, the
standard? Few claim that SQL is an elegant lan- fundamental advantages of the ANSI SPARC ar-
guage characterized by orthogonality. Some call it chitecture have been voided, which complicates the
an elephant on clay feet. With each addition, its maintenance of these databases. Storing objects as
body grows, and it becomes less stable. SQL stan- they are programmed essentially negates the data
dardization is largely the domain of database ven- independence requirement that then remains to be
dors, not academic researchers without commercial adequately addressed for NoSQL database systems.
interests or users with user interests. Who is that Strong typing of relations also allows definition of
good for? a variety of integrity constraints at the schema
Another aspect is that the SQL syntax requires level, a very important consideration for transac-
the use of joins, considered ill-fit for, e.g., prefer- tion processing systems that support a variety of
ences and data structures for complex objects or read, write, delete, and update transactions.
completely unstructured data: many programmers Relational database systems are criticized for the
would prefer to not do joins at all, keeping the data strong typing of relational schemas, which makes
in a physical structure that fits the programming it difficult to alter the data model. Even mi-
task as opposed to extracting it from a logical struc- nor changes to the data model of a relational
ture that is relational. Complex objects that con- database have to be carefully managed and may
tain items and lists do not always map directly to a require downtime or reduced service levels. NoSQL
single row in a single table, and writing SQL queries databases have far more relaxed — or even nonexis-
to grab the data spread out across many tables, tent — data model restrictions. NoSQL Key Value
when all you want is a record, is inconsistent with stores and document databases allow applications
the belief that data should be persisted the way it to store virtually any structure it wants in a data
is programmed. element. Even the more rigidly defined BigTable-
On the other hand, the tumultuous developments based NoSQL databases (Cassandra, HBase) typ-
we are observing have generated dozens of systems ically allow new columns to be created with little
each with its own modeling features and its own effort. Actually, organizations should carefully eval-
APIs [2, 8], and this is definitely generating con- uate the advantages and limitations of each type of
fusion. Indeed, the lack of a standard is a great systems (i.e. relational and NoSQL) for Big Data
concern for companies interested in adopting any and then make an informed decision.
of these systems [7]: applications and data are ex- A common, high level interface could really be of
pensive to convert and competencies and expertise use here. However it has to be simple, especially
acquired on a specific system get wasted in case in terms of operations, as is the case for NoSQL
of migration. Efforts that support interoperability systems. It is also worth mentioning that developers
and translation are definitely needed [1]. Original of the various systems follow “best practices” that
approaches in this direction are needed, given the support efficient execution of operations. An effort
simplicity of operations and the almost total ab- should be made to design a common interface by
sence of schemas. using the best practices of each system, with the
goal of re-achieving physical independence.

66 SIGMOD Record, June 2013 (Vol. 42, No. 2)


3.2 Database theory How is database design affected by the recent
paradigm shifts on logical data modeling? Is concep-
“I’ve seen things you people wouldn’t be- tual database design really too old for this country?
lieve. [. . . ] All those moments, will be The methodological framework consisting of con-
lost in time, like tears in rain. Time to ceptual data modeling followed by the translation
die.” (Blade Runner. 1982) of the ER (or class-diagram) schema into a logical
(relational) one can still be adopted: after all, these
Do we still need theoretical research in the new systems have to be accessed by applications. So,
world? Has relational database theory become irrel- even if there is no schema in the data store, it is
evant? very likely that the data objects belong to classes,
The introduction of the relational model in 1970 whose definitions appear in the programs, so some
marked a striking difference with respect to all the contribution could arise. At the same time, flexibil-
previous research on databases. The main reason ity is a must, as objects could come from classes in
for this lies in the strong mathematical foundations an inheritance hierarchy, so polymorphism should
upon which this model is based, which provided be supported. The availability of a high-level rep-
the database research community with the possibil- resentation of the data at hand, be it logical or con-
ity to approach the problems that were raised dur- ceptual, remains a fundamental tool for developers
ing the years by means of logical and mathematical and users, since it makes understanding, managing,
tools, and to ensure the correctness and effective- accessing, and integrating information sources much
ness of the proposed solutions by solid mathemati- easier, independently of the technologies used.
cal proofs.
This approach has caused the blooming of gen- 4. ACID OR AVAILABLE?
erations of splendid theoreticians who have set the
foundations of the relational model, but have also “Ask me a question I would normally lie
contributed to adapting their experience to de- to.” (True Lies. 1994)
vise new methods and techniques for solving the
problems derived from the advent of new chal- A relational database is a perfect world where
lenges. Consider for instance the introduction data is always consistent (even if not true). Are
of new paradigms for representing and querying the ACID properties really less relevant in modern
semi-structured and unstructured data: since the database applications? Are we ready for a chaotic
nineties, invaluable theoretical research has laid the world where data is always available but only “even-
foundations for dealing with XML and the related tually” consistent?
query languages, with HTML Web data, with the While preserving ACID properties may not be
Semantic Web, and with unstructured data like im- as important for databases that typically contain
ages and videos. It would be interesting to see what append only data, they are absolutely essential for
the work on semi-structured data and XML (mod- most operational systems and online transaction
elling and languages) can contribute in the setting processing systems, including retail, banking, and
of NoSQL databases, since after all many of the finance. ACID compliance may not be important
problems rising from this new data model(s) have to a search engine that may return different results
been discussed already within the semi-structured to two users simultaneously, or to Amazon when
data research. returning sets of different reviews to two users. In
The lessons learned from developing the re- these applications, speed and performance triumph
lational database theory have probably laid the the consistency of the results. However, in a bank-
methodological foundations for approaching most ing application, two users of the same account need
data-related problems, since, however unstructured to see the same balance in their account. A utility
and unkempt the datasets at hand, the understand- company needs to display the same “payment due
ing developed within the community will ever in- amount” to two or more users perusing an account.
form its research strategies. The idea of “eventual consistency” for such applica-
tions could lead to chaos in the business world. Is
3.3 Database design it by chance that just those applications that need
full consistency are often those that better match
“They rent out rooms for old people, the relational structure? Can we imagine a bank,
kill’em, bury’em in the yard, cash their a manufacturing or a commercial company which
social security checks.” would rather use a complex-object data model to
(No Country for Old Men. 2007) represent their data? This is probably why many

SIGMOD Record, June 2013 (Vol. 42, No. 2) 67


people mix up the structure of the relational model sor data in clusters of commodity hardware – that
with the ACID properties, which in principle are may make it worthwhile.
completely independent aspects. Therefore, we all believe that relational and
A consequence of the choices made in some sys- NoSQL database systems will continue to coexist.
tems about weak forms of consistency is that the In the era of large, decentralized, distributed en-
burden is passed to applications developers, when vironments where the amount of devices and data
they need to ensure more sophisticated transaction and their heterogeneity is getting out of control,
properties. billions of sensors and devices collect, communicate
An observation that has been recently made and create data, while the Web and the social net-
about transaction management (and other imple- works are widening the number of data formats and
mentation issues) is related to the fact that it can providers. NoSQL databases are most often appro-
be easy to omit features, as this simplifies the de- priate for such applications, which either do not re-
velopment, but it might be difficult to reintroduce quire ACID properties or need to deal with objects
them later. Mohan [6] points out that there were which are clumsily represented in relational terms.
experiences in the past with similar simplifications, As a conclusion, NoSQL data storage appears to
and it was later very complex to obtain more gen- be additional equipment that business enterprises
eral and powerful systems— some features needed may choose to complete their assortment of storage
to be rewritten from scratch. services.
With all these questions ahead the contribution
5. FINAL COMMENTS the database community can give is huge. Let us
take a full breath and start anew!
“Look! It’s moving. It’s alive!!”
(Frankenstein. 1931) 6. REFERENCES
[1] P. Atzeni, F. Bugiotti, and L. Rossi. Uniform
In spite of the shortcomings and inadequacies of
access to non-relational database systems:
the relational model and SQL, these technologies
The SOS platform. In CAiSE 2012, Springer,
are, however, still going strong. Why? A key rea-
pages 160–174, 2012.
son is that the systems that implement these are
[2] R. Cattell. Scalable SQL and NoSQL data
plentiful and have proven their worth. Perhaps
stores. SIGMOD Record, 39(4):12–27, 2010.
the most important reason is that enormous in-
[3] M. Driscoll. SQL is Dead. Long Live SQL!
vestments are sitting in applications built on top
http://www.dataspora.com/2009/11/
of such systems. Companies around the globe rely
sql-is-dead-long-live-sql/, 2009.
on these applications and their underlying database
[4] G. Grothendieck. SQLDF: SQL select on R
management systems for their day-to-day business.
data frames.
Actually, relational DBMS provide the most under-
http://code.google.com/p/sqldf/, 2012.
standable format for business application data, and
[5] G. Harrison. 10 things you should know about
at the same time guarantee the consistency prop-
NoSQL databases. http:
erties that are needed in business. In addition, the
//www.techrepublic.com/blog/10things/
skill sets of their current and prospective employ-
10-things-you-should-know-about-nosql-
ees are targeted at these systems. It is not an easy
databases/1772, 2010.
decision to throw away relational and SQL technol-
[6] C. Mohan. History repeats itself: sensible and
ogy and instead adopt new technology. Rather, it
NonsenSQL aspects of the NoSQL hoopla. In
is much easier to extend the current applications
EDBT 2013, ACM, pag. 11–16, 2013.
and systems with no radical changes. Indeed, to
[7] M. Stonebraker. Stonebraker on NoSQL and
the extent applications involve standard adminis-
enterprises. Commun. ACM, 54:10–11, 2011.
trative data and “new” data, relational technology
[8] M. Stonebraker and R. Cattell. 10 rules for
may even be best suited.
scalable performance in ’simple operation’
Thus, when is it reasonable for an organization
datastores. Commun. ACM, 54(6):72–80,
to bet on a tool that is slightly incompatible with
2011.
all the others, may be built by a community in
[9] M. Stonebraker, S. Madden, D. J. Abadi, S.
open source model, does not grant consistency and
Harizopoulos, N. Hachem, and P. Helland.
concurrency control and is subject to change, ne-
The end of an architectural era: (it’s time for
glect, and abandon at any point in time? The point
a complete rewrite). In VLDB 2007, VLDB
is that there are killer applications – e.g. storing
Endowment, pag. 1150-1160, 2007.
huge amounts of (read-only) social-network or sen-

68 SIGMOD Record, June 2013 (Vol. 42, No. 2)

You might also like