Paolo Atzeni Christian S. Jensen Giorgio Orsi Sudha Ram
Letizia Tanca Riccardo Torlone
ABSTRACT ing data using the relational model. The debate on
We report the opinions expressed by well-known SQL vs. NoSQL is as much a debate on SQL, the database researchers on the future of the relational language, as on the relational model and its various model and SQL during a panel at the International implementations. Workshop on Non-Conventional Data Access (NoCoDa Relational database management systems have 2012), held in Florence, Italy in October 2012 in con- been around for more than thirty years. During junction with the 31st International Conference on Con- this time, several revolutions (such as the Object ceptual Modeling. The panelists include: Paolo Atzeni Oriented database movement) have erupted, many (Università Roma Tre, Italy), Umeshwar Dayal (HP of which threatened to doom SQL and relational Labs, USA), Christian S. Jensen (Aarhus University, databases. These revolutions eventually fizzled out, Denmark), and Sudha Ram (University of Arizona, and none made even a small dent in the domi- USA). Quotations from movies are used as a playful nance of relational databases. The latest revolu- though effective way to convey the dramatic changes tion appears to be from NoSQL databases that are that database technology and research are currently un- touted to be non-relational, horizontally scalable, dergoing. distributed and, for the most part, open source. The big interest of academia and industry in the NoSQL movement gives birth, once more, to a num- 1. INTRODUCTION ber of challenging questions on the future of SQL As more and more information becomes available and of the relational approach to the management to a growing multitude of people, the ways to man- of data. We discussed some of them during a lively age and access data are rapidly evolving as they panel at the NoCoDa Workshop, an event held in must take into consideration, on one front, the kind Florence, Italy in October 2012 organized by Gior- and volume of data available today and, on the gio Orsi (Oxford University), Letizia Tanca (Po- other front, a new and larger population of prospec- litecnico di Milano) and Riccardo Torlone (Univer- tive users. This need on two opposite fronts has sità Roma Tre). We have used a provocative title originated a steadily growing set of proposals for (paraphrasing a quote often attributed to Woody non-conventional ways to manage and access data, Allen) and quotations from movies to elaborate on which fundamentally rethink the concepts, tech- three main issues: niques, and tools conceived and developed in the • the possible decline of the relational model and database field during the last forty years. Recently, of SQL as a consequence of the rise of the non- these proposals have produced a new generation of relational technology, data management systems, mostly non-relational, proposed as effective solutions to the needs of an • the need for logical data models and theoreti- increasing number of large-scale applications for cal studies in the NoSQL world, and which traditional database technology is unsatisfac- • the possible consequences of sacrificing the tory. ACID properties in favor of system perfor- Today, it is common to include all the non- mance and data availability. relational technologies for data management under the umbrella term of “NoSQL” databases. Still, it In the following sections we discuss these issues in is appropriate to point out that SQL and relational turn and close the paper with a final discussion. DBMSs are not synonymous. The former is a lan- Since a consensus was reached on most of the is- guage, while the latter is a mechanism for manag- sues addressed in the panel, we synthesize shared
64 SIGMOD Record, June 2013 (Vol. 42, No. 2)
opinions, rather than report contributions to the rates and availability requirements increase and as discussion by single individuals. data stores move into the cloud. The new breed of NoSQL systems are designed so they can eas- 2. THE END OF AN ERA? ily scale up using low cost commodity processors to yield economic advantages. 2.1 Relational databases Next, the data management applications have not just grown to concern more diverse kinds and uses “The ship will sink.” “You’re certain?” of data. They have also become more complex. A “Yes. In an hour or so, all of this will be single application may involve diverse kinds of data. at the bottom of the Atlantic.” This means that it is generally not possible for an (Titanic. 1997) application to use the single model and query lan- guage that is best for a single kind of data. According to Stonebraker et al., RDBMS are 25- There are indeed two different issues here, related year-old legacy code lines that should be retired in to the model level and to the implementation. In favor of a collection of from-scratch specialized en- terms of implementation, it is clear (and it has been gines [9]. Are we really attending the sinking of the clear for more than a decade) that different appli- relational ship? cations have different requirements, especially when One needs to distinguish between the relational performance is a concern. This has led for exam- model and its dominant query language, SQL, on ple to separating OLTP and OLAP applications, the one hand and relational database management even when the latter makes use of data produced systems on the other. by the former. Further, different engines with dif- The relational model and SQL were invented at ferent capabilities have been developed for the two a time when data management targeted primarily worlds, with specific support, the ones with more administrative applications. The goal was to sup- support for throughput of transactions and the oth- port applications exemplified well by banking. The ers with support for very complex queries. With re- data is well structured: accounts, customers, loans, spect to models, the point is that most applications etc. And typical transactions include withdrawals do need mainly simple operations over models that and deposits that alter account balances. The rela- are somehow more complex than the relational one. tional model and SQL are well suited for managing NoSQL systems try to respond to these needs: im- this kind of data and supporting workloads made plementations are new and specialized, operations up from these kinds of transactions. are very simple, and diverse models (see the dis- However, the data management landscape has cussion on heterogeneity below) share the idea of evolved, and today’s landscape of data management being flexible (semistructured and with little or no applications is much more diverse than it was when schema). the relational model and SQL were born. Exam- ples of this diversity abound: semi-structured data, 2.2 SQL unstructured data, continuous data, sensor data, streaming data, uncertain data, graph data, and “Whoa, lady, I only speak two languages, complexly structured data. Similar diversities can English and bad English.” be found in the workloads to be supported today. (The Fifth Element. 1997) Thus, while relational database systems were first proposed as a way to store and manage struc- A variety of data models and access methods tured data, burgeoning NoSQL databases, such as are emerging and SQL is not suitable for any of CouchDB, MongoDB, Cassandra, and Hbase, have them. Are we building the Babel Tower of query emerged as a way to store unstructured data and languages? other complex objects such as documents, data SQL has several advantages — it is a simple yet streams, and graphs. With the rise of the real-time powerful declarative language for set-oriented oper- web, NoSQL databases were designed to deal with ations. SQL captures the essential patterns of data very large volumes of data. manipulation, including intersections/joins, filters, Moreover, while relational database systems are and aggregations or reductions. Programmers who usually scaled up (i.e., moved to larger and more profess a dislike for SQL appear to have been de- powerful servers), NoSQL database systems are de- ceived by its simplicity. The existence of languages signed to scale out, i.e, the database is distributed such as SQLDF [4], which allows SQL queries on across multiple hosts as load increases. This is more R data frames, add SQL functionality for analyt- in line with real time web traffic as transaction ics on Big Data. SQL’s declarative expressions are
SIGMOD Record, June 2013 (Vol. 42, No. 2) 65
frequently more readable and compact than their R 3. MODEL, THEORY AND DESIGN programmatic equivalents. Powerful extensions to SQL, based on window functions, provide a ”split- 3.1 Logical data models apply” functionality otherwise known as map func- tion. Combining these with SQL’s GROUP BY op- “Underneath, it’s a hyper-alloy combat eration, which is in reality a reduce function, essen- chassis, microprocessor-controlled. But tially provides the equivalent of operations such as outside, it’s living human tissue: flesh, those in the Map Reduce framework. skin, hair, blood.” (Terminator. 1984) However, in spite of the research and develop- ment, the relational model and SQL may not be Aren’t NoSQL database models too close to the the best foundation for managing every new kind physical data structures? What about physical data of data and workload. The SQL-86 standard was independence? a small and simple document. Then came SQL- The ANSI SPARC architecture for database sys- 89, SQL-92, SQL:1999, SQL:2003, SQL:2006, and tems was defined in 1975 with the fundamental SQL:2008. The current standard, SQL:2011, is very goal of setting a standard for data independence complex, and most data management professionals for DBMS vendor implementations. It appears that will find it challenging to understand. How many current NoSQL systems make no distinction be- people have read and understood the entire SQL tween the logical and physical schema. Thus, the standard? Few claim that SQL is an elegant lan- fundamental advantages of the ANSI SPARC ar- guage characterized by orthogonality. Some call it chitecture have been voided, which complicates the an elephant on clay feet. With each addition, its maintenance of these databases. Storing objects as body grows, and it becomes less stable. SQL stan- they are programmed essentially negates the data dardization is largely the domain of database ven- independence requirement that then remains to be dors, not academic researchers without commercial adequately addressed for NoSQL database systems. interests or users with user interests. Who is that Strong typing of relations also allows definition of good for? a variety of integrity constraints at the schema Another aspect is that the SQL syntax requires level, a very important consideration for transac- the use of joins, considered ill-fit for, e.g., prefer- tion processing systems that support a variety of ences and data structures for complex objects or read, write, delete, and update transactions. completely unstructured data: many programmers Relational database systems are criticized for the would prefer to not do joins at all, keeping the data strong typing of relational schemas, which makes in a physical structure that fits the programming it difficult to alter the data model. Even mi- task as opposed to extracting it from a logical struc- nor changes to the data model of a relational ture that is relational. Complex objects that con- database have to be carefully managed and may tain items and lists do not always map directly to a require downtime or reduced service levels. NoSQL single row in a single table, and writing SQL queries databases have far more relaxed — or even nonexis- to grab the data spread out across many tables, tent — data model restrictions. NoSQL Key Value when all you want is a record, is inconsistent with stores and document databases allow applications the belief that data should be persisted the way it to store virtually any structure it wants in a data is programmed. element. Even the more rigidly defined BigTable- On the other hand, the tumultuous developments based NoSQL databases (Cassandra, HBase) typ- we are observing have generated dozens of systems ically allow new columns to be created with little each with its own modeling features and its own effort. Actually, organizations should carefully eval- APIs [2, 8], and this is definitely generating con- uate the advantages and limitations of each type of fusion. Indeed, the lack of a standard is a great systems (i.e. relational and NoSQL) for Big Data concern for companies interested in adopting any and then make an informed decision. of these systems [7]: applications and data are ex- A common, high level interface could really be of pensive to convert and competencies and expertise use here. However it has to be simple, especially acquired on a specific system get wasted in case in terms of operations, as is the case for NoSQL of migration. Efforts that support interoperability systems. It is also worth mentioning that developers and translation are definitely needed [1]. Original of the various systems follow “best practices” that approaches in this direction are needed, given the support efficient execution of operations. An effort simplicity of operations and the almost total ab- should be made to design a common interface by sence of schemas. using the best practices of each system, with the goal of re-achieving physical independence.
66 SIGMOD Record, June 2013 (Vol. 42, No. 2)
3.2 Database theory How is database design affected by the recent paradigm shifts on logical data modeling? Is concep- “I’ve seen things you people wouldn’t be- tual database design really too old for this country? lieve. [. . . ] All those moments, will be The methodological framework consisting of con- lost in time, like tears in rain. Time to ceptual data modeling followed by the translation die.” (Blade Runner. 1982) of the ER (or class-diagram) schema into a logical (relational) one can still be adopted: after all, these Do we still need theoretical research in the new systems have to be accessed by applications. So, world? Has relational database theory become irrel- even if there is no schema in the data store, it is evant? very likely that the data objects belong to classes, The introduction of the relational model in 1970 whose definitions appear in the programs, so some marked a striking difference with respect to all the contribution could arise. At the same time, flexibil- previous research on databases. The main reason ity is a must, as objects could come from classes in for this lies in the strong mathematical foundations an inheritance hierarchy, so polymorphism should upon which this model is based, which provided be supported. The availability of a high-level rep- the database research community with the possibil- resentation of the data at hand, be it logical or con- ity to approach the problems that were raised dur- ceptual, remains a fundamental tool for developers ing the years by means of logical and mathematical and users, since it makes understanding, managing, tools, and to ensure the correctness and effective- accessing, and integrating information sources much ness of the proposed solutions by solid mathemati- easier, independently of the technologies used. cal proofs. This approach has caused the blooming of gen- 4. ACID OR AVAILABLE? erations of splendid theoreticians who have set the foundations of the relational model, but have also “Ask me a question I would normally lie contributed to adapting their experience to de- to.” (True Lies. 1994) vise new methods and techniques for solving the problems derived from the advent of new chal- A relational database is a perfect world where lenges. Consider for instance the introduction data is always consistent (even if not true). Are of new paradigms for representing and querying the ACID properties really less relevant in modern semi-structured and unstructured data: since the database applications? Are we ready for a chaotic nineties, invaluable theoretical research has laid the world where data is always available but only “even- foundations for dealing with XML and the related tually” consistent? query languages, with HTML Web data, with the While preserving ACID properties may not be Semantic Web, and with unstructured data like im- as important for databases that typically contain ages and videos. It would be interesting to see what append only data, they are absolutely essential for the work on semi-structured data and XML (mod- most operational systems and online transaction elling and languages) can contribute in the setting processing systems, including retail, banking, and of NoSQL databases, since after all many of the finance. ACID compliance may not be important problems rising from this new data model(s) have to a search engine that may return different results been discussed already within the semi-structured to two users simultaneously, or to Amazon when data research. returning sets of different reviews to two users. In The lessons learned from developing the re- these applications, speed and performance triumph lational database theory have probably laid the the consistency of the results. However, in a bank- methodological foundations for approaching most ing application, two users of the same account need data-related problems, since, however unstructured to see the same balance in their account. A utility and unkempt the datasets at hand, the understand- company needs to display the same “payment due ing developed within the community will ever in- amount” to two or more users perusing an account. form its research strategies. The idea of “eventual consistency” for such applica- tions could lead to chaos in the business world. Is 3.3 Database design it by chance that just those applications that need full consistency are often those that better match “They rent out rooms for old people, the relational structure? Can we imagine a bank, kill’em, bury’em in the yard, cash their a manufacturing or a commercial company which social security checks.” would rather use a complex-object data model to (No Country for Old Men. 2007) represent their data? This is probably why many
SIGMOD Record, June 2013 (Vol. 42, No. 2) 67
people mix up the structure of the relational model sor data in clusters of commodity hardware – that with the ACID properties, which in principle are may make it worthwhile. completely independent aspects. Therefore, we all believe that relational and A consequence of the choices made in some sys- NoSQL database systems will continue to coexist. tems about weak forms of consistency is that the In the era of large, decentralized, distributed en- burden is passed to applications developers, when vironments where the amount of devices and data they need to ensure more sophisticated transaction and their heterogeneity is getting out of control, properties. billions of sensors and devices collect, communicate An observation that has been recently made and create data, while the Web and the social net- about transaction management (and other imple- works are widening the number of data formats and mentation issues) is related to the fact that it can providers. NoSQL databases are most often appro- be easy to omit features, as this simplifies the de- priate for such applications, which either do not re- velopment, but it might be difficult to reintroduce quire ACID properties or need to deal with objects them later. Mohan [6] points out that there were which are clumsily represented in relational terms. experiences in the past with similar simplifications, As a conclusion, NoSQL data storage appears to and it was later very complex to obtain more gen- be additional equipment that business enterprises eral and powerful systems— some features needed may choose to complete their assortment of storage to be rewritten from scratch. services. With all these questions ahead the contribution 5. FINAL COMMENTS the database community can give is huge. Let us take a full breath and start anew! “Look! It’s moving. It’s alive!!” (Frankenstein. 1931) 6. REFERENCES [1] P. Atzeni, F. Bugiotti, and L. Rossi. Uniform In spite of the shortcomings and inadequacies of access to non-relational database systems: the relational model and SQL, these technologies The SOS platform. In CAiSE 2012, Springer, are, however, still going strong. Why? A key rea- pages 160–174, 2012. son is that the systems that implement these are [2] R. Cattell. Scalable SQL and NoSQL data plentiful and have proven their worth. Perhaps stores. SIGMOD Record, 39(4):12–27, 2010. the most important reason is that enormous in- [3] M. Driscoll. SQL is Dead. Long Live SQL! vestments are sitting in applications built on top http://www.dataspora.com/2009/11/ of such systems. Companies around the globe rely sql-is-dead-long-live-sql/, 2009. on these applications and their underlying database [4] G. Grothendieck. SQLDF: SQL select on R management systems for their day-to-day business. data frames. Actually, relational DBMS provide the most under- http://code.google.com/p/sqldf/, 2012. standable format for business application data, and [5] G. Harrison. 10 things you should know about at the same time guarantee the consistency prop- NoSQL databases. http: erties that are needed in business. In addition, the //www.techrepublic.com/blog/10things/ skill sets of their current and prospective employ- 10-things-you-should-know-about-nosql- ees are targeted at these systems. It is not an easy databases/1772, 2010. decision to throw away relational and SQL technol- [6] C. Mohan. History repeats itself: sensible and ogy and instead adopt new technology. Rather, it NonsenSQL aspects of the NoSQL hoopla. In is much easier to extend the current applications EDBT 2013, ACM, pag. 11–16, 2013. and systems with no radical changes. Indeed, to [7] M. Stonebraker. Stonebraker on NoSQL and the extent applications involve standard adminis- enterprises. Commun. ACM, 54:10–11, 2011. trative data and “new” data, relational technology [8] M. Stonebraker and R. Cattell. 10 rules for may even be best suited. scalable performance in ’simple operation’ Thus, when is it reasonable for an organization datastores. Commun. ACM, 54(6):72–80, to bet on a tool that is slightly incompatible with 2011. all the others, may be built by a community in [9] M. Stonebraker, S. Madden, D. J. Abadi, S. open source model, does not grant consistency and Harizopoulos, N. Hachem, and P. Helland. concurrency control and is subject to change, ne- The end of an architectural era: (it’s time for glect, and abandon at any point in time? The point a complete rewrite). In VLDB 2007, VLDB is that there are killer applications – e.g. storing Endowment, pag. 1150-1160, 2007. huge amounts of (read-only) social-network or sen-