
The Art and Business of Software Development

DIGEST August 2009

Editor’s Note 2
by Jonathan Erickson

Techno-News 3
Bokode: A New Kind of Barcode
Tiny labels could pack lots of information, enable new uses.

Features
3 Steps To Managing Data In the Cloud 5
by Ken North
Matching the right cloud platform with the right database is critical.

Databases in the Cloud: Elysian Fields or Briar Patch? 7


by Ken North
For handling distributed data in the cloud, the cornucopia of products includes
everything from lightweight key-value stores to industrial-strength databases.

Fan: A Portable Language Is Bringing Actors to JavaScript 14


by Gastón Hillar
Fan is both an object-oriented and functional programming language.

The C++0x “Remove Concepts” Decision 15


by Bjarne Stroustrup
“Concepts” were to have been the central new feature in C++0x.

A Build System for Complex Projects: Part 1 19


by Gigi Sayfan
A different approach to build systems.

Integrating jQuery Client-Side Data Templates with WCF 23


by Dan Wahlin
Using client-side templates and binding JSON data that’s retrieved from a WCF service.

Columns
Of Interest 26

Conversations 27
by Jonathan Erickson
Jon talks with MySQL creator Michael “Monty” Widenius about the future of databases.

Book Review 28
by Mike Riley
Mike reviews Using Google App Engine.

Effective Concurrency 29
by Herb Sutter
Herb urges you to design for manycore systems.

Entire contents Copyright © 2009 TechWeb/United Business Media LLC, except where otherwise noted. No portion of this publication may be reproduced, stored, or transmitted in any form, including computer retrieval, without written permission from the publisher. All Rights Reserved. Articles express the opinion of the author and are not necessarily the opinion of the publisher. Published by TechWeb, United Business Media Limited, 600 Harrison St., San Francisco, CA 94107 USA, 415-947-6000.
Editor’s Note

Database Development

By Jonathan Erickson, Editor-in-Chief

From e-science visualizations to Wall Street what-ifs, you know there’s a problem when the talk turns to exabytes. But that problem isn’t so much about too much data as it is about making sense of the data at hand. In other words, it’s a question of data management—the architectures, policies, practices, and procedures you have in place for managing and enhancing a company’s data assets.
What really stands out are the vendors that are providing tools to manage and analyze what’s referred to as “big data.” There are the usual suspects: Oracle, IBM, Google, Amazon.com, and FairCom. And then there are upstarts, such as Cloudera and Aster Data Systems, that are leveraging open source software such as MapReduce and Hadoop to build new businesses around big data.
Many of the technologies available to manage big data aren’t new. In one form or another, column-oriented databases, data parallelism, solid-state drives, declarative programming languages, and cloud computing have been around for years. What’s new is the emergence of “fringe databases,” or database management systems that are appearing where you least expect sophisticated data management. For example, medical and consumer devices that once got by with flat files now require powerful database engines to manage the sheer volume of data being collected.
None of this comes without a price. With big data on the rise, transaction throughput and concurrency requirements escalating, and data becoming more distributed, application complexity is increasing. To make it easier to manipulate data, it may have to be partitioned across multiple files or replicated and synchronized across multiple sites. And, of course, software developers are looking at complex data schema paradigms to accommodate their needs while still maintaining traditional relational access.
Hey, no one said it was going to be easy.



Masthead

EDITOR-IN-CHIEF: Jonathan Erickson
MANAGING EDITOR: Deirdre Blake
COPY EDITOR: Amy Stephens
CONTRIBUTING EDITORS: Mike Riley, Herb Sutter
WEBMASTER: Sean Coady
VICE PRESIDENT, GROUP PUBLISHER: Brandon Friesen
VICE PRESIDENT, GROUP SALES: Martha Schwartz
SERVICES MARKETING COORDINATOR: Laura Robison
CIRCULATION DIRECTOR, AUDIENCE DEVELOPMENT: Karen McAleer
MANAGER, AUDIENCE DEVELOPMENT: John Slesinski

DR. DOBB’S: 600 Harrison Street, 6th Floor, San Francisco, CA 94107; 415-947-6000; www.ddj.com

UBM LLC: Pat Nohilly, Senior Vice President, Strategic Development and Business Administration; Marie Myers, Senior Vice President, Manufacturing.

TechWeb: Tony L. Uphoff, Chief Executive Officer; John Dennehy, CFO; David Michael, CIO; John Siefert, Senior Vice President and Publisher, InformationWeek Business Technology Network; Bob Evans, Senior Vice President and Content Director, InformationWeek Global CIO; Joseph Braue, Senior Vice President, Light Reading Communications Network; Scott Vaughan, Vice President, Marketing Services; John Ecke, Vice President, Financial Technology Network; Beth Rivera, Vice President, Human Resources; Jill Thiry, Publishing Director; Fritz Nelson, Executive Producer, TechWeb TV.

Techno-News

Bokode: A New Kind of Barcode

Tiny labels could pack lots of information, enable new uses.

The ubiquitous barcodes found on product packaging provide information to the scanner at the checkout counter, but that’s about all they do. Now, researchers at MIT’s Media Lab have come up with a new kind of very tiny barcode that could provide a variety of useful information to shoppers as they scan the shelves — and could even lead to new devices for classroom presentations, business meetings, video games, or motion-capture systems.
The system, called Bokode (http://web.media.mit.edu/~ankit/bokode/), is based on a new way of encoding visual information, explains MIT’s Ramesh Raskar, who leads the lab’s Camera Culture group (http://cameraculture.media.mit.edu/). Until now, there have been three approaches to communicating data optically:

• Through ordinary imaging (using two-dimensional space)
• Through temporal variations such as a flashing light or moving image (using the time dimension)
• Through variations in the wavelength of light (used in fiberoptic systems to provide multiple channels of information simultaneously through a single fiber)

But the new system uses a whole new approach, encoding data in the angular dimension: Rays of light coming from the new tags vary in brightness depending on the angle at which they emerge. “Almost no one seems to have used” this method of encoding information, Raskar says. “There have been three ways to encode information optically, and now we have a new one.”
The new concept is described in the paper “Bokode: Imperceptible Visual Tags for Camera-based Interaction from a Distance”, written by Ankit Mohan, Raskar, Grace Woo, Shinsaku Hiura, and Quinn Smithwick.
The tiny labels are just 3 millimeters across — about the size of the @ symbol on a typical computer keyboard. Yet they can contain far more information than an ordinary barcode: thousands of bits. Currently they require a lens and a built-in LED light source, but future versions could be made reflective, similar to the holographic images now frequently found on credit cards, which would be much cheaper and more unobtrusive.
“We’re trying to make it nearly invisible, but at the same time easy to read with a standard camera, even a mobile phone camera,” Mohan says.
One of the advantages of the new labels is that unlike today’s barcodes, they can be “read” from a distance — up to a few meters away. In addition, unlike the laser scanners required to read today’s labels, these can be read using any standard digital camera, such as those now built into about a billion cellphones around the world.
The name Bokode comes from the Japanese photography term bokeh, which refers to the round blob produced in an out-of-focus image of a light source. The Bokode system uses an out-of-focus camera — which allows the angle-encoded information to emerge from the resulting blurred spot — to record the encoded information from the tiny tag. But in addition to being readable by any ordinary camera (with the focus set to infinity), it can also be read directly by eye, simply by getting very close — less than an inch away — to the tag.
As a replacement for conventional barcodes, the Bokode system could have several advantages, Mohan says. It could provide far more information (such as the complete nutrition label from a food product), be readable from a distance by a shopper scanning the supermarket shelves, and allow easy product comparisons because several items near each other on the shelves could all be scanned at once.
In addition to conventional barcode applications, the team envisions some new kinds of uses for the new tags. For example, the tag could be in a tiny keychain-like device held by the user, scanned by a camera in the front of a room, to allow multiple people to interact with a displayed image in a classroom or a business presentation. The camera could tell the identity of each person pointing their device at the screen, as well as exactly where they each were pointing. This could allow everyone in the room to respond simultaneously to a quiz, and the teacher to know instantly how many people, and which ones, got it right — and thus know whether the group was getting the point of the lesson.
The devices could also be used for the motion-capture systems used to create videogames or computer-generated movie scenes. Typically, video cameras record a person or object’s motions using colored dots or balls attached to various parts of the person’s body. The Bokode system would allow the camera to record very precisely not just the position but the angle of each tag — with an accuracy of a tenth of a degree. This is far more accurate than any present motion-capture system.
Bokode “could enable a whole new range of applications,” Raskar says. In the future, they could be used in situations such as museum exhibit labels, where the tiny codes would be unobtrusive and not detract from the art or other exhibit, but could send a whole host of background information to viewers through the use of their cellphone cameras. Or a restaurant could make its menu available to a passer-by on the sidewalk.
It could also replace RFID systems in some near-field communication applications, Mohan suggested. For example, while RFIDs, now used in some ID cards, can provide a great deal of information, that information can be read from a distance, even when the card is inside a wallet. That makes them inappropriate for credit cards, for example, because the information could be retrieved by an unauthorized observer. But Bokode could encode just as much information, but require an open line of sight to the card to be read, increasing security.
The prototype devices produced at the Media Lab currently cost about $5 each, most of that cost due to use of an off-the-shelf convex glass lens, but Raskar says that price could easily drop to 5 cents once they are produced even in volumes of a few hundred units.

3 Steps To Managing Data In the Cloud

by Ken North

Matching the right cloud platform with the right database is critical.

The emergence of cloud computing raises a host of questions about the best database technology to use with this new model for on-demand computing. Ultimately, the cloud approach a company chooses determines the data management options that are available to it.
When evaluating the suitability of a database manager for cloud computing, there are three basic steps:

• Consider the class of applications that will be served: data asset protection, business intelligence, e-commerce, etc.
• Determine the suitability of these apps for public or private clouds.
• Factor in ease of development.

The database manager you choose should be a function of the mission and the applications it supports, and not based on budgets and whether it will run in the enterprise as a private cloud or as a public cloud from a service provider. For instance, some companies turn to a cloud provider to back up mission-critical databases or as a disaster recovery option. Database-intensive apps such as business intelligence can be deployed in the cloud by having a SaaS provider host the data and the app, an infrastructure provider host a cloud-based app, or a combination of these approaches. And popular solutions for processing very large datasets, such as Hadoop MapReduce, can run in both the private and public cloud.
Databases, data stores, and data access software should be evaluated for suitability for both public and private clouds. Public cloud security isn’t adequate for some types of applications. For example, Amazon Dynamo was built to operate in a trusted environment, without authentication and authorization requirements. At a minimum, database communications and backups to the cloud need to be encrypted.
Security in cloud environments varies based on whether you use SaaS, a platform provider, or an infrastructure provider. SaaS providers bundle tools, APIs, and services, so you don’t have to worry about choosing the optimal data store and security model. But if you create a private cloud or use an infrastructure provider, you’ll have to select a data management tool that’s consistent with your app’s security needs. Your database decision also will hinge on whether the environment supports a multitenant or multi-instance model. Salesforce.com hosts apps on Oracle databases using multitenancy. Amazon EC2 supports multi-instance security. If you fire up an Amazon Machine Image running Oracle, DB2, or Microsoft SQL Server, you have a unique instance that doesn’t serve other tenants. You have to authorize database users, define roles, and grant user privileges when using the infrastructure-as-a-service model.
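Under the multi-instance model, that provisioning is ordinary database administration. As a minimal sketch (the JDBC URL, account names, and passwords below are hypothetical placeholders, and the statements use MySQL-flavored SQL), the grants might be scripted through JDBC:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: provision a least-privilege reporting account on a database
// instance you administer in the cloud. Host, credentials, and schema
// names are illustrative placeholders, not a real deployment.
public class ProvisionReportingUser {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://ec2-host.example.com:3306/sales";
        try (Connection admin = DriverManager.getConnection(url, "admin", "adminPassword");
             Statement stmt = admin.createStatement()) {
            // Create the account and grant only what the workload needs.
            stmt.executeUpdate("CREATE USER 'report'@'%' IDENTIFIED BY 'reportPassword'");
            stmt.executeUpdate("GRANT SELECT ON sales.* TO 'report'@'%'");
            // No INSERT/UPDATE/DELETE: the account is read-only by construction.
        }
    }
}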
Developers’ Choices

Database app development options for public cloud computing can be limited by the providers. SaaS offerings such as Google App Engine and Force.com provide specific development platforms with predefined APIs and data stores. Private cloud and infrastructure providers, including GoGrid and Amazon EC2, let users match the software, database environment, and APIs to their needs. Besides cloud storage APIs, developers can program to various APIs for data stores and standard ones for SQL/XML databases. Programmers can work with SQL APIs and APIs for cloud services. For Amazon, that involves using the Web Services Description Language and invoking specific web services.
For projects that use the cloud to power Web 2.0 apps, developers can use JavaScript Object Notation and the Atom Publishing protocol.
For new applications hosted in the cloud, developers look primarily to classes of data stores such as SQL/XML databases, column data stores, distributed hash tables, and tuple spaces variants, such as in-memory databases, entity-attribute-value stores, and other non-SQL databases. Choosing the right data store depends on the scalability, load balancing, consistency, data integrity, transaction support, and security requirements. Some newer data stores have taken a minimalist approach, avoiding joins and not implementing schemas or strong typing; instead, they store data as strings or blobs. Scalability is important for very large datasets and has contributed to the recent enthusiasm for the distributed hash table and distributed key-value stores.
One interesting approach is the ability to configure fault-tolerant systems and hot backups for disaster recovery. A private cloud can be configured and operated with fairly seamless failover to Amazon EC2, for example. You’ll have to replicate data in the private and public cloud, implementing the Amazon APIs and availability zones, as well as IP assignment and load balancing for the private cloud. You’ll also have to use server configurations compatible with Amazon instances to avoid breaking applications and services because of changes in endianness, the Java heap size, and other dissimilarities.
In short, the cloud is an effective elastic computing and data storage engine, but matching the right platform with the right database is critical. Doing this correctly requires evaluating the job and its security needs, as well as assessing how easy it is to design and implement the software. Carefully weighing these factors will lead you to the right conclusion.

— Ken North is an author, consultant, and analyst. He chaired the XML DevCon 200x conference series, Nextware, LinkedData Planet, and DataServices World 200x.

Databases in the Cloud: Elysian Fields or Briar Patch?

by Ken North

For handling distributed data in the cloud, the cornucopia of products includes everything from lightweight key-value stores to industrial-strength databases.

Cloud computing is the latest sea change affecting how we develop and deploy services and applications and fulfill the need for persistent information and database solutions. Database technology evolves even as new computing models emerge, inevitably raising questions about selecting the right database technology to match the new requirements.
The cloud is an elastic computing and data storage engine, a virtual network of servers, storage devices, and other computing resources. It’s a major milestone in on-demand or utility computing, the evolutionary progeny of computer timesharing, high-performance networks, and grid computing. The computer timesharing industry that emerged four decades ago pioneered the model for on-demand computing and pay-per-use resource sharing of storage and applications. More recently, Ian Foster and Carl Kesselman advanced the concept of the grid to make large-scale computing networks accessible via a service model. Like computer timesharing and the grid, cloud computing often requires persistent storage, so open source projects and commercial companies have responded with data store and database solutions.
Public clouds include commercial enterprises that can host applications and databases, offering Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS), and Database as a Service (DaaS). Infrastructure providers include Amazon Elastic Compute Cloud (EC2), GoGrid, Rackspace Mosso, and Joyent, whereas Microsoft Azure, Google AppEngine, Force.com, Zoho, and Facebook are platform providers. There are also providers targeting specific classes of cloud users, such as HP CloudPrint and IBM LotusLive for collaboration services and social networking for businesses. Other SaaS providers include Birst and SAS for on-demand business intelligence (BI); Salesforce.com and Zoho for customer relationship management (CRM); and Epicor, NetSuite, SAP Business ByDesign, and Workday for enterprise resource planning (ERP) suites. The DaaS providers include EnterpriseDB, FathomDB, Longjump, and TrackVia.
Private clouds, like server consolidation, clusters, and virtualization, are another evolutionary step in data center and grid technology. Gartner Research predicted government will have the largest private clouds, but any organization having thousands of servers and massive storage requirements is a likely candidate. Security and reliability are the appeal of private clouds for large enterprises that can afford the infrastructure. Public cloud computing does not provide the 99.99% uptime that enterprise data center managers desire for service-level agreements. The fact that a private cloud sits behind a firewall mitigates the risk from exposing data to the cloud. The private cloud also alleviates concerns about data protection in multitenancy cloud environments. One issue in the private versus public cloud debate is the diversity of APIs used to invoke cloud services. This has caused interest in creating a standard, but the Eucalyptus initiative took a different approach. Assuming the Amazon APIs to be a de facto standard, it developed private cloud software that’s largely compatible with Amazon EC2 APIs.
When evaluating the suitability of a database solution for cloud computing, there are multiple considerations. First, you must consider the class of applications that will be served: business intelligence (BI), e-commerce transactions, knowledge bases, collaboration, and so on. Second, you must determine suitability for public and/or private clouds. Third, you must consider ease of development. And, of course, budget is not to be overlooked.

Mission: What Will Run In the Cloud?

Selecting a database manager should be a function of the mission and applications it must support, not just budget and whether it will run in the enterprise or a private or public cloud. Some organizations use a cloud provider as a backup for mission-critical applications or databases, not as the primary option for deploying applications or services. Oracle database users can run backup software that uses Amazon Simple Storage Service (S3) for Oracle database backups. For an even bigger safety net, organizations can look to cloud computing as a disaster recovery option.
The New York Times project that created the TimesMachine, a web-accessible digital archive, is a prime example of a one-off cloud project requiring massively scalable computing. But factors besides on-demand elasticity come into play when the goal is hosting applications in the cloud on a long-term basis, particularly database applications.
Cloud users are often looking to deploy applications and databases with a highly scalable, on-demand architecture, often on a pay-per-use basis. Common scenarios for using the cloud include startups and project-based, ad hoc efforts that want to ramp up quickly with minimal investment in infrastructure. But Amazon’s public cloud has also been used to support e-business websites, such as NetFlix.com, eHarmony.com, and Target.com. E-mail is a backbone of modern business, and companies such as the Boston Celtics have gone to a cloud computing model for e-mail and collaboration software. Companies can also opt to use a cloud to host ERP or CRM suites that operate with SQL databases, such as open source ERP suites (Compiere, Openbravo, SugarCRM) and BI solutions (Jasper, Pentaho). Because data warehouses use source data from operational systems, organizations using the cloud to host operational databases are likely to do the same for data warehouses and business intelligence.
On a pay-as-you-go basis, the cloud handles provisioning on demand. Machine images, IP addresses, and disk arrays are not permanently assigned, but databases on a public cloud can be assigned to persistent storage. This saves having to bulk load a database each time you fire up machine instances and run your application. But it also puts a premium on database security and the cloud provider having a robust security model for multitenancy storage.
The cloud is particularly well suited for processing large datasets and compute-intensive applications that benefit from parallel processing, such as rendering video and data analysis. The early Amazon EC2 users have included biomedical researchers and pharmaceutical, bioengineering, and banking institutions. They were early adopters of grid computing for purposes such as financial modeling, drug discovery, and other research. Medical research often requires massive simulations of genetic sequencing and molecular interactions. This has been done by grids, often using Basic Local Alignment Search Tool (BLAST) programs, and more recently by clouds. Researchers have also used MapReduce software in the cloud for genetic sequence analysis. Eli Lilly uses Amazon EC2 for processing bioinformatics sequence information.
Cloud computing is also used for other purposes, such as integrating SaaS and enterprise systems. Players in the on-demand integration space include Boomi, Cast Iron Systems, Hubspan, Informatica, Jitterbit, and Pervasive Software. Business intelligence activity, such as analytics, data warehousing, and data mining, requires horsepower and a capital outlay that might be prohibitive for small and medium businesses. Cloud computing offers an attractive pay-per-use alternative, and there appears to be a large potential BI-on-demand market.
The marriage of cloud computing and business intelligence can be accomplished by several means. One option is to have data and applications hosted by a SaaS provider. Another is to create cloud-based applications hosted by an infrastructure provider. A third alternative is to do both and use data replication or a data integration suite.

PaaS, Public, Private Clouds

The Platform-as-a-Service (PaaS) solution bundles developer tools and a data store, but users who opt to use an infrastructure provider or build a private cloud have to match the data store or database to their application requirements and budget. There are open source and commercial products that have a wide range of capabilities, from scalable simple data stores to robust platforms for complex query and transaction processing.
Databases, data stores, and data access software for cloud computing must be evaluated for suitability for both public and private clouds and for the class of applications supported. For example, Amazon Dynamo was built to operate in a trusted environment, without authentication and authorization requirements. Whether the environment supports multitenant or multi-instance applications also influences the database decision.

Databases and Data Stores

Data management options for the cloud include single format data stores, document databases, column data stores, semantic data stores, federated databases, and object-relational databases. The latter group includes “Swiss Army Knife” servers from IBM, Microsoft, OpenLink, and Oracle that process SQL tables, XML documents, RDF triples, and user-defined types.
Building a petabyte-size web search index is a very different problem from processing an order or mapping wireless networks. The requirements of the application and data store for those tasks are quite different. For new applications hosted in the cloud, developers will look primarily to several classes of data store:

• SQL/XML (object-relational) databases
• Column data stores
• Distributed hash table (DHT) and simple key-value stores
• Tuple spaces variants, in-memory databases, entity-attribute-value stores, and other non-SQL databases having features such as filtering, sorting, range queries, and transactions

Because this cornucopia of data stores has diverse capabilities, it’s important to understand application requirements for scalability, load balancing, consistency, data integrity, transaction support, and security. Some newer data stores are an exercise in minimalism. They avoid joins and don’t implement schemas or strong typing, instead storing data as strings or blobs. Scalability with very large dataset operations is a requirement for cloud computing, which has contributed to the recent enthusiasm for the DHT and distributed key-value stores.
Associative arrays, dictionaries, hash tables, rings, and tuple spaces have been around for years, as have entity-attribute-value (EAV) stores, database partitions, and federated databases. But cloud computing puts an emphasis on scalability and load balancing by distributing data across multiple servers. The need for low-latency data stores has created an Internet buzz about key-value stores, distributed hash tables (DHT), entity-attribute-value stores, and data distribution by sharding.
Tuple spaces are a solution for distributed shared memory that originated with the Linda effort at Yale, which spawned more than 20 implementations, including Object Spaces, JavaSpaces, GigaSpaces, LinuxTuples, IBM TSpaces, and PyLinda. One can find the GigaSpaces eXtreme Application Platform as a pay-per-use service on Amazon EC2. It includes a local and distributed Jini transaction manager, Java Transaction API (JTA) and JDBC support, and b-tree and hash-based indexing capabilities. Amazon SimpleDB also provides standard tuple spaces interfaces, but adds secondary indexing and support for additional query operators.
For large datasets and databases, partitioning data has been a facilitator of parallel query processing and load balancing. Horizontal partitioning, referred to as sharding, has caught the attention of developers looking to build multiterabyte cloud databases because of its success at Amazon, Digg, eBay, Facebook, Flickr, Friendster, Skype, and YouTube. SQLAlchemy and Hibernate Shards, object-relational mappers for Python and Java, respectively, provide sharding that’s useful for cloud database design. Google developed Hibernate Shards for data clusters before donating it to the Hibernate project. You can do manual sharding for a platform such as Google AppEngine, use SQLAlchemy or Hibernate Shards for Python or Java development, or use a cloud data store such as MongoDB that provides administrative commands for creating shards.
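Hibernate Shards and SQLAlchemy wrap this pattern in an object-relational mapper; the core idea, mapping a key to one of N databases, is small enough to sketch directly. A minimal illustration in Java (the shard URLs and credentials are placeholders, not a real deployment):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch of hash-based horizontal partitioning (sharding): route each
// customer ID to one of N database instances. Real shard maps usually
// live in configuration so shards can be rebalanced without code changes.
public class ShardRouter {
    private final String[] shardUrls = {
        "jdbc:mysql://shard0.example.com/app",
        "jdbc:mysql://shard1.example.com/app",
        "jdbc:mysql://shard2.example.com/app",
        "jdbc:mysql://shard3.example.com/app"
    };

    // floorMod keeps the index non-negative for any key value.
    int shardFor(long customerId) {
        return (int) Math.floorMod(customerId, (long) shardUrls.length);
    }

    Connection connectionFor(long customerId) throws SQLException {
        return DriverManager.getConnection(shardUrls[shardFor(customerId)], "app", "secret");
    }
}

The same routing function must be used by every writer and reader, which is why range queries that span customers become scatter-gather operations across all shards.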
Distributed Hash Table, Key-Value Data Stores

Distributed hash tables and key-value stores are tools for building scalable, load-balanced applications, not for enforcing rigid data integrity, consistency, and Atomic Consistent Isolated Durable (ACID) properties for transactions. They have limited applicability for applications doing ad hoc query and complex analytics processing. Products in this group include memcached, MemcacheDB, Project Voldemort, Scalaris, and Tokyo Cabinet. Memcached is ubiquitous and a popular solution for caching database-powered web sites. It’s a big associative array that’s accessed with a get or put function, using a key that’s a unique identifier for the data. It’s particularly useful for caching information produced by expensive SQL queries, such as counts and aggregate values. MemcacheDB is a distributed key-value data store that conforms to the memcached protocol but uses Berkeley DB for data persistence.
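As a sketch of that cache-aside usage, assuming the open source spymemcached Java client and a placeholder cache host (the loadOrderCountFromDb() helper stands in for the expensive SQL query):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

// Cache-aside sketch: check memcached before running an expensive
// aggregate query, and populate the cache on a miss.
public class OrderCountCache {
    public static void main(String[] args) throws Exception {
        MemcachedClient cache =
            new MemcachedClient(new InetSocketAddress("cache.example.com", 11211));

        String key = "order-count:2009-08";
        Object cached = cache.get(key);                 // fast path
        long count = (cached != null)
            ? Long.parseLong(cached.toString())
            : loadOrderCountFromDb();                   // slow path: SQL COUNT(*)

        if (cached == null) {
            cache.set(key, 300, Long.toString(count));  // expire after 300 seconds
        }
        System.out.println("orders this month: " + count);
        cache.shutdown();
    }

    // Placeholder for the expensive aggregate query against the SQL database.
    private static long loadOrderCountFromDb() { return 42L; }
}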
Scalaris is a distributed key-value store, implemented in Erlang, which has a non-blocking commit protocol for transactions. Using the web interface, you can read or write a key-value pair, with each operation being an atomic transaction. Using Java, you can execute more complex transactions. Scalaris has strong consistency and supports symmetric replication, but does not have persistent storage.
The open source Tokyo Cabinet database library is causing a buzz in online discussions about key-value stores. It’s blazingly fast, capable of storing one million records in 0.7 seconds using the hash table engine and 1.6 seconds using the b-tree engine. The data model is one value per key, and it supports LZW compression. When keys are ordered, it can do prefix and range matching. For handling transactions, it features write-ahead logging and shadow paging. Tokyo Tyrant is a database server version of Tokyo Cabinet that’s been used to cache large SQL databases for high-volume applications.
Some products of this group support queries over ranges of keys, but ad hoc query operations and aggregate operations (sum, average, grouping) require programming because they are not built in.

Hadoop MapReduce

Hadoop MapReduce would be a nominee for the Academy Award for parallel processing of very large datasets, if one existed. It’s fault-tolerant and has developed a strong following in the grid and cloud computing communities, including developers at Google, Yahoo, Microsoft, and Facebook. Open source Hadoop is available from Apache, a commercial version is available from Cloudera, and Amazon offers an Elastic MapReduce service based on Hadoop.
MapReduce operates over the Hadoop Distributed File System (HDFS), with file splits and data stored as key-value pairs. The HDFS enables partitioning data for multiple machines to do parallel processing of batches and reduce processing time.
MapReduce is suitable for processing very large datasets for purposes such as building search index engines or data mining, but not for online applications requiring sub-second response times. Frameworks built on top of Hadoop, such as Hive and Pig, are useful for extracting information from databases for Hadoop processing. The eHarmony.com site is an example of the marriage of an Oracle database and Amazon MapReduce, using the latter for analytics involving millions of users.
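The programming model is easiest to see in the canonical word-count job; the following is a condensed version of the WordCount sample that ships with Hadoop (job-driver setup omitted):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The map step emits (word, 1) pairs from each input split; the reduce
// step sums the counts per word. Splits are processed in parallel across
// the cluster, over files stored in HDFS.
public class WordCount {
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // one (word, 1) pair per token
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));  // total count per word
        }
    }
}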
Entity-Attribute-Value Datastores

EAV stores are derived from data management technology that predates the relational model for data. They do not have the full feature set of an SQL DBMS, such as a rich query model based on a nonprocedural, declarative query language. But they are more than a simple key-value data store. EAV data stores from major cloud computing providers include Amazon SimpleDB, the Google AppEngine data store, and Microsoft SQL Data Services. And one type, the RDF data store used for knowledge bases and ontology projects, has been deployed in the cloud.
Google Bigtable uses a distributed file system, and it can store very large datasets (petabyte size) on thousands of servers. It’s the underlying technology for the Google AppEngine data store. Google uses it, in combination with MapReduce, for indexing the Web and for applications such as Google Earth. Bigtable is a solution for projects that require analyzing a large collection, for example the one billion web pages and 4.78 billion URLs in the ClueWeb09 dataset from Carnegie Mellon University. For those seeking an open source alternative to Bigtable for use with Hadoop, Hypertable and HBase have developed a following. Hypertable runs on top of a distributed filesystem, such as HDFS. HBase data is organized by table, row, and multivalued columns, and there’s an iterator-style interface for scanning a range of rows. Hypertable is implemented in C++, whereas HBase is implemented in Java.
The Google AppEngine includes a schemaless data store that’s optimized for reading, supports atomic transactions and consistency, and stores entities with properties. It permits filtering and sorting on keys and properties. It has 21 built-in data types, including list, blob, postal address, and geographical point. Applications can define entity groupings as the basis for performing transactional updates and use GQL, a SQL-like query language. Access to the Google AppEngine data store is programmable using Python interfaces for queries over objects known as entities. The data store is also programmable using Java Data Objects (JDO) and the Java Persistence API. Although AppEngine bundles a data store, the AppScale project provides software for operating with data stores such as HBase, Hypertable, MongoDB, and MySQL.
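For Java developers, the same data store is also reachable through App Engine’s low-level datastore API, on which JDO and JPA are layered. A sketch (the “Greeting” kind and its properties are invented for illustration; method names follow the 2009-era SDK and may differ in later releases):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

// Sketch: store a schemaless entity, then filter and sort on a property.
public class GreetingStore {
    public void writeAndQuery() {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        Entity greeting = new Entity("Greeting");          // schemaless kind
        greeting.setProperty("author", "ken");
        greeting.setProperty("date", new java.util.Date());
        ds.put(greeting);

        Query q = new Query("Greeting");
        q.addFilter("author", Query.FilterOperator.EQUAL, "ken");
        q.addSort("date", Query.SortDirection.DESCENDING);

        PreparedQuery pq = ds.prepare(q);
        for (Entity e : pq.asIterable()) {
            System.out.println(e.getProperty("date"));
        }
    }
}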
Amazon Platform

Amazon SimpleDB is a schemaless, Erlang-based, eventually consistent data store suited for high-availability applications. The data model provides domains of large collections of items, which are hash tables containing attributes that are key-value pairs. Attributes can have multiple values, and there are no joins. The query language provides queries that can return an itemName, all attributes, the attribute count, or an attribute list. Data is stored in a single format (untyped strings), without applying constraints, so all predicate comparisons are lexicographical. Therefore, for accurate query results you must store data in an ordered format; for example, padding numbers with leading zeroes and using dates in ISO 8601:2004 format.
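The encoding itself needs no special library; a minimal sketch of order-preserving string encodings (zero-padded numbers and ISO 8601 UTC timestamps):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// When every value is an untyped string, comparisons are lexicographical,
// so numbers and dates must be stored in an order-preserving text form.
public class LexicographicEncoding {
    // Pad to a fixed width: "42" becomes "0000000042", which sorts before
    // "0000000100". (Negative numbers additionally need an offset.)
    static String encodeNumber(long n, int width) {
        return String.format("%0" + width + "d", n);
    }

    // ISO 8601 in UTC sorts chronologically as plain text.
    static String encodeDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        // Without padding, "100" sorts before "42"; with padding the
        // text order matches the numeric order.
        System.out.println("42".compareTo("100") > 0);                                 // true
        System.out.println(encodeNumber(42, 10).compareTo(encodeNumber(100, 10)) < 0); // true
        System.out.println(encodeDate(new Date()));
    }
}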
Azure Services Platform

Microsoft Azure, like Google AppEngine and Force.com, offers a platform for cloud computing that includes a data store and other features for application development. Microsoft .NET Services provide a service bus and authentication, and Live Services are application building blocks. Microsoft also offers SharePoint Services and Dynamics CRM Services in the Azure cloud. Like Amazon S3 and EC2, communication using the Azure Services Platform is based on the web services model, with Microsoft supporting SOAP and REST. Microsoft Azure bundles SQL Data Services (SDS) and exposes Azure Table Storage via ADO.NET Data Services. The database Azure currently offers is a single instance of SQL Server that’s limited to 10 gigabytes of storage. For a larger requirement it’s necessary to partition data to scale horizontally.
For those with a history of using industrial-strength databases, a big adjustment to the new EAV stores is the lack of strong typing. SimpleDB uses string values to store everything, so comparisons and sorting require that you pad numbers with leading zeros. Microsoft SQL Data Services provides Base64, Boolean, datetime, decimal, and string. With more than 20 types, Google AppEngine has more built-in types than SimpleDB or SQL Data Services.

RDF and Semantic Data Stores

Social networking and e-commerce have shown us there are classes of web applications that must operate with massive data stores and support a user base measured in millions. Cloud computing is often touted as a vehicle for scaling out that type of site and powering Web 3.0 applications. Tim Berners-Lee has said a web of linked data will evolve from the web of linked documents. This has produced a surge of interest in data stores that can handle very large knowledge bases and datasets encoded to impart semantics using the W3C Resource Description Framework (RDF) and queried with the W3C SPARQL query language.
Interest in RDF, microformats, and linked data has raised awareness of the capabilities and capacity of RDF data stores. Because there are a number of RDF data stores, the benchmark wars are reminiscent of the Transaction Processing Performance Council (TPC) benchmark competition among SQL vendors. RDF data is stored as subject-predicate-object triples. The leading RDF data stores often store additional information for versioning and temporal queries, but they are capable of storing and querying over billions of triples. A W3C wiki identifies more than a dozen triple stores, about half citing deployments or benchmarks with one billion triples or more.
Sesame, Jena, and Mulgara are popular open source solutions. OpenLink Virtuoso is a universal server that in a recent benchmark loaded 110,500 triples per second. The Virtuoso Universal Server (Cloud Edition) is a prepackaged AMI for EC2. In addition to SQL and XML databases, it provides online backup to Amazon S3 buckets and installable RDFizer cartridges. Franz AllegroGraph RDFStore offers a vehicle for building RDF-based federated knowledge stores in the cloud. It supports SPARQL queries, Prolog, and RDFS++ reasoning. On Amazon EC2, it stored and indexed a 10-billion triple dataset in 6.19 hours using 10 large EC2 instances. The SQL/XML products can also store RDF triples, including Oracle 11g and IBM Boca for DB2. On the patent front, Microsoft has been active with applications for methods to store RDF triples and convert SPARQL queries to SQL.
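As a sketch of what querying such a store looks like from code, here is a SPARQL SELECT over an RDF model using the open source Jena toolkit (Jena 2.x package names; the data URL is a placeholder for any RDF/XML document):

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

// Load an RDF graph and run a SPARQL query over its
// subject-predicate-object triples.
public class TripleQuery {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read("http://example.com/people.rdf");   // placeholder dataset

        String sparql =
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
            "SELECT ?name WHERE { ?person foaf:name ?name }";

        QueryExecution qe = QueryExecutionFactory.create(sparql, model);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.nextSolution();
                System.out.println(row.getLiteral("name").getString());
            }
        } finally {
            qe.close();   // releases resources held by the query
        }
    }
}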
Document Stores, Column Stores

Storing data by column, rather than the row orientation of the traditional SQL platform, does not subvert the relational model. But when combined with data compression and a shared-nothing, massively parallel processing (MPP) architecture, it can sustain high-performance applications doing analytics and business intelligence processing. By using a Sybase IQ or Vertica column store with a cloud computing service, organizations can roll their own scalable BI solutions without a heavy capital outlay for server hardware. Sybase IQ processes complex analytics queries, accelerates report processing, and includes a word index for string processing, such as SQL LIKE queries. It provides connectivity via standard data access APIs, and its Rcube schemas provide a performance advantage over the star schema typically used for relational data warehouses and data marts. Vertica Analytic Database is a solution from a company co-founded by Michael Stonebraker. Vertica supports a grid architecture, terabyte-sized databases, and standards-based connectivity. It makes pay-as-you-go analytics available to Amazon EC2 users, with a large AMI instance; drivers for ODBC, JDBC, Python, and Ruby; and a database size of 1 terabyte per node as you scale out to multiple nodes.
Apache CouchDB is a schemaless, fault-tolerant data store that appeals to developers building HTTP applications for which a document paradigm is useful. It supports the JavaScript Object Notation (JSON) and AtomPub data formats, and it provides a REST-style API for reading and updating named documents. To ensure data consistency, it has ACID properties and does not overwrite committed data. It uses a document ID and sequence number to write a b-tree index, with the sequence number providing the ability to generate a document’s version history. CouchDB provides a peer-based distributed database solution that supports bidirectional replication.
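Because the API is plain HTTP, no driver is required; a sketch using only the JDK (host, database, and document names are placeholders, and the database is assumed to exist):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// CouchDB stores named JSON documents: PUT creates or updates a document
// at a URL, GET reads it back.
public class CouchDbPutGet {
    public static void main(String[] args) throws Exception {
        String doc = "{\"title\":\"cloud notes\",\"tags\":[\"couchdb\",\"rest\"]}";
        URL url = new URL("http://localhost:5984/notes/note-001");

        // PUT the document at that name.
        HttpURLConnection put = (HttpURLConnection) url.openConnection();
        put.setRequestMethod("PUT");
        put.setDoOutput(true);
        put.setRequestProperty("Content-Type", "application/json");
        OutputStream out = put.getOutputStream();
        out.write(doc.getBytes("UTF-8"));
        out.close();
        System.out.println("PUT status: " + put.getResponseCode()); // 201 on create

        // GET it back as JSON (including the _id and _rev fields CouchDB adds).
        HttpURLConnection get = (HttpURLConnection) url.openConnection();
        BufferedReader in = new BufferedReader(
            new InputStreamReader(get.getInputStream(), "UTF-8"));
        for (String line; (line = in.readLine()) != null; ) System.out.println(line);
        in.close();
    }
}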
SQL/XML Databases

The SQL database has survived every paradigm shift critics said would be the death of SQL, including object-oriented programming (OOP), online analytical processing (OLAP), Internet computing, and the World Wide Web. Some have suggested SQL platforms are not sufficiently scalable for large workloads or data volumes, but there’s ample evidence to the contrary. The UPS shipping system central database processes 59 million transactions per hour. It has a table that contains more than 42 billion rows and has achieved a peak workload of more than 1 billion SQL statements per hour with IBM DB2. The data warehouse at eBay, running on a Teradata system, contains 5 petabytes of data. LGR Telecommunications derives information from call records to feed a 310 TB Oracle data warehouse. At a recent conference, Microsoft reported Hotmail has 300 million users and processes more than 2 billion non-spam messages per day with Microsoft SQL Server running on a 10,000-server farm.
The SQL/XML database platforms provide a rich query model, supporting SQL, XQuery, XPath expressions, and SPARQL queries. Typically, a key-value store requires logic in the application to perform record-oriented query processing. But instead of procedural programming, the SQL solution offers a declarative programming solution that relies on the query optimizer to generate the access path to the data. The SQL platforms offer mature administrative tools and standards-based connectivity, but the highest capacity SQL configurations have not yet been seen in the pay-per-use cloud. IBM DB2 gives you a hybrid storage engine that supports transaction processing, business intelligence, and XML document processing. It currently holds several TPC benchmark records, including one million TPC-C transactions per minute on an 8-processor/64-core cluster running Red Hat Linux Advanced Server. But the ready-to-run Amazon EC2 AMIs aren’t configured for that type of workload. The AMI bundles are for running IBM DB2 Express Edition or Workgroup Edition and Informix Dynamic Server Express Edition and Workgroup Edition. For heavier lifting, you’ll need to move your own DB2 Enterprise Edition or Informix Dynamic Server (IDS) licenses to EC2. Besides DB2 and Informix Dynamic Server, there are prepackaged AMIs for IBM Lotus Web Content Management and WebSphere sMash. For DB2 or IDS development, IBM provides Developer AMIs for EC2 that have no DB2 or IDS usage charge.
Oracle users can transfer licenses to EC2 for the Oracle 11g database, Fusion Middleware, and Enterprise Manager. The company also provides ready-to-run AMIs, and the Oracle Secure Backup Cloud Module can create compressed and encrypted database backups using Amazon S3. The S3 backups easily integrate with Oracle Recovery Manager using its SBT interface. The Oracle EC2 AMIs are preconfigured to use Enterprise Linux. The selection includes Oracle Database 10g Express Edition, Oracle Database 11g Enterprise Edition, Oracle Database 11g SE, and WebLogic Server 10g. Oracle’s licensing policy permits moving Fusion Middleware to EC2, including WebLogic Server, JRockit (Java VM), Coherence, and the Tuxedo transaction processing monitor.
Oracle Coherence is an in-memory, distributed data grid that stores key-value pairs. It provides linear scalability (reportedly deployed in a 5000-node grid), replication, caching, and transparent failover for distributed data. Coherence supports analysis and aggregation over the entire grid, and it’s available for C++, Java, and .NET development. Oracle Real Application Clusters are not currently available on a public cloud provider.
MySQL Enterprise is a platform suitable for cloud computing scenarios, such as scaling out with multiple servers and using master/slave replication. Some MySQL users have created a high-availability solution for Amazon EC2 by using a multi-instance master-master replication cluster. MySQL Enterprise subscribers can sign up for 24x7 support services for EC2, with different levels of support available from Sun. With the Platinum subscription, you get an enterprise dashboard, replication monitor, connectors, caching (memcached), and partitioning with MySQL Advanced. Continuent and Sun are working on making MySQL clustering technology available on cloud computing services such as GoGrid, Rackspace Mosso, and Amazon EC2.
EnterpriseDB Postgres Plus Cloud Edition is a version of PostgreSQL with enhancements such as GridSQL, replication, asynchronous pre-fetch for RAID, and a distributed memory cache. GridSQL uses a shared-nothing architecture to support parallel query processing for highly scalable environments such as grids and clouds. For its cloud edition, EnterpriseDB partnered with Elastra, which had a SaaS offering with PostgreSQL and MySQL on Amazon and a product for management of clustered data warehouses. Elastra used Amazon S3 as the persistence solution to the ephemeral disk storage problem when trying to manage databases using EC2 instances.

In-Memory Databases, Cache

For applications that require extremely high throughput, in-memory databases and caches can be deployed to deliver performance. One solution is to pair an in-memory database with a disk-resident SQL database, with the former acting as a cache for the latter. TimesTen and solidDB are robust in-memory products that were acquired by Oracle and IBM, respectively. Oracle TimesTen is an embeddable, in-memory database that supports ODBC and JDBC data access. It can provide real-time caching for, and automatic synchronization with, Oracle 11g databases. IBM solidDB maintains redundant copies of the database at all times and provides JDBC and ODBC query capabilities. It can scale using partitioning for instances, act as a cache for SQL databases, and do periodic snapshots (checkpoints to disk). The GigaSpaces XAP builds on the tuple spaces shared memory model and offers a JDBC capability for SQL queries.
To improve responsiveness, high-volume web sites typically use a cache to reduce the number of queries against SQL databases. Ehcache is a distributed Java cache used by LinkedIn. The memcached server provides a distributed object cache often used for MySQL applications. JBoss Cache has been integrated with GridGain in the Open Cloud Platform. GridDynamics demonstrated linear scalability with the GridGain platform from 2 to 512 nodes using Amazon EC2 to run Monte Carlo simulations.
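A minimal sketch of that cache-aside arrangement with Ehcache (assumes an ehcache.xml on the classpath defining a "queries" cache; the expensiveSqlQuery() helper is a placeholder for the database round trip):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Consult the in-memory cache before the SQL database and populate it
// on a miss, so repeated queries are served without a database round trip.
public class QueryCache {
    private final Cache cache = CacheManager.create().getCache("queries");

    String lookup(String sqlKey) {
        Element hit = cache.get(sqlKey);
        if (hit != null) {
            return (String) hit.getObjectValue();   // served from memory
        }
        String result = expensiveSqlQuery(sqlKey);  // round trip to the database
        cache.put(new Element(sqlKey, result));     // next caller gets the fast path
        return result;
    }

    // Placeholder for the real SQL query.
    private String expensiveSqlQuery(String key) { return "result-for-" + key; }
}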
Quetzall CloudCache is targeted at cloud applications hosted on EC2. CloudCache returns data in JSON or XML format, and it can run in multiple EC2 regions. It offers a REST-style API, and there are bindings for Ruby, Python, PHP, and Java development. Microsoft is currently previewing a distributed, in-memory cache code-named Velocity. It supports retrieving data by key or tag, optimistic and pessimistic concurrency, and it includes an ASP.NET session provider.

Federated Data

The federated database provides a solution when data is distributed because volume, workload, or other considerations make it impractical to combine it into a single database. Open SkyQuery and Flickr have been showcases for federation. SkyQuery runs distributed queries over federated astronomical data sources. Flickr uses sharding to support billions of queries per day over federated MySQL databases used for data management of 2 billion photos. That type of success and the scalability requirements of cloud computing have put new emphasis on federated data and sharding. Mergers and acquisitions also may force the creation of federated data stores to permit execution of business intelligence and other queries against disparate CRM databases.
IBM has been using GaianDB, based on Apache Derby, to test performance of a lightweight federated database engine. It distributed the database over 1000 nodes, which GaianDB was able to query in 1/8 of a second. Fetching a million rows took five seconds.

Platform And API Issues

Database options for public cloud computing can be limited by the choice of cloud provider. SaaS providers, such as Google AppEngine and Force.com, offer a specific platform for development, including predefined APIs and data stores. But private clouds and infrastructure providers, such as GoGrid, Joyent, and Amazon EC2, enable the cloud user to match the software, database environment, and APIs to requirements.
Besides cloud storage APIs, developers can program to diverse APIs for data stores and standards-based APIs for SQL/XML databases. The programmer developing applications for the cloud can work with SQL APIs and APIs for cloud services. For Amazon, that involves using the Web Services Description Language (WSDL) and invoking specific web services. For projects that use the cloud to power rich Internet applications (Web 2.0), developers might be looking to use JavaScript Object Notation (JSON) and the Atom Publishing protocol (AtomPub). More than one guru considers AtomPub to be the de facto standard for accessing cloud-based data.
Ease of development is an important aspect of a cloud database solution, with application programming interfaces (APIs) being a major factor. Some data access programming for the cloud can be done with familiar APIs, such as Open Database Connectivity (ODBC), JDBC, Java Data Objects (JDO), and ADO.NET.

Security

For certain classes of applications, security is an obstacle to using public cloud services, but it’s not insurmountable. Current thinking on the subject emphasizes encryption, authorization, authentication, digital certificates, roles, and policy-based security controls. Database backups to the cloud can be encrypted. Communications can use secure networking and encrypted data.
Java and .NET offer robust cryptographic solutions for applications and services accessing databases. Operating systems and robust SQL databases offer additional layers of security. SQL databases provide features such as row-level encryption and role-based assignment of privileges and access to data. But even with multilevel security, one serious threat to databases in the public cloud and the corporate data center is a breach of hypervisor security by an authorized employee. In order to ensure data security, Amazon EC2 provides for the definition of security groups. But you must use an Amazon API function to manually monitor the security group descriptions. And there is no logging function to monitor failed attempts at authentication.
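As a sketch of the Java side, here is a backup file being AES-encrypted with the standard Java Cryptography Extension before it ships to cloud storage (key management is deliberately simplified; a real system would load the key from a key store rather than generate a throwaway one):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Encrypt a database dump so only ciphertext leaves for the cloud.
public class EncryptBackup {
    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();   // throwaway key for the sketch

        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);

        FileInputStream in = new FileInputStream("backup.dump");
        CipherOutputStream out =
            new CipherOutputStream(new FileOutputStream("backup.dump.enc"), cipher);
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);   // bytes are encrypted as they stream through
        }
        out.close();                // flushes the final padded block
        in.close();
    }
}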
There are differences in security depending on whether you use SaaS, a platform provider, or an infrastructure provider. Because SaaS providers offer a bundle with tools, APIs, and services, the SaaS user is not caught up in choosing the optimal data store and security model. However, those creating private clouds or using an infrastructure provider must select a data management solution that’s consistent with the application’s security requirement.
Salesforce.com hosts applications on Oracle databases using a multitenancy model. On the other hand, Amazon EC2 is an example of multi-instance security. If you fire up an AMI running Oracle, DB2, or Microsoft SQL Server, you have a unique instance that does not serve other tenants. The process of authorizing database users, defining roles, and granting privileges is your responsibility when using IaaS.

Fault-Tolerance And Cloud Fail Over

One of the exciting possibilities introduced by cloud service providers is being able to configure fault-tolerant, highly available systems and hot backups for disaster recovery. It’s possible to configure and operate a private cloud for a fairly seamless failover to Amazon EC2, for example. It would require replicating data in the private and public cloud; implementing the Amazon APIs, availability zones, IP assignment, and load balancing for the private cloud; and using server configurations compatible with Amazon instances. The latter would be necessary to avoid breaking applications or services due to changes in endianness, the Java heap size, and other dissimilarities. A recent dialogue with IBM about the inverse scenario, deploying databases in a public cloud and moving them to a private cloud, revealed this would be more of a challenge.

Final Thoughts

The SQL database became predominant even though an earlier generation of databases delivered ACID properties and excellent performance on Create, Read, Update, Delete (CRUD) operations. They, like some of the software mentioned here, required a programmer to write code to navigate through data in order to perform queries, such as an aggregation query. But SQL platforms provided an ad hoc query solution that did not require procedural programming because it used a declarative query language and provided built-in aggregation functions. The logic that must be programmed in an application or service, versus built in with the database engine, is an element of the total cost of ownership (TCO) of a data store solution.
The direction an organization takes on cloud computing, whether to go the private cloud route or use a public cloud, will determine what options are available for data management. For those who walk the PaaS path, the focus will be on the platform’s capabilities, not the data store per se. Those who walk the private cloud or IaaS paths will have to choose a hardware and software configuration, including a data store that fits the business goals and requirements of applications running in the cloud. Many factors will influence the choice of one or more of a spectrum of cloud database solutions, ranging from simple data stores to platforms that support complex queries and transaction processing.
Not every project requires the full functionality of the SQL database managers, so there’s a definite need for lightweight, fast, scalable data stores.

— Ken North is an author, consultant, and industry analyst. He teaches seminars and chaired the XML DevCon 200x conference series, Nextware, LinkedData Planet, and DataServices World 200x conferences.

D r. D o b b ’s D i g e s t

Fan: A Portable Language


Is Bringing Actors to JavaScript
Fan is both an object-oriented and functional programming language

Fan (http://www.fandev.org/), another new programming language developed in the multicore era, has recently launched its 1.0.45 release (http://code.google.com/p/fan/downloads/list). It is a very active open source project with an interesting approach to many modern concurrent programming challenges.
I began writing about Fan 1.0.44 a week ago. Now, Fan has a new version, 1.0.45.
Most developers don't want to learn a new programming language. However, Fan is an attractive language for certain tasks because it is trying to solve modern problems related to concurrency and multicore programming.
Fan is both an object-oriented and functional programming language. This means that a developer can combine functional programming code with object-oriented code. At the same time, it has built-in immutability, message passing, and REST-oriented transactional memory. It uses Java-style syntax, so Java and C# developers won't have problems understanding Fan code.
Its portability makes Fan unique. So far, it can run on the JVM (Java Virtual Machine), the .NET CLR (Common Language Runtime), and JavaScript. Its JavaScript support is one of the most exciting features I've found for this language. There are many other languages and libraries offering actors and many different concurrency models for the JVM and the .NET CLR. However, Fan's support for JavaScript could revolutionize scripting performance: scripting could, and should, take advantage of multicore. Fan is evolving to offer JavaScript developers the possibility to tackle concurrency using actors and message-passing features. Besides, you can also run it on the JVM or on the .NET CLR, and you can expect Fan to offer additional compilers for new platforms, because portability is very important for Fan.
Fan's creators talk about the productivity of Ruby with the performance of Java. As Fan offers the possibility to tackle multicore, developers can transform a slow-performing language into a fast one. Undoubtedly, JavaScript support is a great opportunity for developers to create higher performance RIAs (Rich Internet Applications) based on this popular scripting language.
It's a bit difficult to find a definition for Fan, because it tries to offer many different features in one single and simple language. In a few lines, this list offers a summary of Fan's main features:

• Portability
• Object-oriented (with inheritance support)
• Immutability
• Closures support
• Dynamic programming
• Functional programming
• Serialization support
• Actor framework

The actor framework is really powerful. It supports the most important features required to create concurrent code without problems (a rough sketch of the model follows this list):

• Actor locals
• Actor pools (using a shared thread pool)
• Futures
• Timers
• Chaining
• Message passing
• Coalescing messages
• Flow control mechanisms
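Fan's own actor syntax is not shown here; as a rough, hedged illustration of the model this list describes (written in Python, not Fan), an actor owns its state, and the outside world interacts with it only through queued messages and future-like replies:

# Illustration only (Python 3, not Fan): one thread owns the actor's
# state; callers send messages and may block on a reply "future".
import queue
import threading

class Actor:
    def __init__(self, handler):
        self._handler = handler
        self._inbox = queue.Queue()
        worker = threading.Thread(target=self._loop)
        worker.daemon = True              # worker dies with the main program
        worker.start()

    def send(self, msg):
        reply = queue.Queue(maxsize=1)    # a poor man's future
        self._inbox.put((msg, reply))
        return reply                      # caller calls reply.get() later

    def _loop(self):
        while True:
            msg, reply = self._inbox.get()
            reply.put(self._handler(msg))

# A counter actor: its state is touched by exactly one thread, so no locks.
def make_counter():
    state = {"count": 0}
    def handle(increment):
        state["count"] += increment
        return state["count"]
    return Actor(handle)

counter = make_counter()
future = counter.send(5)
print(future.get())   # prints 5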
If you need functional programming, immutability, message passing, and actors, you should take a look at Fan. Likewise, if you are working with JavaScript, keep an eye on Fan. It can help developers to tackle multicore.
In the forthcoming months, expect to see new libraries, languages, compilers, and Domain-Specific Languages appearing to simplify parallel programming for many languages, virtual machines, and runtimes.

—Gastón Hillar is the author of C# 2008 and 2005 Threaded Programming: Beginner's Guide.

Return to Table of Contents
The C++0x "Remove Concepts" Decision
"Concepts" were to have been the central new feature in C++0x
By Bjarne Stroustrup
At the July 2009 meeting in Frankfurt, Germany, the C++ Standards Committee voted to remove "concepts" from C++0x. Although this was a big disappointment for those of us who have worked on concepts for years and are aware of their potential, the removal fortunately will not directly affect most C++ programmers. C++0x will still be a significantly more expressive and effective language for real-world software development than C++98. The committee acted with the intent to limit risk and preserve schedule. Maybe a significantly improved version of "concepts" will be available in five years. This article explains the reasons for the removal of "concepts," briefly outlines the controversy and fears that caused the committee to decide the way it did, gives references for people who would like to explore "concepts," and points out that (despite enthusiastic rumors to the contrary) "the sky is not falling" on C++.

No "Concepts" in C++0x
At the July 2009 Frankfurt meeting of the ISO C++ Standards Committee (WG21) (http://www.open-std.org/jtc1/sc22/wg21/), the "concepts" mechanism for specifying requirements for template arguments was "decoupled" (my less-diplomatic phrase was "yanked out"). That is, "concepts" will not be in C++0x or its standard library. That — in my opinion — is a major setback for C++, but not a disaster; and some alternatives were even worse.
I have worked on "concepts" for more than seven years and looked at the problems they aim to solve much longer than that. Many have worked on "concepts" for almost as long. For example, see (listed in chronological order):

• Bjarne Stroustrup and Gabriel Dos Reis: "Concepts — Design choices for template argument checking." October 2003. An early discussion of design criteria for "concepts" for C++.
• Bjarne Stroustrup: "Concept checking — A more abstract complement to type checking." October 2003. A discussion of models of "concept" checking.
• Bjarne Stroustrup and Gabriel Dos Reis: "A concept design (Rev. 1)." April 2005. An attempt to synthesize a "concept" design based on (among other sources) N1510, N1522, and N1536.
• Jeremy Siek, Douglas Gregor, Ronald Garcia, Jeremiah Willcock, Jaakko Jarvi, and Andrew Lumsdaine: "Concepts for C++0x." N1758==05-0018. May 2005.
• Gabriel Dos Reis and Bjarne Stroustrup: "Specifying C++ Concepts." POPL06. January 2006.
• D. Gregor, B. Stroustrup: "Concepts." N2042==06-0012. June 2006. The basis for all further "concepts" work for C++0x.
• Douglas Gregor, Jaakko Jarvi, Jeremy Siek, Bjarne Stroustrup, Gabriel Dos Reis, Andrew Lumsdaine: "Concepts: Linguistic Support for Generic Programming in C++." OOPSLA'06, October 2006. An academic paper on the C++0x design and its experimental compiler "ConceptGCC."
• Pre-Frankfurt working paper (with "concepts" in the language and standard library): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf. N2914=09-0104. June 2009.
• B. Stroustrup: "Simplifying the use of concepts." N2906=09-0096. June 2009.

It need not be emphasized that I and others are quite disappointed.
The fact that some alternatives are worse is cold comfort and I can offer no quick and easy remedies.
Please note that the C++0x improvements to the C++ features that most programmers see and directly use are unaffected. C++0x will still be a more expressive language than C++98, with support for concurrent programming, a better standard library, and many improvements that make it significantly easier to write good (i.e., efficient and maintainable) code. In particular, every example I have ever given of C++0x code (e.g., in "Evolving a language in and for the real world: C++ 1991-2006" at ACM HOPL-III, available at http://portal.acm.org/toc.cfm?id=1238844) that does not use the keywords "concept" or "requires" is unaffected. See also my C++0x FAQ (http://www.research.att.com/~bs/C++0xFAQ.html). Some people even rejoice that C++0x will now be a simpler language than they had expected.
"Concepts" were to have been the central new feature in C++0x for putting the use of templates on a better theoretical basis, for firming-up the specification of the standard library, and a central part of the drive to make generic programming more accessible for mainstream use. For now, people will have to use "concepts" without direct language support as a design technique. My best scenario for the future is that we get something better than the current "concept" design into C++ in about five years. Getting that will take some serious focused work by several people (but not "design by committee").

What Happened?
"Concepts," as developed over the last many years and accepted into the C++0x working paper in 2008, involved some technical compromises (which is natural and necessary). The experimental implementation was sufficient to test the "conceptualized" standard library, but was not production quality. The latter worried some people, but I personally considered it sufficient as a proof of concept.
My concern was with the design of "concepts" and in particular with the usability of "concepts" in the hands of "average programmers." That concern was shared by several members. The stated aim of "concepts" was to make generic programming more accessible to most programmers [BS&GDR2003], but that aim seemed to me to have been seriously compromised: Rather than making generic programming more accessible, "concepts" were becoming yet another tool in the hands of experts (only). Over the last half year or so, I had been examining C++0x from a user's point of view, and I worried that even use of libraries implemented using "concepts" would put new burdens on programmers. I felt that the design of "concepts" and its use in the standard library did not adequately reflect our experience with "concepts" over the last few years.
Then, a few months ago, Alisdair Meredith (an insightful committee member from the UK) and Howard Hinnant (the head of the standard library working group) asked some good questions relating to who should directly use which parts of the "concepts" facilities and how. That led to a discussion of usability involving many people with a variety of concerns and points of view; and I eventually — after much confused discussion — published my conclusions [BS2009].
To summarize and somewhat oversimplify, I stated that:

• "Concepts" as currently defined are too hard to use and will lead to disuse of "concepts," possibly disuse of templates, and possibly to lack of adoption of C++0x.
• A small set of simplifications [BS2009] can render "concepts" good-enough-to-ship on the current schedule for C++0x or with only a minor slip.

That's pretty strong stuff. Please remember that standards committee discussions are typically quite polite, and since we are aiming for consensus, we tend to avoid direct confrontation. Unfortunately, the resulting further (internal) discussion was massive (hundreds of more and less detailed messages) and confused. No agreement emerged on what problems (if any) needed to be addressed or how. This led me to order the alternatives for a presentation in Frankfurt:

• Fix and ship
Remaining work: remove explicit "concepts," add explicit refinement, add "concept"/type matching, handle "concept" map scope problems
Risks: no implementation, complexity of description
Schedule: no change or one meeting
• Yank and ship
Remaining work: yank (core and standard library)
Risks: old template problems remain, disappointment in "progressive" community ("seven years of work down the drain")
Schedule: five years to "concepts" (complete redesign needed) or never
• Status quo
Remaining work: details
Risks: unacceptable programming model, complexity of description (alternative view: none)
Schedule: no change

I and others preferred the first alternative ("fix and ship") and considered it feasible. However, a large majority of the committee disagreed and chose the second alternative ("yank and ship," renaming it "decoupling"). In my opinion, both are better than the third alternative ("status quo"). My interpretation of that vote is that given the disagreement among proponents of "concepts," the whole idea seemed controversial to some, some were already worried about the ambitious schedule for C++0x (and, unfairly IMO, blamed "concepts"), and some were never enthusiastic about "concepts." Given that, "fixing concepts" ceased to be a realistic option. Essentially, all expressed support for "concepts," just "later" and "eventually." I warned that a long delay was inevitable if we removed "concepts" now because in the absence of schedule pressures, essentially all design decisions will be reevaluated.
Surprisingly (maybe), there were no technical presentations and discussions about "concepts" in Frankfurt. The discussion focused on timing and my impression is that the vote was decided primarily on timing concerns.
Please don't condemn the committee for being cautious. This was not a "Bjarne vs. the committee fight," but a discussion trying to balance a multitude of serious concerns. I and others are disappointed that we didn't take the opportunity of "fix and ship," but C++ is not an experimental academic language.
Unless members are convinced that the risks for doing harm to production code are very low, they must oppose. Collectively, the committee is responsible for billions of lines of code. For example, lack of adoption of C++0x or long-term continued use of unconstrained templates in the presence of "concepts" would lead to a split of the C++ community into separate subcommunities. Thus, a poor "concept" design could be worse than no "concepts." Given the choice between the two, I too voted for removal. I prefer a setback to a likely disaster.

Technical Issues
The unresolved issue about "concepts" focused on the distinction between explicit and implicit "concept" maps (see [BS2009]):

1. Should a type that meets the requirements of a "concept" automatically be accepted where the "concept" is required (e.g., should a type X that provides +, -, *, and / with suitable parameters automatically match a "concept" C that requires the usual arithmetic operations with suitable parameters) or should an additional explicit statement (a "concept" map from X to C) that a match is intentional be required? (My answer: Use automatic match in almost all cases.)
2. Should there be a choice between automatic and explicit "concepts" and should a designer of a "concept" be able to force every user to follow his choice? (My answer: All "concepts" should be automatic.)
3. Should a type X that provides a member operation X::begin() be considered a match for a "concept" C<T> that requires a function begin(T) or should a user supply a "concept" map from T to C? An example is std::vector and std::Range. (My answer: It should match.)

The answers "status quo before Frankfurt" all differ from my suggestions. Obviously, I have had to simplify my explanation here and omit most details and most rationale.
I cannot reenact the whole technical discussion here, but this is my conclusion: In the "status quo" design, "concept" maps are used for two things:

• To map types to "concepts" by adding/mapping attributes, and
• To assert that a type matches a "concept."

Somehow, the latter came to be seen as an essential function by some people, rather than an unfortunate rare necessity. When two "concepts" differ semantically, what is needed is not an assertion that a type meets one and not the other "concept" (this is, at best, a workaround — an indirect and elaborate attack on the fundamental problem), but an assertion that a type has the semantics of the one and not the other "concepts" (fulfills the axiom(s) of the one and not the other "concept").
For example, the STL input iterator and forward iterator have a key semantic difference: You can traverse a sequence defined by forward iterators twice, but not a sequence defined by input iterators; e.g., applying a multi-pass algorithm on an input stream is not a good idea. The solution in "status quo" is to force every user to say what types match a forward iterator and what types match an input iterator. My suggested solution adds up to: If (and only if) you want to use semantics that are not common to two "concepts" and the compiler cannot deduce which "concept" is a better match for your type, you have to say which semantics your type supplies; e.g., "my type supports multi-pass semantics." One might say, "When all you have is a 'concept' map, everything looks like needing a type/'concept' assertion."
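The semantic point is easy to reproduce outside C++. In the following illustrative Python sketch (an analogy, not Stroustrup's code), a list plays the role of a forward-iterator sequence and a generator that of an input-iterator one; the two-pass algorithm is correct for the first and cannot work on the second:

# A multi-pass algorithm: pass 1 computes the average, pass 2 filters.
# It quietly assumes forward-iterator-like (multi-pass) semantics.
def above_average(items):
    avg = sum(items) / len(items)           # first traversal
    return [x for x in items if x > avg]    # second traversal

data = [3, 1, 4, 1, 5]
print(above_average(data))       # fine: a list can be traversed twice

stream = (x for x in data)       # a generator is single-pass, like an
                                 # input iterator: sum() would consume it
# above_average(stream) would fail, which is exactly the semantic
# difference that purely syntactic "concept" matching cannot detect.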
At the Frankfurt meeting, I summarized:

1. Why do we want "concepts"?
To make requirements on types used as template arguments explicit
• Precise documentation
• Better error messages
• Overloading

Different people have different views and priorities. However, at this high level, there can be confusion — but little or no controversy. Every half-way reasonable "concept" design offers that.

2. What concerns do people have?
Programmability
Complexity of formal specification
Compile time
Runtime

My personal concerns focus on "programmability" (ease of use, generality, teachability, scalability); the complexity of the formal specification (40 pages of standards text) is secondary. Others worry about compile time and runtime. However, I think the experimental implementation (ConceptGCC [Gregor2006]) shows that runtime for constrained templates (using "concepts") can be made as good as or better than current unconstrained templates. ConceptGCC is indeed very slow, but I don't consider that fundamental. When it comes to validating an idea, we hit the traditional dilemma. With only minor oversimplification, the horns of the dilemma are:

• "Don't standardize without commercial implementation"
• "Major implementers do not implement without a standard"

Somehow, a detailed design and an experimental implementation have to become the basis for a compromise. My principles for "concepts" are:

• Duck typing
The key to the success of templates for GP (compared to OO with interfaces and more); see the sketch after this list
• Substitutability
Never call a function with a stronger precondition than is "guaranteed"
• "Accidental match" is a minor problem
Not in the top 100 problems
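Because the pre-Frankfurt "concept" syntax never shipped in a production compiler, a dynamic language may be the quickest way to see what the first principle means. In this Python sketch (an illustration of the idea only, not of C++ syntax), duck typing is the rule: any type supplying the operations an algorithm actually uses is accepted, with no declaration that it matches:

# Duck typing: total() asks only for iteration and +; any type
# providing them "matches" automatically, with no "concept map".
def total(seq):
    result = 0
    for x in seq:
        result += x
    return result

print(total([1, 2, 3]))      # a list works
print(total(range(4)))       # so does a range
print(total((1.5, 2.5)))     # and a tuple of floats

This is the behavior Stroustrup's "automatic match in almost all cases" answer aims to give constrained C++ templates.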
My "minimal fixes" to "concepts" as present in the pre-Frankfurt working paper were:

• "Concepts" are implicit/auto
To make duck typing the rule
• Explicit refinement
To handle substitutability problems
• General scoping of "concept" maps
To minimize "implementation leakage"
• Simple type/"concept" matching
To make vector a range without redundant "concept" map

See BS2009 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2906.pdf).
No C++0x, Long Live C++1x
Even after cutting "concepts," the next C++ standard may be delayed. Sadly, there will be no C++0x (unless you count the minor corrections in C++03). We must wait for C++1x, and hope that 'x' will be a low digit. There is hope because C++1x is now feature complete (excepting the possibility of some national standards bodies effectively insisting on some feature present in the formal proposal for the standard). "All" that is left is the massive work of resolving outstanding technical issues and comments.
A list of features and some discussion can be found on my C++0x FAQ (http://www.research.att.com/~bs/C++0xFAQ.html). Here is a subset:

• atomic operations
• auto (type deduction from initializer)
• C99 features
• enum class (scoped and strongly typed enums)
• constant expressions (generalized and guaranteed; constexpr)
• defaulted and deleted functions (control of defaults)
• delegating constructors
• in-class member initializers
• inherited constructors
• initializer lists (uniform and general initialization)
• lambdas
• memory model
• move semantics; see rvalue references
• null pointer (nullptr)
• range for statement
• raw string literals
• template alias
• thread-local storage (thread_local)
• unicode characters
• uniform initialization syntax and semantics
• user-defined literals
• variadic templates

and libraries:

• improvements to algorithms
• containers
• duration and time_point
• function and bind
• forward_list, a singly-linked list
• future and promise
• garbage collection ABI
• hash_tables; see unordered_map
• metaprogramming and type traits
• random number generators
• regex, a regular expression library
• scoped allocators
• smart pointers; see shared_ptr, weak_ptr, and unique_ptr
• threads
• atomic operations
• tuple

Even without "concepts," C++1x will be a massive improvement on C++98, especially when you consider that these features (and more) are designed to interoperate for maximum expressiveness and flexibility. I hope we will see "concepts" in a revision of C++ in maybe five years. Maybe we could call that C++1y or even "C++y!".

—Bjarne Stroustrup designed and implemented the C++ programming language. He can be contacted at http://www.research.att.com/~bs/.

Return to Table of Contents
A Build System for Complex Projects: Part 1
A different approach to build systems
By Gigi Sayfan
Build systems are often a messy set of scripts and configuration files that let you build, test, package, deliver, and install your code. As a developer, you either love or loathe build systems. In this article series, I present a different approach to build systems, with the ultimate goal of completely hiding the build system from developers. But first, let me start with some personal history.
Early in my programming career I was a pure Windows developer (with the exception of my very first job, where I wrote Cobol programs for publishing Australia's Yellow Pages). While there was no build system to speak of, there was Visual Studio and Visual SourceSafe. I built Windows GUI clients, messed around with COM components, and picked up some nice C++ template tricks from ATL. And because automated unit testing wasn't very common back then, we created various test programs before passing code on to QA. This wasn't too painful since I worked for a small startup company and the projects weren't too big.
But I then moved to a company that developed software for chip fabrication equipment in the semi-conductor industry and BOOM! Life-critical and mission-critical real-time software running on six computers that controlled custom-built hardware in clean-room conditions. The software ran on several operating systems with about 50 developers contributing code. The development environment consisted of two machines running Linux and Windows/Cygwin. The deployment environment was Solaris and LynxOS RTOS. No more Visual Studio. After reading about 1000 pages of documentation in the first week and getting my .profile and .bashrc in order, I was assigned my first task — designing and implementing a build system to replace the existing one, which was a nasty combination of Makefiles and Perl scripts that actually worked but nobody was sure why (the original author had left the building). There were a few bugs (for example, the build system didn't always follow the proper dependency path) and a big requirements document. Clearly it would be impossible to evolve the current build system, so I had to create a new one from scratch. This was lucky because I had zero experience with Makefiles and Perl, coupled with the tolerance threshold of a Windows developer to gnarly stuff. I still have the same tolerance, but I now know something about Makefiles.
Some of the requirements were pretty unusual, like running a commercial code generator that produces code from UML diagrams on a Windows machine, then uses the artifacts to compile code on Linux, Solaris, and LynxOS. The bottom line is that I decided to take an unusual approach and wrote the entire system in Python. It was my first big Python project and I was really surprised at how well it went. I managed everything in Python. I directly invoked the compiler and linker on each platform, then the test programs, and finally a few other steps. For instance, I implemented friendly error messages that provided helpful suggestions for common errors (e.g., "FrobNex file not found. Did you remember to configure the FrobNex factory to save the file?").
While I was generally pleased with the system, it wasn't completely satisfactory. In lieu of Makefiles, I created build.xml files, a la Ant. That was a mistake. The XML files were verbose compared to Makefiles, big chunks were identical for many subprojects, and people had to learn the format (which was simple, but something new).
I wrote a script that migrated Makefiles to build.xml files, but it just increased code bloat. I created a custom build system without regard for the specific environment and its needs. I created a very generic system, with polymorphic tools that can do anything as long as you write the code for the tool and configure it properly. This was bad. Whenever someone says, "You just have to..." I know I'm in trouble. What I took away from this experience is that Python is a terrific language. It's really fun when you can actually debug the build system itself. Having full control over the build system is great, too.

Background: What Does a Build System Do?
The build system is the software development engine. Software development is a complex activity that involves tasks such as: source-code control, code generation, automated source code checks, documentation generation, compilation, linking, unit testing, integration testing, packaging, creating binary releases, source-code releases, deployment, and reports. That said, software development usually boils down to four main phases:

1. Developers write source code and content (graphics, templates, text, etc.)
2. The source artifacts are transformed to end products (binary executables, web sites, installers, generated documents)
3. The end products are tested
4. The end products are deployed or distributed

A good automated build system can take care of steps 2–4. The distribution/deployment phase is usually to a local repository or a staging area. You will probably need some amount of human testing before actually releasing the code to production. The build system can also help with that by notifying users about interesting events, such as successful and/or failed builds, and providing debugging support.
But really, who cares about all this stuff? Actually everybody — developers, administrators, QA, managers, and even users. The developers interact most closely with the build system because every change a developer makes must trigger at least a partial build. When I say "developer" I don't necessarily mean a software engineer. I could be referring to a graphic artist, technical writer, or any other person that creates source content. When a build fails, it's most often because a developer changed something that broke the build. On rare occasions, it would be an administrator action (e.g., changing the URL of a staging server or shutting down some test server) or a hardware problem (e.g., the source control server is down). A good build system saves time by automating tedious and error-prone activities.
Think about a developer manually building and unit testing a program. Without a build system, he has to very carefully build it properly, test it, and hand it over to QA. The QA person needs to run his own tests, then hand it to the administrator for deployment to a staging site, where more tests are run against the deployed system. If anything goes wrong in this process, someone must determine what happened. Automated build systems eliminate a whole class of errors. They never forget a step and they can pinpoint and resolve other errors by verifying that the source artifacts and intermediate artifacts are available and by scanning through log files and detecting failures.
Managers can also benefit from build systems. A passing build is the pulse of a project. If you have an automated build system with good test coverage (at the system level), managers can monitor project progress and be ready to release at each point. This in turn enables more agile development practices (if you are so inclined).
A build system can even help users in some cases. Think about systems that incorporate user-generated content and/or plug-ins. In most cases, you need to go over the content and ensure it doesn't break your system. A build system that automates some/all of these checks allows for shorter publish/release cycles for user-generated content.

Build System Problems
Okay, build systems are the greatest thing since Microsoft Bob (http://en.wikipedia.org/wiki/Microsoft_Bob). However, they still don't always live up to their potential:

• They Don't Do Enough (Not Fully Automated). This is one of the most common problems. A build system that is not fully automated can compile the software, create documentation, and package the final binary, but it requires a lot of user intervention to run various scripts, wait for previous stages to finish, check error reports, and so on.
• Requires a Lot of Discipline to Use Properly. Some build systems fail inexplicably if you don't follow a slew of obscure steps, like logging into the test server with a specific user, removing directory A, renaming directory B, making sure you perform step X only if the report generated by step Y says okay.
• Requires Too Much Configuration. Some build systems are very powerful and flexible, but are almost unusable due to excessive configuration. You have to define six different environment variables, modify three local config files, and pass eight different command-line options to the main build script. The end result is that 99% of the users use a single default configuration that probably doesn't fit their needs.
• Caters Mainly To a Sole Stakeholder. Another common problem is that a build system is often suitable for just one kind of stakeholder. For example, if the build system was developed mainly by the programmers who compile, link, and unit test all day, then the build system will have good support for these activities, but running integration tests or generating documentation may be poorly supported, if at all. On the other hand, if the build system was developed mainly by a release engineering team, then it will have good support for packaging final executables and will generate good reports about the percentage of passing tests, but it may not be possible for developers to run just a single unit test and its dependencies, and they will either have to run the full-fledged build every time or hack the build system in a quick and dirty way (which might lead to errors).
• Intractable Error Messages When Something Is Wrong. Build systems perform many activities that involve external tools. The errors generated by these tools are often swallowed by the build system, which much later generates its own error message that doesn't point to the root cause. This is a serious problem that hurts productivity and causes people to revert to manual but understandable build practices.
• Inextensible and Undebuggable. Build systems are often one of the earliest tools created at project initiation. The requirements of this early build system are usually minimal. As time goes by and the project grows, the demands from the build system grow too. Since the build system is an internal tool, less effort is dedicated to making it high quality code. More often than not, it is just a bunch of scripts slapped together and extended to support additional requirements by the tried and true practice of copy and paste. Such build systems quickly become a maintenance nightmare and can't be extended easily to accommodate new requirements.
• Not Integrated With Developer's IDE. Most build systems that don't come with an IDE built-in don't support IDEs. They are command-line based only, and if a developer wants to work in an IDE, the IDE project files must be maintained and synchronized with the build system build files. For example, the build system may be Makefile-based, and a developer that uses Visual Studio has to maintain a .vcproj file for each project, and any additional files must be added to the Makefile as well.

The Perfect Build System
The build system I present in this series is open ended and can be used to automate any software process that is mainly file-based. However, the focus is on a cross-platform build system for large-scale C++ projects because these are often the most complicated to build. The perfect build system solves or minimizes the problems associated with existing build systems.
"Convention over configuration" is a principle that has successfully governed in domains like web frameworks, reducing the learning curve and increasing developer productivity. It demands that you organize your project in a consistent way (which is always good practice in any event):

• Regular directory structure. This is the key principle on which the entire build system rests. Even in the most complicated systems, there is usually a relatively small high-level directory structure that contains a potentially huge number of similar directories. For example, a project may have a libs directory that contains all the C++ static libraries. The contents of the libs directory may grow and change, but it always contains a single type of entities.
• Well-known locations. The build system should be aware of the location and names of the top-level directories and "understand" what they mean. For example, it should know that the directories under libs generate static libraries that should later be linked into executables and dynamic libraries that depend on them.
• Automatic discovery of files based on extension. Each directory usually contains a small number of file types. Again, in the libs example, it should contain .h and .c/.cpp files and potentially a couple of other metadata files. The build system should know what files to expect and how to handle each file type. Once you have the regular directory structure in place, the build system "knows" a lot about your system and can do many tasks on your behalf automatically. In particular, it doesn't need a build file in each directory that tells it what files are in it, how to build them, etc.
• Capitalize on the small variety of subproject types. In the C/C++ world, there are really only three types of subprojects: a static library, a dynamic library, and an executable. Static libraries (a compiled set of files bundled together) are the simplest. They are later linked into dynamic libraries and executables. Dynamic libraries and executables are similar from a build point of view. They both have source files and depend on precompiled static libraries to link against. It is important to build the dependent dynamic libraries and executables after building all the required static libraries. Many libraries (both static and dynamic) and executables use the same set of compiler and linker flags. Placing these groups under a parent directory informs the build system of these common flags and automatically builds all the subprojects.
• Generate build files from templates for any IDE. Different IDEs, as well as command-line based tools like Make, use different build files to represent the meta information needed to build the software. The build system I present here maintains the same information via its inherent knowledge combined with the regular directory structure, and can generate build files for any other build system by populating the appropriate templates. This approach lets developers build the software via their favorite IDE (like Visual Studio) without the hassle involved in adding files, setting dependencies, and specifying compiler and linker flags.
• Automatic dependency management based on #include analysis. Managing dependencies can be simple or complicated depending on the project. In any case, missing a dependency leads to linking errors that are often hard to resolve. This build system analyzes the #include statements in the source files and recursively creates a complete dependencies tree (see the sketch after this list). The dependencies tree determines what static libraries a dynamic library or executable needs to link against.
• Automatic discovery of added/removed/renamed files and directories. The regular directory structure, combined with knowledge of file types (e.g., .cpp or .h files), allows the build system to figure out what files it needs to take into account, so developers just need to make sure the right files are in the right directory.
• Flexibility
Support static libraries, dynamic libraries, executables, and custom artifacts. All possible build artifacts are supported, including custom ones like code generators, preprocessors, and documentation generators. The ability to put similar files and subprojects under top-level directories in the regular directory structure is open to any subproject type.
Control the level of error messages. The build system is designed to support different users, such as QA, developers, and managers. Each type of user may be interested in different error messages.
Generate custom artifacts like language bindings. The build system is focused on building C/C++ code, but using the same practices and mechanisms it is possible to extend it to support additional artifacts, while maintaining all the existing benefits.
Allow overriding defaults. While the build system is intended to provide a hands-free experience, where all the necessary build information is derived automatically from the directory structure, it is possible to override it for special purposes, such as a single library that needs different flags.
• Integrated Build System
Build phases are executed from the same program. The build system is a cohesive program that operates on a set of templates and source files. This one-stop shop approach is very powerful for keeping the build process manageable.
Invoke external programs as a last resort. Ideally, the build system contains the entire logic of each build step. External programs are invoked only when the effort to implement the logic in the build system itself is deemed too costly. For example, the compiler and linker are invoked as external programs.
Full debugging of the build system. The fact that the build system is a single program allows users to debug the build process in real-time, including setting breakpoints, viewing the current state, and finding live build system bugs. This is very different from standard declarative build files that usually only provide obscure error messages at a much later stage.
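Part 2 presents the real implementation; in the meantime, here is a hypothetical Python sketch (names and structure are illustrative assumptions, not the article's code) of the #include scanning that the dependency-management bullet above describes:

# A first step toward the recursive dependency tree the article describes:
# map every source file under a tree to the project headers it includes.
import os
import re

INCLUDE_RE = re.compile(r'#include\s+"([^"]+)"')

def scan_includes(path):
    """Return the project-relative headers one source file includes."""
    with open(path) as source:
        return INCLUDE_RE.findall(source.read())

def dependency_map(src_root):
    """Map each .h/.c/.cpp file under src_root to its #include list."""
    deps = {}
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if name.endswith(('.h', '.c', '.cpp')):
                full = os.path.join(dirpath, name)
                deps[full] = scan_includes(full)
    return deps

# A file that includes "hw/hello/hello.h" is thereby known to need the
# static library built from src/hw/hello, with no build file to maintain.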
Hello, World (Platinum Enterprise Edition)
I hope you agree that this build system sounds awesome. But is it for real? To demonstrate and explore its capabilities, I will follow an imaginary software team that just started working on a new project. The project is called "Hello, World!". The goal is to print it to the screen. To do this, over the course of this series, the team will create a complex project with multiple executables, static and dynamic libraries, and even Ruby bindings. The project will run on Windows, Linux, and Mac OS X. It will be built using a custom build system. To whet your appetite, here is a prototype in Python of the finished project:

print 'Hello, World!'

Project Kick-Off
Isaac, the sage development manager, assembled a team of brilliant software developers with umpteen years of experience in delivering high-performance enterprise applications. The kick-off meeting went well and the developers quickly reached a few decisions:

• The project will be developed mostly in C++.
• The system must be cross-platform and support Windows, Linux, and Mac OS X.
• The developers will be divided into four teams. Team H will develop a static library called libHello that returns "Hello". Team P will develop a dynamic library called libPunctuator that produces commas and exclamation points (and can be reused in future projects requiring punctuation). Team W will develop the complicated libWorld static library that must return the long and difficult word "World". Team U will develop an infrastructure project called libUtils that provides utility services to the other teams.
• The project will also deliver a Ruby language binding to make it more buzzword-compliant.
• The test strategy is to develop multiple test programs to test every library. Each team will be responsible for developing the test program for its library.
• The build system will be developed in Python by the renowned build expert Bob (aka "The Builder"). (No connection to Microsoft Bob, thank you.)

Bob carefully observed the source and required artifacts of the system and came up with the following directory structure. Each kind of subproject is contained in a top-level directory under the source tree (a discovery sketch follows this list):

root
|___ibs
|___src
    |___apps
    |___bindings
    |___dlls
    |___hw (static libraries)
    |___test

• The ibs directory contains the files and templates of the build system. Note that it is completely separate from the source tree under src.
• The src directory contains all the source files of the system. Let's take a quick look at the top-level directories under src.
apps. This directory contains all the (executable) applications generated by the system. Each application will have its own directory under apps.
bindings. This directory will contain Ruby bindings at some point. At the moment it is empty.
dlls. This directory contains the project's dynamic libraries.
hw. This directory contains the project's static libraries. The reason it is called hw (as in "hello world") and not libs or a similar name is that it is very important to prevent name clashes with system or third-party static libraries. The automatic dependency discovery of the build system relies on analysis of #include statements. The unique hw part of the path of each static library allows unambiguous resolution of #include statements.
test. This directory contains a subdirectory for each test program. Each test program is a standalone executable linked against the static libraries it is designed to test.
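To see how much a build system can infer from this layout, here is another hypothetical Python sketch (again illustrative, not Bob's actual code) that discovers every subproject purely from the conventions just described:

# Hypothetical sketch: convention-based discovery of subprojects.
import os

def discover_subprojects(src="src"):
    """Return {kind: [subproject, ...]} for apps, bindings, dlls, hw, test."""
    kinds = {}
    for kind in sorted(os.listdir(src)):
        kind_dir = os.path.join(src, kind)
        if os.path.isdir(kind_dir):
            kinds[kind] = [name for name in sorted(os.listdir(kind_dir))
                           if os.path.isdir(os.path.join(kind_dir, name))]
    return kinds

# e.g. {'apps': [...], 'bindings': [], 'dlls': [...],
#       'hw': ['hello', 'utils', 'world'], 'test': [...]}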
Next Time
In the next installment of this series, Bob and I delve into the innards of the build system and explain exactly how it works. Stay tuned.

—Gigi Sayfan specializes in cross-platform object-oriented programming in C/C++/C#/Python/Java with emphasis on large-scale distributed systems. He is currently trying to build intelligent machines inspired by the brain at Numenta (www.numenta.com).

Return to Table of Contents
Integrating jQuery Client-Side Data Templates With WCF
Using client-side templates and binding JSON data that's retrieved from a WCF service
By Dan Wahlin
In my article "Minimize Code by Using jQuery and Data Templates" (http://www.ddj.com/windows/217701311), I presented a JavaScript data binding template solution that I've been using to make it easy to bind JSON data to a client-side template without having to write a lot of JavaScript code. In this article, I demonstrate the fundamentals of using the client-side templates and binding JSON data that's retrieved from a WCF service. As a review (in case you didn't read the previous article), the template solution I've been using recently on a client project is based on some code written by John Resig (creator of jQuery), which is extremely compact. Here's a modified version of his original code that I wrapped with a jQuery extender:

$.fn.parseTemplate = function(data)
{
    var str = (this).html();
    var _tmplCache = {}
    var err = "";
    try
    {
        var func = _tmplCache[str];
        if (!func)
        {
            var strFunc =
                "var p=[],print=function(){p.push.apply(p,arguments);};" +
                "with(obj){p.push('" +
                str.replace(/[\r\t\n]/g, " ")
                   .replace(/'(?=[^#]*#>)/g, "\t")
                   .split("'").join("\\'")
                   .split("\t").join("'")
                   .replace(/<#=(.+?)#>/g, "',$1,'")
                   .split("<#").join("');")
                   .split("#>").join("p.push('")
                + "');}return p.join('');";

            //alert(strFunc);
            func = new Function("obj", strFunc);
            _tmplCache[str] = func;
        }
        return func(data);
    } catch (e) { err = e.message; }
    return "< # ERROR: " + err.toString() + " # >";
}

The parseTemplate method can be applied against a client-side template like the one below. Notice that the template is wrapped in a script block with the type set to text/html so that it isn't rendered by the browser. JSON properties are written out by using the <#= #> syntax and the template engine has full support for embedded JavaScript code.

<script id="MyTemplate" type="text/html">
    <table style="width:400px;">
        <thead>
            <tr>
                <th>First Name</th>
                <th>Last Name</th>
                <th>Address</th>
            </tr>
        </thead>
        <tbody>
        <#
        for(var i=0; i < d.length; i++)
        {
            var cust = d[i];
        #>
            <tr>
                <td id="CustomerRow_<#= i.toString() #>">
                    <#= cust.FirstName #>
                </td>
                <td>
                    <#= cust.LastName #>
                </td>
                <td>
                    <#= cust.Address.Street #>
                    <br />
                    <#= cust.Address.City #>,
                    <#= cust.Address.State #>
                    &nbsp;&nbsp;<#= cust.Address.Zip #>
                </td>
            </tr>
        <#
        }
        #>
        </tbody>
    </table>
    <br />
    <#= d.length #> records shown
</script>

This template outputs a simple table like Figure 1. Sure, I could have generated the table using DOM manipulation techniques, but being able to tweak a data template is much easier and more productive in my opinion.

[Figure 1: the customer table rendered by the template.]

To use the template you'll need to have some JSON data available. Here's an example of creating JSON by hand and binding it to the template using the parseTemplate method shown earlier. The data returned from the template data binding operation is passed to the html method of the target div, which displays the data in the browser. Note: I'm defining the d property in the JSON object because WCF uses that name by default when it returns serialized JSON data.

var json =
{
    "d":
    [
        { "FirstName": "John", "LastName": "Doe",
          "Address":
            { "Street": "1234 Anywhere St.", "City": "Phoenix",
              "State": "AZ", "Zip": 85044 }
        },
        { "FirstName": "Jane", "LastName": "Doe",
          "Address":
            { "Street": "435 Main St.", "City": "Tempe",
              "State": "AZ", "Zip": 85245 }
        },
        { "FirstName": "Johnny", "LastName": "Doe",
          "Address":
            { "Street": "643 Chandler Blvd", "City": "Chandler",
              "State": "AZ", "Zip": 85248 }
        },
        { "FirstName": "Dave", "LastName": "Doe",
          "Address":
            { "Street": "18765 Cloud St.", "City": "Mesa",
              "State": "AZ", "Zip": 85669 }
        }
    ]
};
var output = $('#MyTemplate').parseTemplate(json);
$('#MyTemplateOutput').html(output);

Of course, in the real world you'll probably get the JSON data from some type of service (WCF, ASMX, REST, etc.). Here's a WCF service that returns a List of Customer objects and converts them to JSON. The service has the client script behavior enabled so that serialization from CLR objects to JSON objects occurs behind the scenes automatically.

[ServiceContract(Namespace = "http://www.thewahlingroup.com")]
[AspNetCompatibilityRequirements(RequirementsMode =
    AspNetCompatibilityRequirementsMode.Allowed)]
public class CustomerService
{
    [OperationContract]
    public List<Customer> GetCustomers()
    {
        return new List<Customer>
        {
            new Customer {FirstName="John", LastName="Doe",
                Address=new Address{Street="1234 Anywhere St.",
                    City="Phoenix", State="AZ", Zip=85044}},
            new Customer {FirstName="Jane", LastName="Doe",
                Address=new Address{Street="435 Main St.",
                    City="Tempe", State="AZ", Zip=85245}},
            new Customer {FirstName="Johnny", LastName="Doe",
                Address=new Address{Street="643 Chandler Blvd",
                    City="Chandler", State="AZ", Zip=85248}},
            new Customer {FirstName="Dave", LastName="Doe",
                Address=new Address{Street="18765 Cloud St.",
                    City="Mesa", State="AZ", Zip=85669}}
        };
    }
}

jQuery's ajax method can then be used to call the WCF service and retrieve the data (jQuery provides other methods such as getJSON that could be used too if desired):
$.ajax(
{
    type: "POST",
    url: "CustomerService.svc/GetCustomers",
    dataType: "json",
    data: {},
    contentType: "application/json; charset=utf-8",
    success: function(json)
    {
        var output = $('#MyTemplate').parseTemplate(json);
        $('#MyTemplateOutput').html(output);
        //Add hover capabilities
        $('tbody > tr').bind('mouseenter mouseleave', function()
        {
            $(this).toggleClass('hover');
        });
    }
});

This code defines the type of operation, service URL to call, any data passed to the service, the content type, and a success callback. Once the service call returns, the JSON data is bound to the template shown earlier by locating the area where the template should be rendered to (MyTemplateOutput in this example) and then calling parseTemplate. Hover capabilities are also added using jQuery's bind method to highlight rows as the user moves the mouse in and out of them.
You can see that the amount of custom JavaScript that has to be written is kept to a minimum by combining jQuery with the client-side template, which ultimately leads to easier maintenance down the road. This is just one of several different client-side template solutions out there; ASP.NET 4.0 will also include a custom client-side template solution once released. You can download the sample code here (http://i.cmpnet.com/ddj/images/article/2009/code/jQueryDataTemplates.zip).

—Dan Wahlin (Microsoft Most Valuable Professional for ASP.NET and XML Web Services) is the founder of The Wahlin Group (www.TheWahlinGroup.com), which provides .NET, SharePoint, and Silverlight consulting and training services. Dan blogs at http://weblogs.asp.net/dwahlin.

Return to Table of Contents
Of Interest
JetBrains has released Version 1.0 of MPS (short for "Meta Programming System"), a language workbench and IDE for extending existing languages and creating custom Domain Specific Languages. By using MPS and DSLs created with its help, domain experts can solve domain-specific tasks easily, even if they're not familiar with programming. MPS is freely available, with a major part of its source code open and available under the Apache license, and can be downloaded. JetBrains' own bug-tracking system, code-named Charisma, is developed entirely with MPS. This issue tracker is a modern Web 2.0 application. To create it, a whole stack of web application languages was created: languages for HTML templates, controllers, database access, JavaScript, etc. MPS doesn't use any parsers. It works with the abstract syntax tree directly, so it doesn't require any parsing. Compiler construction knowledge might be useful, but you don't have to be an expert in this field in order to use MPS: it contains a predefined set of languages with which users can create their own languages. http://www.jetbrains.com/mps/?mps1pr

Intel has made available for free download Prototype Edition 3.0 of the Intel C++ STM Compiler. (STM is short for "Software Transactional Memory.") The Transactional Memory C++ language constructs that are included open the door for users to exercise the new language constructs for parallel programming, understand the transactional memory programming model, and provide feedback on the usefulness of these extensions with the Intel C++ STM Compiler Prototype Edition. This posting includes the Intel C++ STM Compiler Prototype Edition 2.0 and runtime libraries for Intel transactional memory language construct extensions. http://software.intel.com/en-us/articles/intel-c-stm-compiler-prototype-edition-20/

TeamDev has released Selenium Inspector, an open-source library that runs on top of Selenium, a tool designed to simplify automated testing of Web components, pages, and applications, especially those written using JSF. The Selenium Inspector API lets you create testing solutions for a variety of HTML rendering frameworks like JSF component libraries, Spring MVC, and Struts. Web developers can create object-oriented testing APIs for any Web UI library. The Java API for inspecting OpenFaces components is already included. Selenium Inspector provides an API similar to that of Selenium, but is simpler to use in many cases and provides a bit higher level of abstraction. It doesn't replace Selenium, but provides an additional API that you can use if you find it more appropriate for your actual needs. You can use both Selenium and Selenium Inspector APIs at the same time. http://seleniuminspector.org/

[Video: "Three Key Challenges to Adding Parallelism to Your Applications." For more videos on this topic, go to www.ddj.com/go-parallel/]

Static analyzers try to find weaknesses in other programs that could be triggered accidentally or exploited by intruders. A report from the National Institute of Standards and Technology (NIST) entitled "Static Analysis Tool Exposition (SATE)," edited by Vadim Okun, Romain Gaucher, and Paul Black, documents NIST's Static Analysis Tool Exposition — an exercise by NIST and static analyzer vendors to improve the performance of these tools. The static analyzers (and languages) in the study included Aspect Security ASC 2.0 (Java), Checkmarx CxSuite 2.4.3 (Java), Flawfinder 1.27 (C), Fortify SCA 5.0.0.0267 (C, Java), Grammatech CodeSonar 3.0p0 (C), HP DevInspect 5.0.5612.0 (Java), SofCheck Inspector for Java 2.1.2 (Java), University of Maryland FindBugs 1.3.1 (Java), and Veracode SecurityReview (C, Java). According to NIST's Vadim Okun, SATE was a long-overdue idea. "Most modern software is too lengthy and complex to analyze by hand," says Okun. "Additionally, programs that would have been considered secure ten years ago may now be vulnerable to hackers. We're trying to focus on identifying what in a program's code might be exploitable." While the SATE 2008 process was not designed to compare the performance of participating tools, it was successful in understanding some of their capabilities in a wide variety of weaknesses. SATE demonstrated that results from multiple tools can be combined into a single database from which further analysis is possible. While the backtrace explanations were useful, the study concluded that the evaluation might have been more efficient and less error-prone by closely integrating with the navigation and visualization capabilities of the tools. The SATE report is available at http://samate.nist.gov/docs/NIST_Special_Publication_500-279.pdf

Return to Table of Contents


Q&A: Open Database


What does the future hold for MySQL?

Michael "Monty" Widenius was the creator of the MySQL database and founder of Monty Program Ab. He recently spoke with Dr. Dobb's editor-in-chief Jonathan Erickson.

Dr. Dobb's: What's the Open Database Alliance?
Widenius: The Open Database Alliance is a vendor-neutral consortium of vendors and individuals commercially supporting or delivering services around MariaDB and MySQL. Open Database Alliance partners will support each other's open source initiatives, and resell each other's services. This makes it possible for customers to get all services they require around their database issues through any vendor in the Alliance.

Dr. Dobb's: What's MariaDB?
Widenius: It's a community developed branch of MySQL with bug fixes and new features developed by the MariaDB community, of which Monty Program Ab is an active member. We will keep MariaDB in sync with MySQL development to ensure all bug fixes and features in MySQL also exist in MariaDB. At this time MariaDB 5.1 should be notably faster, have more features, and have fewer bugs than the corresponding MySQL 5.1 release.

Dr. Dobb's: Is SQL adequate for 21st century computing?
Widenius: Yes. SQL will be around for a long time because it's a very expressive language that is very easy to embed in web-based applications. As long as people are developing web pages with programming languages like PHP, Perl, Ruby, and Java, SQL will have its place.

Dr. Dobb's: What will be the biggest change in data storage in five years?
Widenius: SSD (solid-state drive) memory will force a paradigm shift in how data is stored and accessed, and a lot of old proven database algorithms will have to be changed because there is no seek time anymore.

Dr. Dobb's: What's the most exciting development in DBMS technology today?
Widenius: On the software side, the usage of Memcached and Gearman to do inexpensive "cloud like" computing is of course interesting. We are also seeing dedicated inexpensive machines that provide Memcached interfaces, which will notably speed up and simplify any setup that uses Memcached (which is a standard component for most busy web sites).

Dr. Dobb's: Will operating systems ultimately be successful in converting their filesystems into SQL-managed organizations of data?
Widenius: I think that is a stupid idea. Most data people store is not really suitable for SQL. SQL will only notably slow things down when accessing things and will create a lot more fragmentation compared to modern file systems, without providing anything really critical for the end user. Another problem is that SQL-managed data is very bad for applications that want their own access to parts of the data (like another database server running on a SQL-managed filesystem).

Return to Table of Contents


Book Review

Using Google App Engine

Reviewed by Mike Riley

Using Google App Engine
by Charles Severance
O'Reilly Media
262 pages; $29.99

Even though Google App Engine has been available to developers for some time, deep technical books on applying this scalable cloud service have been sparse. Does O’Reilly’s entry, under their Google Press imprint, fill this void? Read on to find out.

Like many web-connected developers, I have been aware of Google App Engine (GAE) since its invitation-only beta days, but never really took much interest in it. I am a big fan of the Python scripting language, but the fact that GAE uses Python as its preferred logic language somehow failed to grab me. One of the main reasons for this was that at the same time as GAE’s initial public beta, I was busy immersing myself in the Python-based Django framework, and I wasn’t about to confuse myself with an alternative approach to Python-centric web application development. Fortunately, GAE was constructed with enough flexibility to allow a framework like Django to live within its constructs, as detailed in an April 2008 article by Googler Damon Kohler (http://code.google.com/appengine/articles/django.html). Additionally, with the inclusion of Java support, GAE offers plenty of flexibility for the developer seeking a hosted cloud solution. Unfortunately, author Charles Severance failed to explore either of these important features in Using Google App Engine. The book is instead oriented toward first-time web programmers unfamiliar with even the most rudimentary aspects of web development. Nearly half the book is spent on the basics of HTML, CSS, HTTP, and basic Python syntax. Considering the book’s brevity and cost, this expenditure left few pages solely dedicated to GAE.

Once the beginner tutorials of basic web page construction and delivery are out of the way, the second half of the book dives into a high-level overview of GAE, its use of templates (based on Django’s template system, no less), handling cookies and session management, using the proprietary GAE Datastore for structured data storage, creating a GAE account, uploading and testing a GAE-friendly application, and understanding and fine-tuning the memory cache parameters. Four appendixes, one each dedicated to a target development OS (Windows XP, Windows Vista, Mac OS X, and Linux), literally repeat the same information with the name of the OS replaced and other minor differences. These were quite frankly a waste of paper; the author should have consolidated the OS variations into a simple grid or footnotes where appropriate. That would have left more space for explaining the inner workings, design optimizations, and best practices for developing best-of-breed GAE designs.

Besides the minimal amount of unique material in the book, one of its biggest failings for me was its inability to present a convincing argument for using GAE in the first place. The advantages mentioned by the author read like a Google advertisement written by a Google fanboy. The author failed to share any well-known websites that run on GAE, interview others who are as enamored with GAE as he is, or provide a chapter or appendix answering questions about privacy and intellectual property rights concerns, service-level expectations, and so on. While the book did a fair job of elevating my interest in GAE, it wasn’t enough for me to consider placing any of my web apps into Google’s cloud.

Overall, O’Reilly (and Google for that matter) missed a golden opportunity with this book to deeply explore the technical facets of GAE. Instead, they spent paper and ink rehashing basic concepts that have been much better served in other O’Reilly titles. While this might be a helpful book for someone who has no web development experience whatsoever, yet aspires toward understanding MVC patterns and a broader grasp of the complexity associated with modern-day web applications, it’s a major letdown for practiced developers seeking a deeper understanding of GAE in the real world. Perhaps O’Reilly will revisit this technology under a “Mastering Google App Engine” title.



Effective Concurrency

Design for Manycore Systems


Why worry about “manycore” today?

By Herb Sutter

Dual- and quad-core computers are obviously here to stay for mainstream desktops and notebooks. But do we really need to think about “manycore” systems if we’re building a typical mainstream application right now? I find that, to many developers, “manycore” systems still feel fairly remote, and not an immediate issue to think about as they’re working on their current product.

This column is about why it’s time right now for most of us to think about systems with lots of cores. In short: Software is the (only) gating factor; as that gate falls, hardware parallelism is coming more and sooner than many people yet believe.

Recap: What “Everybody Knows”

Figure 1 is the canonical “free lunch is over” slide showing major mainstream microprocessor trends over the past 40 years. These numbers come from Intel’s product line, but every CPU vendor from servers (e.g., Sparc) to mobile devices (e.g., ARM) shows similar curves, just shifted slightly left or right. The key point is that Moore’s Law is still generously delivering transistors at the rate of twice as many per inch or per dollar every couple of years. Of course, any exponential growth curve must end, and so eventually will Moore’s Law, but it seems to have yet another decade or so of life left.

Mainstream microprocessor designers used to be able to use their growing transistor budgets to make single-threaded code faster by making the chips more complex, such as by adding out-of-order (“OoO”) execution, pipelining, branch prediction, speculation, and other techniques. Unfortunately, those techniques have now been largely mined out.

Figure 1: Canonical “free lunch is over” slide. Note Pentium vs. dual-core Itanium transistor counts.




But CPU designers are still reaping Moore’s harvest of transistors by the boatload, at least for now. What to do with all those transistors? The main answer is to deliver more cores rather than more complex cores. Additionally, some of the extra transistor real estate can also be soaked up by bringing GPUs, networking, and/or other functionality on-chip as well, up to putting an entire “system on a chip” (aka “SoC”) like the Sun UltraSPARC T2.

How Much, How Soon?

How quickly can we expect more parallelism in our chips? The naïve answer would be: Twice as many cores every couple of years, just continuing on with Moore’s Law. That’s the baseline projection approximated in Figure 2, assuming that some of the extra transistors aren’t also used for other things.

Figure 2: Simple extrapolation of “more of the same big cores” (not counting some transistors being used for other things like on-chip GPUs, or returning to smaller cores).

However, the naïve answer misses several essential ingredients. To illustrate, notice one interesting fact hidden inside Figure 1. Consider the two highlighted chips and their respective transistor counts in million transistors (Mt):

• 4.5Mt: 1997 “Tillamook” Pentium P55C. This isn’t the original Pentium; it’s a later and pretty attractive little chip that has some nice MMX instructions for multimedia processing. Imagine running this 1997 part at today’s clock speeds.
• 1,700Mt: 2006 “Montecito” Itanium 2. This chip handily jumped past the billion-transistor mark to deliver two Itanium cores on the same die. [1]

So what’s the interesting fact? (Hint: 1,700 ÷ 4.5 = ???.)

In 2006, instead of shipping a dual-core Itanium part, with exactly the same transistor budget Intel could have shipped a chip that contained 100 decent Pentium-class cores with enough space left over for 16 MB of Level 3 cache. True, it’s more than a matter of just etching the logic of 100 cores on one die; the chip would need other engineering work, such as improving the memory interconnect to make the whole chip a suitably balanced part. But we can view those as being relatively ‘just details’ because they don’t require engineering breakthroughs.

Repeat: Intel could have shipped a 100-core desktop chip with ample cache — in 2006.

So why didn’t they? (Or AMD? Or Sun? Or anyone else in the mainstream market?) The short answer is the counter-question: Who would buy it? The world’s popular mainstream client applications are largely single-threaded or nonscalably multithreaded, which means that existing applications create a double disincentive:

• They couldn’t take advantage of the extra cores, because they don’t contain enough inherent parallelism to scale well.
• They wouldn’t run as fast on a smaller and simpler core, compared to a bigger core that contains extra complexity to run single-threaded code faster.

Astute readers might have noticed that when I said, “why didn’t Intel or Sun,” I left myself open to contradiction, because Sun (in particular) did do something like that already, and Intel is doing it now. Let’s find out what, and why.

Hardware threads are important, but only for simpler cores

Hardware threads have acquired a tarnished reputation. Historically, for example, Pentium hyperthreading has been a mixed blessing in practice; it made some applications run something like 20% faster by hiding some remaining memory latency not already covered in other ways, but made other applications actually run slower because of increased cache contention and other effects. (For one example, see [3].)

But that’s only because hardware threads are for hiding latency, and so they’re not nearly as useful on our familiar big, complex cores that already contain lots of other latency-hiding concurrency. If you’ve had mixed or negative results with hardware threads, you were probably just using them on complex chips where they don’t matter as much.

Don’t let that turn you off the idea of hardware threading. Although hardware threads are a mixed bag on complex cores where there isn’t much remaining memory latency left to hide, they are absolutely essential on simpler cores that aren’t hiding nearly enough memory latency in other ways, such as simpler in-order CPUs like Niagara and Larrabee. Modern GPUs take the extreme end of this design range, making each core very simple (typically not even a general-purpose core) and relying on lots of hardware threads to keep the core doing useful work even in the face of memory latency.

Hiding Latency: Complex Cores vs. Hardware Threads




One of the major reasons today’s modern CPU cores are so big and complex, to make single-threaded applications run faster, is that the complexity is used to hide the latency of accessing glacially slow RAM — the “memory wall.”

In general, how do you hide latency? Briefly, by adding concurrency: Pipelining, out-of-order execution, and most of the other tricks used inside complex CPUs inject various forms of concurrency within the chip itself, and that lets the CPU keep the pipeline to memory full and well-utilized and hide much of the latency of waiting for RAM. (That’s a very brief summary. For more, see my machine architecture talk, available on Google video. [2])

So every chip needs to have a certain amount of concurrency available inside it to hide the memory wall. In 2006, the memory wall was higher than in 1997; so naturally, 2006 cores of any variety needed to contain more total concurrency than in 1997, in whatever form, just to avoid spending most of their time waiting for memory. If we just brought the 1997 core as-is into the 2006 world, running at 2006 clock speeds, we would find that it would spend most of its time doing something fairly unmotivating: just idling, waiting for memory.

But that doesn’t mean a simpler 1997-style core can’t make sense today. You just have to provide enough internal hardware concurrency to hide the memory wall. The squeezing-the-toothpaste-tube metaphor applies directly: When you squeeze to make one end smaller, some other part of the tube has to get bigger. If we take away some of a modern core’s concurrency-providing complexity, such as removing out-of-order execution or some or all pipeline stages, we need to provide the missing concurrency in some other way.

But how? A popular answer is: Through hardware threads. (Don’t stop reading if you’ve been burned by hardware threads in the past. See the sidebar “Hardware threads are important, but only for simpler cores.”)

Toward Simpler, Threaded Cores

What are hardware threads all about? Here’s the idea: Each core still has just one basic processing unit (arithmetic unit, floating point unit, etc.) but can keep multiple threads of execution “hot” and ready to switch to quickly as others stall waiting for memory. The switching cost is just a few cycles; it’s nothing remotely similar to the cost of an operating system-level context switch. For example, a core with four hardware threads can run the first thread until it encounters a memory operation that forces it to wait, and then keep doing useful work by immediately switching to the second thread and executing that until it also has to wait, and then switching to the third until it also waits, and then the fourth until it also waits — and by then hopefully the first or second is ready to run again and the core can stay busy. For more details, see [4].

The next question is, How many hardware threads should there be per core? The answer is: As many as you need to hide the latency no longer hidden by other means. In practice, popular answers are four and eight hardware threads per core. For example, Sun’s Niagara 1 and Niagara 2 processors are based on simpler cores, and provide four and eight hardware threads per core, respectively. The UltraSPARC T2 boasts 8 cores of 8 threads each, or 64 hardware threads, as well as other functions including networking and I/O that make it a “system on a chip.” [5] Intel’s new line of Larrabee chips is expected to range from 8 to 80 (eighty) x86-compatible cores, each with four or more hardware threads, for a total of 32 to 320 or more hardware threads per CPU chip. [6] [7]

Figure 3 shows a simplified view of possible CPU directions. The large cores are big, modern, complex cores with gobs of out-of-order execution, branch prediction, and so on.

The left side of Figure 3 shows one possible future: We could just use Moore’s transistor generosity to ship more of the same — complex modern cores as we’re used to in the mainstream today. Following that route gives us the projection we already saw in Figure 2.

But that’s only one possible future, because there’s more to the story. The right side of Figure 3 illustrates how chip vendors could swing the pendulum partway back and make moderately simpler chips, along the lines that Sun’s Niagara and Intel’s Larrabee processors are doing.

Figure 3: A few possible future directions.
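To see why “as many as you need to hide the latency” lands at numbers like four and eight, it helps to run the arithmetic. If a thread computes for C cycles and then stalls for S cycles on a memory access, a core needs roughly 1 + S/C ready threads to stay fully busy. The following minimal sketch is my own illustration of that model, not code from this column, and the cycle counts are assumed round numbers:

// Back-of-the-envelope model of hiding memory latency with hardware threads.
// Illustrative sketch only; real cores overlap work in far messier ways.
#include <algorithm>
#include <cstdio>

// Each thread alternates between 'compute' cycles of useful work and
// 'stall' cycles waiting on memory. With n threads, the core can overlap
// one thread's stall with the others' compute, so utilization is
// min(1, n*compute / (compute + stall)).
double core_utilization(int n, double compute, double stall) {
    return std::min(1.0, n * compute / (compute + stall));
}

int main() {
    const double compute = 10;  // cycles of work per burst (assumed)
    const double stall   = 70;  // cycles per memory stall (assumed)
    for (int n = 1; n <= 8; n *= 2)
        std::printf("%d thread(s): %3.0f%% busy\n",
                    n, 100 * core_utilization(n, compute, stall));
}

With 70 cycles of stall per 10 cycles of work, utilization saturates only around 1 + 70/10 = 8 threads, which is in line with the four- and eight-thread designs described above and with the sidebar’s point that simple in-order cores need the most such help.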




In this simple example for illustrative purposes only, the smaller cores are simpler cores that consume just one-quarter the number of transistors, so that four times as many can fit in the same area. However, they’re simpler because they’re missing some of the machinery used to hide memory latency; to make up the deficit, the small cores also have to provide four hardware threads per core. If CPU vendors were to switch to this model, for example, we would see a one-time jump of 16 times the hardware concurrency — four times the number of cores, and at the same time four times as many hardware threads per core — on top of the Moore’s Law-based growth in Figure 2.

What makes smaller cores so appealing? In short, it turns out you can design a small-core device such that:

• 4x cores = 4x FP performance: Each small, simple core can perform just as many floating-point operations per second as a big, complex core. After all, we’re not changing the core execution logic (ALU, FPU, etc.); we’re only changing the supporting machinery around it that hides the memory latency, to replace OoO and predictors and pipelines with some hardware threading.
• Less total power: Each small, simple core occupies one-quarter of the transistors, but uses less than one-quarter of the total power.

Who wouldn’t want a CPU that has four times the total floating-point processing throughput and consumes less total power? If that’s possible, why not just ship it tomorrow?

You might already have noticed the fly in the ointment. The key question is: Where does the CPU get the work to assign to those multiple hardware threads? The answer is, from the same place it gets the work for multiple cores: From you. Your application has to provide the software threads or other parallel work to run on those hardware threads. If it doesn’t, then the core will be idle most of the time. So this plan only works if the software is scalably parallel.

Imagine for a moment that we live in a different world, one that contains several major scalably parallel “killer” applications — applications that a lot of mainstream consumers want to use and that run better on highly parallel hardware. If we have such scalable parallel software, then the right-hand side of Figure 3 is incredibly attractive and a boon for everyone, including for end users who get much more processing clout as well as a smaller electricity bill.

In the medium term, it’s quite possible that the future will hold something in between, as shown in the middle of Figure 3: heterogeneous chips that contain both large and small cores. Even these will only be viable if there are scalable parallel applications, but they offer a nice migration path from today’s applications. The larger cores can run today’s applications at full speed, with ongoing incremental improvements to sequential performance, while the smaller cores can run tomorrow’s applications with a reenabled “free lunch” of exponential improvements to CPU-bound performance (until the program becomes bound by some other factor, such as memory or network I/O). The larger cores can also be useful for faster execution of any unavoidably sequential parts of new parallel applications. [8]

Figure 4: How much concurrency does your program need in order to exploit given hardware?

Figure 5: Extrapolation of “more of the same big cores” and “possible one-time switch to 4x smaller cores plus 4x threads per core” (not counting some transistors being used for other things like on-chip GPUs).

How Much Scalability Does Your Application Need?




So how much parallel scalability should you aim to support in the application you’re working on today, assuming that it’s compute-bound already or you can add killer features that are compute-bound and also amenable to parallel execution? The answer is that you want to match your application’s scalability to the amount of hardware parallelism in the target hardware that will be available during your application’s expected production or shelf lifetime. As shown in Figure 4, that equates to the number of hardware threads you expect to have on your end users’ machines.

Let’s say that YourCurrentApplication 1.0 will ship next year (mid-2010), and you expect that it’ll be another 18 months until you ship the 2.0 release (early 2012) and probably another 18 months after that before most users will have upgraded (mid-2013). Then you’d be interested in judging what will be the likely mainstream hardware target up to mid-2013.

If we stick with “just more of the same” as in Figure 2’s extrapolation, we’d expect aggressive early hardware adopters to be running 16-core machines (possibly double that if they’re aggressive enough to run dual-CPU workstations with two sockets), and we’d likely expect most general mainstream users to have 4-, 8-, or maybe a smattering of 16-core machines (accounting for the time for new chips to be adopted in the marketplace).

But what if the gating factor, parallel-ready software, goes away? Then CPU vendors would be free to take advantage of options like the one-time 16-fold hardware parallelism jump illustrated in Figure 3, and we get an envelope like that shown in Figure 5.
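That envelope is easy to recompute for any ship date. Here is a minimal sketch; the quad-core 2009 baseline and the doubling every two years are my assumptions chosen to echo Figures 2 and 5, not data from the article:

// Rough sketch of the Figure 2 / Figure 5 envelope: the baseline doubles
// cores every couple of years; the upper line adds the hypothetical
// one-time 16x switch to 4x smaller cores with 4 hardware threads each.
#include <cmath>
#include <cstdio>

int main() {
    const double base2009 = 4;  // assumed mainstream core count in 2009
    for (int year = 2009; year <= 2013; ++year) {
        double baseline = base2009 * std::pow(2.0, (year - 2009) / 2.0);
        std::printf("%d: ~%2.0f hardware threads baseline, ~%3.0f after a 16x switch\n",
                    year, baseline, 16 * baseline);
    }
}

The lower figure is the “more of the same” projection; the upper figure is what the one-time jump would stack on top of it, which is where numbers like 256-way parallelism by 2013 come from.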
Now, what amount of parallelism should the application you’re working on now have, if it ships next year and will be in the market for three years? And what does that answer imply for the scalability design and testing you need to be doing now, and the hardware you want to be using at least part of the time in your testing lab? (We can’t buy a machine with a 32-core mainstream chip yet, but we can simulate one pretty well by buying a machine with four eight-core chips, or eight quad-core chips… It’s no coincidence that in recent articles I’ve often shown performance data on a 24-core machine, which happens to be a four-socket box with six cores per socket.)
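One way to keep that testing honest is to ask the runtime how much hardware parallelism the box actually exposes, and to size your worker pool to it rather than hard-wiring a thread count. A minimal sketch using the C++0x (since standardized as C++11) std::thread facilities, assuming a compiler that ships <thread>:

// Size a worker pool to whatever hardware parallelism this machine exposes.
// Minimal sketch; real code would hand each worker a slice of a scalably
// parallel workload instead of just an id to print.
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    // Total hardware threads (cores times threads per core); the standard
    // allows 0 when this is unknown, so fall back to a modest default.
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4;
    std::printf("this machine exposes %u hardware threads\n", n);

    std::vector<std::thread> pool;
    for (unsigned i = 0; i != n; ++i)
        pool.push_back(std::thread([i] { std::printf("worker %u running\n", i); }));
    for (auto& t : pool) t.join();  // wait for every worker to finish
}

On the four-socket, six-core box described above, this should report 24; running it on your lab machines shows at a glance whether your test hardware is anywhere near your users’ likely target.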
Note that I’m not predicting that we’ll see 256-way hardware parallelism on a typical new Dell desktop in 2012. We’re close enough to 2011 and 2012 that if chip vendors aren’t already planning such a jump to simpler, hardware-threaded cores, it’s not going to happen. They typically need three years or so of lead time to see, or at least anticipate, the availability of parallel software that will use the chips, so that they can design and build and ship them in their normal development cycle.

I don’t believe either the bottom line or the top line is the exact truth, but as long as sufficient parallel-capable software comes along, the truth will probably be somewhere in between, especially if we have processors that offer a mix of large- and small-core chips, or that use some chip real estate to bring GPUs or other devices on-die. That’s more hardware parallelism, and sooner, than most mainstream developers I’ve encountered expect.

Interestingly, though, we already noted two current examples: Sun’s Niagara and Intel’s Larrabee already provide double-digit parallelism in mainstream hardware via smaller cores with four or eight hardware threads each. “Manycore” chips, or perhaps more correctly “manythread” chips, are just waiting to enter the mainstream. Intel could have built a nice 100-core part in 2006. The gating factor is the software that can exploit the hardware parallelism; that is, the gating factor is you and me.

Summary

The pendulum has swung toward complex cores nearly as far as it’s practical to go. There’s a lot of performance and power incentive to ship simpler cores. But the gating factor is software that can use them effectively; specifically, the availability of scalable parallel mainstream killer applications.

The only thing I can foresee that could prevent the widespread adoption of manycore mainstream systems in the next decade would be a complete failure to find and build some key parallel killer apps, ones that large numbers of people want and that work better with lots of cores. Given our collective inventiveness, coupled with the parallel libraries and tooling now becoming available, I think such a complete failure is very unlikely.

As soon as mainstream parallel applications become available, we will see hardware parallelism both more and sooner than most people expect. Fasten your seat belts, and remember Figure 5.

References

[1] Montecito press release (Intel, July 2006). www.intel.com/pressroom/archive/releases/20060718comp.htm
[2] H. Sutter. “Machine Architecture: Things Your Programming Language Never Told You” (Talk at NWCPP, September 2007). http://video.google.com/videoplay?docid=4714369049736584770
[3] “Improving Performance by Disabling Hyperthreading” (Novell Cool Solutions feature, October 2004). www.novell.com/coolsolutions/feature/637.html
[4] J. Stokes. “Introduction to Multithreading, Superthreading and Hyperthreading” (Ars Technica, October 2002). http://arstechnica.com/old/content/2002/10/hyperthreading.ars
[5] UltraSPARC T2 Processor (Sun). www.sun.com/processors/UltraSPARC-T2/datasheet.pdf
[6] L. Seiler et al. “Larrabee: A Many-Core x86 Architecture for Visual Computing” (ACM Transactions on Graphics (27,3), Proceedings of ACM SIGGRAPH 2008, August 2008). http://download.intel.com/technology/architecture-silicon/Siggraph_Larrabee_paper.pdf
[7] M. Abrash. “A First Look at the Larrabee New Instructions” (Dr. Dobb’s, April 2009). http://www.ddj.com/hpc-high-performance-computing/216402188
[8] H. Sutter. “Break Amdahl’s Law!” (Dr. Dobb’s Journal, February 2008). www.ddj.com/cpp/205900309

—Herb Sutter is a bestselling author and consultant on software development topics, and a software architect at Microsoft. He can be contacted at www.gotw.ca.

