Editor's Note
by Jonathan Erickson

Techno-News
Bokode: A New Kind of Barcode
Tiny labels could pack lots of information, enable new uses.

Features
3 Steps To Managing Data In the Cloud
by Ken North
Matching the right cloud platform with the right database is critical.

Columns
Of Interest

Conversations
by Jonathan Erickson
Jon talks with MySQL creator Michael "Monty" Widenius about the future of databases.

Book Review
by Mike Riley
Mike reviews Using Google App Engine.

Effective Concurrency
by Herb Sutter
Herb urges you to design for manycore systems.
Entire contents Copyright © 2009, TechWeb/United Business Media LLC, except where otherwise noted. No portion of this publication may be reproduced, stored, or transmitted in any form, including computer retrieval, without written permission from the publisher. All Rights Reserved. Articles express the opinion of the author and are not necessarily the opinion of the publisher. Published by TechWeb, United Business Media Limited, 600 Harrison St., San Francisco, CA 94107 USA 415-947-6000.
Editor’s Note
Database Development
By Jonathan Erickson, Editor In Chief

From e-science visualizations to Wall Street what-ifs, you know there's a problem when the talk turns to exabytes. But that problem isn't so much about too much data as it is about making sense of the data at hand. In other words, it's a question of data management — the architectures, policies, practices, and procedures you have in place for managing and enhancing a company's data assets.

What really stands out are the vendors that are providing tools to manage and analyze what's referred to as "big data." There are the usual suspects: Oracle, IBM, Google, Amazon.com, and FairCom. And then there are upstarts, such as Cloudera and Aster Data Systems, that are leveraging open source software such as MapReduce and Hadoop to build new businesses around big data.

Many of the technologies available to manage big data aren't new. In one form or another, column-oriented databases, data parallelism, solid-state drives, declarative programming languages, and cloud computing have been around for years. What's new is the emergence of "fringe databases," or database management systems that are appearing where you least expect sophisticated data management. For example, medical and consumer devices that once got by with flat files now require powerful database engines to manage the sheer volume of data being collected.

None of this comes without a price. With big data on the rise, transaction throughput and concurrency requirements escalating, and data becoming more distributed, application complexity is increasing. To make data easier to manipulate, it may have to be partitioned across multiple files or replicated and synchronized across multiple sites. And, of course, software developers are looking at complex data schema paradigms to accommodate their needs while still maintaining traditional relational access.
Hey, no one said it was going to be easy.
EDITORIAL
MANAGING EDITOR
Deirdre Blake
COPY EDITOR
Amy Stephens
CONTRIBUTING EDITORS
John Slesinski

DR. DOBB'S
600 Harrison Street, 6th Floor, San Francisco, CA 94107. 415-947-6000.
www.ddj.com

UBM LLC
Pat Nohilly, Senior Vice President, Strategic Development and Business Administration
Marie Myers, Senior Vice President, Manufacturing

TechWeb
Tony L. Uphoff, Chief Executive Officer
John Dennehy, CFO
David Michael, CIO
John Siefert, Senior Vice President and Publisher, InformationWeek Business Technology Network
Bob Evans, Senior Vice President and Content Director, InformationWeek Global CIO
Joseph Braue, Senior Vice President, Light Reading Communications Network
Scott Vaughan, Vice President, Marketing Services
John Ecke, Vice President, Financial Technology Network
Beth Rivera, Vice President, Human Resources
Jill Thiry, Publishing Director
Fritz Nelson, Executive Producer, TechWeb TV

Techno-News
Bokode: A New Kind of Barcode
Tiny labels could pack lots of information, enable new uses.

The ubiquitous barcodes found on product packaging provide information to the scanner at the checkout counter, but that's about all they do. Now, researchers at MIT's Media Lab have come up with a new kind of very tiny barcode that could provide a variety of useful information to shoppers as they scan the shelves — and could even lead to new devices for classroom presentations, business meetings, video games, or motion-capture systems.

The system, called Bokode (http://web.media.mit.edu/~ankit/bokode/), is based on a new way of encoding visual information, explains MIT's Ramesh Raskar, who leads the lab's Camera Culture group (http://cameraculture.media.mit.edu/). Until now, there have been three approaches to communicating data optically:

• Through ordinary imaging (using two-dimensional space)
• Through temporal variations such as a flashing light or moving image (using the time dimension)
• Through variations in the wavelength of light (used in fiberoptic systems to provide multiple channels of information simultaneously through a single fiber)

But the new system uses a whole new approach, encoding data in the angular dimension: Rays of light coming from the new tags vary in brightness depending on the angle at which they emerge. "Almost no one seems to have used" this method of encoding information, Raskar says. "There have been three ways to encode information optically, and now we have a new one."

The new concept is described in the paper "Bokode: Imperceptible Visual Tags for Camera-based Interaction from a Distance," written by Ankit Mohan, Raskar, Grace Woo, Shinsaku Hiura, and Quinn Smithwick.

The tiny labels are just 3 millimeters across — about the size of the @ symbol on a typical computer keyboard. Yet they can contain far more information than an ordinary barcode: thousands of bits. Currently they require a lens and a built-in LED light source, but future versions could be made reflective, similar to the holographic images now frequently found on credit cards, which would be much cheaper and more unobtrusive.

"We're trying to make it nearly invisible, but at the same time easy to read with a standard camera, even a mobile phone camera," Mohan says.

One of the advantages of the new labels is that unlike today's barcodes, they can be "read" from a distance — up to a few meters away. In addition, unlike the laser scanners required to read today's labels, these can be read using any standard digital camera, such as those now built in to about a billion cellphones around the world.

The name Bokode comes from the Japanese photography term bokeh, which refers to the round blob produced in an out-of-focus image of a light source. The Bokode system uses an out-of-focus camera — which allows the angle-encoded information to emerge from the resulting blurred spot — to record the encoded information from the tiny tag. But in addition to being readable by any ordinary camera (with the [...]
[...] readable from a distance by a shopper scanning the supermarket shelves, allowing easy product comparisons because several items near each other on the shelves could all be scanned at once.

In addition to conventional barcode applications, the team envisions some new kinds of uses for the new tags. For example, the tag could be in a tiny keychain-like device held by the user, scanned by a camera in the front of a room, to allow multiple people to interact with a displayed image in a classroom or a business presentation. The camera could tell the identity of each person pointing their device at the screen, as well as exactly where each was pointing. This could allow everyone in the room to respond simultaneously to a quiz, and the teacher to know instantly how many people, and which ones, got it right — and thus know whether the group was getting the point of the lesson.

The devices could also be used for the motion-capture systems used to create videogames or computer-generated movie scenes. Typically, video cameras record a person or object's motions using colored dots or balls attached to various parts of the person's body. The Bokode system would allow the camera to record very precisely not just the position but the angle of each tag.

A tag placed next to a painting or other exhibit would not detract from the art, but could send a whole host of background information to viewers through the use of their cellphone cameras. Or a restaurant could make its menu available to a passer-by on the sidewalk.

It could also replace RFID in some near-field communication applications, Mohan suggested. For example, while RFIDs, now used in some ID cards, can provide a great deal of information, that information can be read from a distance, even when the card is inside a wallet. That makes them inappropriate for credit cards, for example, because the information could be retrieved by an unauthorized observer. But a Bokode could encode just as much information, yet require an open line of sight to the card to be read, increasing security.

The prototype devices produced at the Media Lab currently cost about $5 each, most of that due to the use of an off-the-shelf convex glass lens, but Raskar says the price could easily drop to 5 cents once they are produced even in volumes of a few hundred units.

Return to Table of Contents
3 Steps To Managing Data In the Cloud
Matching the right cloud platform with the right database is critical
by Ken North

The emergence of cloud computing raises a host of questions about the best database technology to use with this new model for on-demand computing. Ultimately, the cloud approach a company chooses determines the data management options that are available to it.

When evaluating the suitability of a database manager for cloud computing, there are three basic steps:

• Consider the class of applications that will be served: data asset protection, business intelligence, e-commerce, etc.
• Determine the suitability of these apps for public or private clouds.
• Factor in ease of development.

The database manager you choose should be a function of the mission and the applications it supports, not of budgets or of whether it will run in the enterprise as a private cloud or as a public cloud from a service provider. For instance, some companies turn to a cloud provider to back up mission-critical databases or as a disaster recovery option. Database-intensive apps such as business intelligence can be deployed in the cloud by having a SaaS provider host the data and the app, an infrastructure provider host a cloud-based app, or a combination of these approaches. And popular solutions for processing very large datasets, such as Hadoop MapReduce, can run in both the private and public cloud.

Databases, data stores, and data access software should be evaluated for suitability for both public and private clouds. Public cloud security isn't adequate for some types of applications. For example, Amazon Dynamo was built to operate in a trusted environment, without authentication and authorization requirements. At a minimum, database communications and backups to the cloud need to be encrypted.

Security in cloud environments varies based on whether you use SaaS, a platform provider, or an infrastructure provider. SaaS providers bundle tools, APIs, and services, so you don't have to worry about choosing the optimal data store and security model. But if you create a private cloud or use an infrastructure provider, you'll have to select a data management tool that's consistent with your app's security needs. Your database decision also will hinge on whether the environment supports a multitenant or multi-instance model. Salesforce.com hosts apps on Oracle databases using multitenancy. Amazon EC2 supports multi-instance security. If you fire up an Amazon Machine Image running Oracle, DB2, or Microsoft SQL Server, you have a unique instance that doesn't serve other tenants. You have to authorize database users, define roles, and grant user privileges when using the infrastructure-as-a-service model.
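To make that minimum bar concrete, here is a short Python sketch of encrypting a database dump before it leaves for cloud storage. The library choice (the cryptography package's Fernet recipe) and the file names are illustrative assumptions of mine, not something the article prescribes:

from cryptography.fernet import Fernet

def encrypt_backup(dump_path, out_path, key):
    # Fernet = AES-CBC plus HMAC, so the backup is encrypted and tamper-evident
    f = Fernet(key)
    with open(dump_path, "rb") as src:
        token = f.encrypt(src.read())
    with open(out_path, "wb") as dst:
        dst.write(token)

key = Fernet.generate_key()   # keep this in a key manager, never beside the backup
encrypt_backup("nightly.dump", "nightly.dump.enc", key)
# nightly.dump.enc is what gets shipped to cloud storage; TLS covers the wire.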
Developers' Choices
Database app development options for public cloud computing can be limited by the providers. SaaS offerings such as Google App Engine and Force.com provide specific development platforms with predefined APIs and data stores. Private cloud and infrastructure providers, including GoGrid and Amazon EC2, let users match the software, database environment, and APIs to their needs. Besides cloud storage APIs, developers can program to various APIs for data stores and standard ones for SQL/XML databases. Programmers can work with SQL APIs and APIs for cloud services. For Amazon, that involves using Web Services Description [...]
Cloud computing is the latest sea change affecting how we develop and deploy services and applications and fulfill the need for persistent information and database solutions. Database technology evolves even as new computing models emerge, inevitably raising questions about selecting the right database technology to match the new requirements.

The cloud is an elastic computing and data storage engine, a virtual network of servers, storage devices, and other computing resources. It's a major milestone in on-demand or utility computing, the evolutionary progeny of computer timesharing, high-performance networks, and grid computing. The computer timesharing industry that emerged four decades ago pioneered the model for on-demand computing and pay-per-use resource sharing of storage and applications. More recently, Ian Foster and Carl Kesselman advanced the concept of the grid to make large-scale computing networks accessible via a service model. Like computer timesharing and the grid, cloud computing often requires persistent storage, so open source projects and commercial companies have responded with data store and database solutions.

Public clouds include commercial enterprises that can host applications and databases, offering Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS), and Database as a Service (DaaS). Infrastructure providers include Amazon Elastic Compute Cloud (EC2), GoGrid, Rackspace Mosso, and Joyent, whereas Microsoft Azure, Google App Engine, Force.com, Zoho, and Facebook are platform providers. There are also providers targeting specific classes of cloud users, such as HP CloudPrint and IBM LotusLive for collaboration services and social networking for businesses. Other SaaS providers include Birst and SAS for on-demand business intelligence (BI); Salesforce.com and Zoho for customer relationship management (CRM); and Epicor, NetSuite, SAP Business ByDesign, and Workday for enterprise resource planning (ERP) suites. The DaaS providers include EnterpriseDB, FathomDB, LongJump, and TrackVia.

Private clouds, like server consolidation, clusters, and virtualization, are another evolutionary step in data center and grid technology. Gartner Research predicted government will have the largest private clouds, but any organization with thousands of servers and massive storage requirements is a likely candidate. Security and reliability are the appeal of private clouds for large enterprises that can afford the infrastructure. Public cloud computing does not provide the 99.99% uptime that enterprise data center managers desire for service-level agreements. The fact that a private cloud sits behind a firewall mitigates the risk of exposing data to the cloud. The private cloud also alleviates concerns about data protection in multitenancy cloud environments. One issue in the private versus public cloud debate is the diversity of APIs used to [...]
by Gastón Hillar

Fan (http://www.fandev.org/), another new programming language developed in the multicore era, has recently launched its 1.0.45 release (http://code.google.com/p/fan/downloads/list). It is a very active open source project with an interesting approach to many modern concurrent programming challenges.

I began writing about Fan 1.0.44 a week ago. Now, Fan has a new version, 1.0.45.

Most developers don't want to learn a new programming language. However, Fan is an attractive language for certain tasks because it is trying to solve modern problems related to concurrency and multicore programming.

Fan is both an object-oriented and a functional programming language. This means that a developer can combine functional programming code with object-oriented code. At the same time, it has built-in immutability, message passing, and REST-oriented transactional memory. It uses Java-style syntax, so Java and C# developers won't have problems understanding Fan code.

Its portability makes Fan unique. So far, it can run on the JVM (Java Virtual Machine), the .NET CLR (Common Language Runtime), and JavaScript. Its JavaScript support is one of the most exciting features I've found for this language. There are many other languages and libraries offering actors and many different concurrency models for the JVM and the .NET CLR. However, Fan's support for JavaScript could revolutionize scripting performance: scripting could take advantage of multicore.

In fact, scripting should take advantage of multicore. Fan is evolving to offer JavaScript developers the possibility to tackle concurrency using actors and message-passing features. Besides, you can also run it on the JVM or on the .NET CLR, and you can expect Fan to offer additional compilers to run on new platforms. Portability is very important for Fan.

Fan's creators talk about the productivity of Ruby with the performance of Java. As Fan offers the possibility to tackle multicore, developers can transform a slow-performing language into a fast one. Undoubtedly, JavaScript support is a great opportunity for developers to create higher performance RIAs (Rich Internet Applications) based on this popular scripting language.

It's a bit difficult to find a definition for Fan, because it tries to offer many different features in one single and simple language. In a few lines, this list offers a summary of Fan's main features:

• Portability
• Object-oriented (with inheritance support)
• Immutability
• Closures support
• Dynamic programming
• Functional programming
• Serialization support
• Actor framework

The actor framework is really powerful. It supports the most important features required to create concurrent code without problems (see the sketch after this list):

• Actor locals
• Actor pools (using a shared thread pool)
• Futures
• Timers
• Chaining
• Message passing
• Coalescing messages
• Flow control mechanisms
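No Fan code appears in this piece, so as a rough analogy only, here is the shape of that actor model sketched in Python: one mailbox per actor, messages handled sequentially, and a future returned per message. This is my illustration of the concepts in the list above, not Fan syntax or its API:

import queue, threading
from concurrent.futures import Future

class Actor:
    # One mailbox, one worker thread: messages are processed strictly in order.
    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def send(self, msg):
        fut = Future()                  # the caller gets a future per message
        self._mailbox.put((msg, fut))
        return fut

    def _loop(self):
        while True:
            msg, fut = self._mailbox.get()
            fut.set_result(self._handler(msg))

echo = Actor(lambda msg: "echo: " + msg)
print(echo.send("hello").result())      # prints "echo: hello"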
If you need functional programming, immutability, message passing, and actors, you should take a look at Fan. Likewise, if you are working with JavaScript, keep an eye on Fan. It can help developers tackle multicore.

In the forthcoming months, expect to see new libraries, languages, compilers, and Domain-Specific Languages appearing to simplify parallel programming for many languages, virtual machines, and runtimes.

— Gastón Hillar is the author of C# 2008 and 2005 Threaded Programming: Beginner's Guide.

Return to Table of Contents
The C++0x "Remove Concepts" Decision
"Concepts" were to have been the central new feature in C++0x
By Bjarne Stroustrup

At the July 2009 meeting in Frankfurt, Germany, the C++ Standards Committee voted to remove "concepts" from C++0x. Although this was a big disappointment for those of us who have worked on concepts for years and are aware of their potential, the removal fortunately will not directly affect most C++ programmers. C++0x will still be a significantly more expressive and effective language for real-world software development than C++98. The committee acted with the intent to limit risk and preserve schedule. Maybe a significantly improved version of "concepts" will be available in five years. This article explains the reasons for the removal of "concepts," briefly outlines the controversy and fears that caused the committee to decide the way it did, gives references for people who would like to explore "concepts," and points out that (despite enthusiastic rumors to the contrary) "the sky is not falling" on C++.

No "Concepts" in C++0x
At the July 2009 Frankfurt meeting of the ISO C++ Standards Committee (WG21) (http://www.open-std.org/jtc1/sc22/wg21/), the "concepts" mechanism for specifying requirements for template arguments was "decoupled" (my less-diplomatic phrase was "yanked out"). That is, "concepts" will not be in C++0x or its standard library. That — in my opinion — is a major setback for C++, but not a disaster; and some alternatives were even worse.

I have worked on "concepts" for more than seven years and looked at the problems they aim to solve much longer than that. Many have worked on "concepts" for almost as long. For example, see (listed in chronological order):

• Bjarne Stroustrup and Gabriel Dos Reis: "Concepts — Design choices for template argument checking." October 2003. An early discussion of design criteria for "concepts" for C++.
• Bjarne Stroustrup: "Concept checking — A more abstract complement to type checking." October 2003. A discussion of models of "concept" checking.
• Bjarne Stroustrup and Gabriel Dos Reis: "A concept design (Rev. 1)." April 2005. An attempt to synthesize a "concept" design based on (among other sources) N1510, N1522, and N1536.
• Jeremy Siek, Douglas Gregor, Ronald Garcia, Jeremiah Willcock, Jaakko Jarvi, and Andrew Lumsdaine: "Concepts for C++0x." N1758==05-0018. May 2005.
• Gabriel Dos Reis and Bjarne Stroustrup: "Specifying C++ Concepts." POPL06. January 2006.
• D. Gregor, B. Stroustrup: "Concepts." N2042==06-0012. June 2006. The basis for all further "concepts" work for C++0x.
• Douglas Gregor, Jaakko Jarvi, Jeremy Siek, Bjarne Stroustrup, Gabriel Dos Reis, Andrew Lumsdaine: "Concepts: Linguistic Support for Generic Programming in C++." OOPSLA'06, October 2006. An academic paper on the C++0x design and its experimental compiler "ConceptGCC."
• Pre-Frankfurt working paper (with "concepts" in the language and standard library): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf. N2914=09-0104. June 2009.
• B. Stroustrup: "Simplifying the use of concepts." N2906=09-0096. June 2009.

It need not be emphasized that I and others are quite disappointed. The fact that some [...]
By Gigi Sayfan

Build systems are often a messy set of scripts and configuration files that let you build, test, package, deliver, and install your code. As a developer, you either love or loathe build systems. In this article series, I present a different approach to build systems, with the ultimate goal of completely hiding the build system from developers. But first, let me start with some personal history.

Early in my programming career I was a pure Windows developer (with the exception of my very first job, where I wrote Cobol programs for publishing Australia's Yellow Pages). While there was no build system to speak of, there was Visual Studio and Visual SourceSafe. I built Windows GUI clients, messed around with COM components, and picked up some nice C++ template tricks from ATL. And because automated unit testing wasn't very common back then, we created various test programs before passing code on to QA. This wasn't too painful since I worked for a small startup company and the projects weren't too big.

But I then moved to a company that developed software for chip fabrication equipment in the semiconductor industry and BOOM! Life-critical and mission-critical real-time software running on six computers that controlled custom-built hardware in clean-room conditions. The software ran on several operating systems with about 50 developers contributing code. The development environment consisted of two machines running Linux and Windows/Cygwin. The deployment environment was Solaris and LynxOS RTOS. No more Visual Studio. After reading about 1000 pages of documentation in the first week and getting my .profile and .bashrc in order, I was assigned my first task — designing and implementing a build system to replace the existing one, which was a nasty combination of Makefiles and Perl scripts that actually worked but nobody was sure why (the original author had left the building). There were a few bugs (for example, the build system didn't always follow the proper dependency path) and a big requirements document. Clearly it would be impossible to evolve the current build system, so I had to create a new one from scratch. This was lucky because I had zero experience with Makefiles and Perl, coupled with the tolerance threshold of a Windows developer to gnarly stuff. I still have the same tolerance, but I now know something about Makefiles.

Some of the requirements were pretty unusual, like running a commercial code generator that produces code from UML diagrams on a Windows machine, then uses the artifacts to compile code on Linux, Solaris, and LynxOS. The bottom line is that I decided to take an unusual approach and wrote the entire system in Python. It was my first big Python project and I was really surprised at how well it went. I managed everything in Python. I directly invoked the compiler and linker on each platform, then the test programs, and finally a few other steps. For instance, I implemented friendly error messages that provided helpful suggestions for common errors (e.g., "FrobNex file not found. Did you remember to configure the FrobNex factory to save the file?").

While I was generally pleased with the system, it wasn't completely satisfactory. In lieu of Makefiles, I created build.xml files, à la Ant. That was a mistake. The XML files were verbose compared to Makefiles, big chunks were identical for many subprojects, and people had to learn the format (which was simple, but something new).
I wrote a script that migrated Makefiles to build.xml files, but it just increased code bloat. I created a custom build system without regard for the specific environment and its needs. I created a very generic system, with polymorphic tools that can do anything as long as you write the code for the tool and configure it properly. This was bad. Whenever someone says, "You just have to..." I know I'm in trouble. What I took away from this experience is that Python is a terrific language. It's really fun when you can actually debug the build system itself. Having full control over the build system is great, too.

Background: What Does a Build System Do?
The build system is the software development engine. Software development is a complex activity that involves tasks such as: source-code control, code generation, automated source code checks, documentation generation, compilation, linking, unit testing, integration testing, packaging, creating binary releases, source-code releases, deployment, and reports. That said, software development usually boils down to four main phases:

1. Developers write source code and content (graphics, templates, text, etc.)
2. The source artifacts are transformed to end products (binary executables, web sites, installers, generated documents)
3. The end products are tested
4. The end products are deployed or distributed

A good automated build system can take care of steps 2–4. The distribution/deployment phase is usually to a local repository or a staging area; you will probably need some amount of human testing before actually releasing the code to production. The build system can also help with that by notifying users about interesting events, such as successful and/or failed builds, and providing debugging support.
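As a minimal sketch of what "taking care of steps 2–4" means in practice, here is a toy Python driver that chains build, test, and package stages and stops at the first failure. The commands are placeholders of my choosing, not the author's system:

import subprocess, sys

PHASES = [
    ("build",   ["make", "all"]),
    ("test",    ["make", "test"]),
    ("package", ["tar", "czf", "release.tar.gz", "build"]),
]

for name, cmd in PHASES:
    print("--- " + name + " ---")
    if subprocess.run(cmd).returncode != 0:
        sys.exit(name + " failed")   # pinpoint the failing phase immediately
print("all phases succeeded")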
But really, who cares about all this stuff? Actually everybody — developers, administrators, QA, managers, and even users. Developers interact most closely with the build system because every change a developer makes must trigger at least a partial build. When I say "developer" I don't necessarily mean a software engineer; I could be referring to a graphic artist, technical writer, or any other person who creates source content. When a build fails, it's most often because a developer changed something that broke the build. On rare occasions, it would be an administrator action (e.g., changing the URL of a staging server or shutting down some test server) or a hardware problem (e.g., the source control server is down). A good build system saves time by automating tedious and error-prone activities.

Think about a developer manually building and unit testing a program. Without a build system, he has to very carefully build it properly, test it, and hand it over to QA. The QA person needs to run his own tests, then hand it to the administrator for deployment to a staging site, where more tests are run against the deployed system. If anything goes wrong in this process, someone must determine what happened. Automated build systems eliminate a whole class of errors: they never forget a step, and they can pinpoint and resolve other errors by verifying that the source artifacts and intermediate artifacts are available and by scanning through log files and detecting failures.

Managers can also benefit from build systems. A passing build is the pulse of a project. If you have an automated build system with good test coverage (at the system level), managers can monitor project progress and be ready to release at each point. This in turn enables more agile development practices (if you are so inclined).

A build system can even help users in some cases. Think about systems that incorporate user-generated content and/or plug-ins. In most cases, you need to go over the content and ensure it doesn't break your system. A build system that automates some or all of these checks allows for shorter publish/release cycles for user-generated content.

Build System Problems
Okay, build systems are the greatest thing since Microsoft Bob (http://en.wikipedia.org/wiki/Microsoft_Bob). However, they still don't always live up to their potential:

• They Don't Do Enough (Not Fully Automated). This is one of the most common problems. A build system that is not fully automated can compile the software, create documentation, and package the final binary, but it requires a lot of user intervention to run various scripts, wait for previous stages to finish, check error reports, and so on.
• Requires a Lot of Discipline to Use Properly. Some build systems fail inexplicably if you don't follow a slew of obscure steps, like logging into the test server with a specific user, removing directory A, renaming directory B, making sure you perform step X only if the report generated by step Y says okay.
• Requires Too Much Configuration. Some build systems are very powerful and flexible, but are almost unusable due to excessive configuration. You have to define six different environment variables, modify three local config files, and pass eight different command-line options to the main build script. The end result is that 99% of the users use a single default configuration that probably doesn't fit their needs.
• Caters Mainly To a Sole Stakeholder. Another common problem is that a build system is often suitable for just one kind of stakeholder. For example, if the build system was developed mainly by the programmers who compile, link, and unit test all day, then the build system will have good support for these activities, but running integration tests or generating documentation may be poorly supported, if at all. On the other hand, if the build system was developed mainly by a release engineering team, then it will have good support for packaging final executables and will generate good reports about the percentage of passing tests, but it may not be possible for developers to run just a single unit test and its dependencies, and they will either have to run the full-fledged build every time or hack the build system in a quick and dirty way (which might lead to errors).
• Intractable Error Messages When Something Is Wrong. Build systems perform many activities that involve external tools. The errors generated by these tools are often swallowed by the build system, which much later generates
its own error message that doesn't point to the root cause. This is a serious problem that hurts productivity and causes people to revert to manual but understandable build practices.
• Inextensible and Undebuggable. Build systems are often one of the earliest tools created at project initiation. The requirements of this early build system are usually minimal. As time goes by and the project grows, the demands on the build system grow too. Since the build system is an internal tool, less effort is dedicated to making it high-quality code. More often than not, it is just a bunch of scripts slapped together and extended to support additional requirements by the tried and true practice of copy and paste. Such build systems quickly become a maintenance nightmare and can't be extended easily to accommodate new requirements.
• Not Integrated With the Developer's IDE. Most build systems that don't come with an IDE built in don't support IDEs. They are command-line based only, and if a developer wants to work in an IDE, the IDE project files must be maintained and synchronized with the build system's build files. For example, the build system may be Makefile-based, and a developer who uses Visual Studio has to maintain a .vcproj file for each project, and any additional files must be added to the Makefile as well.

The Perfect Build System
The build system I present in this series is open ended and can be used to automate any software process that is mainly file-based. However, the focus is on a cross-platform build system for large-scale C++ projects, because these are often the most complicated to build. The perfect build system solves or minimizes the problems associated with existing build systems.

"Convention over configuration" is a principle that has successfully governed domains like web frameworks, reducing the learning curve and increasing developer productivity. It demands that you organize your project in a consistent way (which is always good practice in any event):

• Regular directory structure. This is the key principle on which the entire build system rests. Even in the most complicated systems, there is usually a relatively small high-level directory structure that contains a potentially huge number of similar directories. For example, a project may have a libs directory that contains all the C++ static libraries. The contents of the libs directory may grow and change, but it always contains a single type of entities.
• Well-known locations. The build system should be aware of the location and names of the top-level directories and "understand" what they mean. For example, it should know that the directories under libs generate static libraries that should later be linked into executables and dynamic libraries that depend on them.
• Automatic discovery of files based on extension. Each directory usually contains a small number of file types. Again, in the libs example, it should contain .h and .c/.cpp files and potentially a couple of other metadata files. The build system should know what files to expect and how to handle each file type. Once you have the regular directory structure in place, the build system "knows" a lot about your system and can do many tasks on your behalf automatically. In particular, it doesn't need a build file in each directory that tells it what files are in it, how to build them, etc.
• Capitalize on the small variety of subproject types. In the C/C++ world, there are really only three types of subprojects: a static library, a dynamic library, and an executable. Static libraries (a compiled set of files bundled together) are the simplest. They are later linked into dynamic libraries and executables. Dynamic libraries and executables are similar from a build point of view: they both have source files and depend on precompiled static libraries to link against. It is important to build the dependent dynamic libraries and executables after building all the required static libraries. Many libraries (both static and dynamic) and executables use the same set of compiler and linker flags. Placing these groups under a parent directory informs the build system of these common flags and automatically builds all the subprojects.
• Generate build files from templates for any IDE. Different IDEs, as well as command-line based tools like Make, use different build files to represent the meta information needed to build the software. The build system I present here maintains the same information via its inherent knowledge combined with the regular directory structure, and can generate build files for any other build system by populating the appropriate templates. This approach lets developers build the software via their favorite IDE (like Visual Studio) without the hassle involved in adding files, setting dependencies, and specifying compiler and linker flags.
• Automatic dependency management based on #include analysis. Managing dependencies can be simple or complicated, depending on the project. In any case, missing a dependency leads to linking errors that are often hard to resolve. This build system analyzes the #include statements in the source files and recursively creates a complete dependencies tree; the tree determines what static libraries a dynamic library or executable needs to link against (see the sketch at the end of this article).
• Automatic discovery of added/removed/renamed files and directories. The regular directory structure, combined with knowledge of file types (e.g., .cpp or .h files), allows the build system to figure out what files it needs to take into account, so developers just need to make sure the right files are in the right directory.
• Flexibility. Support static libraries, dynamic libraries, executables, and custom artifacts. All possible build artifacts are supported, including custom ones like code generators, preprocessors, and documentation generators. The ability [...] subproject type.
• Control the level of error messages. The build system is designed to support different users, such as QA, developers, and managers. Each type of user may be interested in different [...]

[...] and even Ruby bindings. The project will run on Windows, Linux, and Mac OS X. It will be built using a custom build system. To whet your appetite, here is a prototype in Python of the finished project:

|___ src
    |___apps
    |___bindings
    |___dlls
    |___hw (static libraries)
[...]
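As promised in the dependency-management bullet above, here is a hedged sketch of the #include-analysis idea: collect each file's project-local includes and walk them transitively. The helper names and the search-path convention are my own illustrative assumptions, not the author's actual code:

import os, re

INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"')   # project headers only

def direct_includes(path):
    # Return the quoted #include targets mentioned in one source file.
    with open(path) as f:
        return [m.group(1) for m in map(INCLUDE_RE.match, f) if m]

def dependency_closure(root_source, include_dirs):
    # Transitive closure of project-local #include dependencies.
    seen, stack = set(), [root_source]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for header in direct_includes(current):
            for d in include_dirs:
                candidate = os.path.join(d, header)
                if os.path.exists(candidate):
                    stack.append(candidate)
                    break
    return seen

Mapping each discovered header back to the static library that owns its directory then yields the link line for a dynamic library or executable.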
Integrating jQuery Client-Side Data Templates With WCF
Using client-side templates and binding JSON data that's retrieved from a WCF service
By Dan Wahlin

In my article "Minimize Code by Using jQuery and Data Templates" (http://www.ddj.com/windows/217701311), I presented a JavaScript data-binding template solution that I've been using to make it easy to bind JSON data to a client-side template without having to write a lot of JavaScript code. In this article, I demonstrate the fundamentals of using the client-side templates and binding JSON data that's retrieved from a WCF service. As a review (in case you didn't read the previous article), the template solution I've been using recently on a client project is based on some code written by John Resig (creator of jQuery), which is extremely compact. Here's a modified version of his original code that I wrapped with a jQuery extender:
var _tmplCache = {};   // hoisted so the compiled-template cache persists across calls

$.fn.parseTemplate = function(data)
{
    var str = (this).html();
    var err = "";
    try
    {
        var func = _tmplCache[str];
        if (!func)
        {
            var strFunc =
                "var p=[],print=function(){p.push.apply(p,arguments);};" +
                "with(obj){p.push('" +
                str.replace(/[\r\t\n]/g, " ")
                   .replace(/'(?=[^#]*#>)/g, "\t")
                   .split("'").join("\\'")
                   .split("\t").join("'")
                   .replace(/<#=(.+?)#>/g, "',$1,'")
                   .split("<#").join("');")
                   .split("#>").join("p.push('")
                + "');}return p.join('');";
            //alert(strFunc);
            func = new Function("obj", strFunc);
            _tmplCache[str] = func;
        }
        return func(data);
    } catch (e) { err = e.message; }
    return "< # ERROR: " + err.toString() + " # >";
}
The parseTemplate method can be applied against a client-side template like the one below. Notice that the template is wrapped in a script block with the type set to text/html so that it isn't rendered by the browser. JSON properties are written out by using the <#= #> syntax, and the template engine has full support for embedded JavaScript code.

<script id="MyTemplate" type="text/html">
<table style="width:400px;">
    <thead>
        <tr>
            <th>First Name</th>
            <th>Last Name</th>
            <th>Address</th>
        </tr>
    </thead>
    <tbody>
    <#
    for(var i=0; i < d.length; i++)
    {
        var cust = d[i];
    #>
        <tr>
            <td id="CustomerRow_<#= i.toString() #>">
                <#= cust.FirstName #>
            </td>
            <td>
                <#= cust.LastName #>
            </td>
            <td>
                <#= cust.Address.Street #>
                <br />
                <#= cust.Address.City #>,
                <#= cust.Address.State #> <#= cust.Address.Zip #>
            </td>
        </tr>
    <#
    }
    #>
    </tbody>
</table>
<br />
<#= d.length #> records shown
</script>

This template outputs a simple table like Figure 1. Sure, I could have generated the table using DOM manipulation techniques, but being able to tweak a data template is much easier and more productive in my opinion.

Figure 1

To use the template you'll need to have some JSON data available. Here's an example of creating JSON by hand and binding it to the template using the parseTemplate method shown earlier. The data returned from the template data-binding operation is passed to the html method of the target div, which displays the data in the browser. Note: I'm defining the d property in the JSON object because WCF uses that name by default when it returns serialized JSON data.

var json =
{
    "d":
    [
        { "FirstName": "John", "LastName": "Doe",
          "Address":
            { "Street": "1234 Anywhere St.", "City": "Phoenix",
              "State": "AZ", "Zip": 85044 }
        },
        { "FirstName": "Jane", "LastName": "Doe",
          "Address":
            { "Street": "435 Main St.", "City": "Tempe",
              "State": "AZ", "Zip": 85245 }
        },
        { "FirstName": "Johnny", "LastName": "Doe",
          "Address":
            { "Street": "643 Chandler Blvd", "City": "Chandler",
              "State": "AZ", "Zip": 85248 }
        },
        { "FirstName": "Dave", "LastName": "Doe",
          "Address":
            { "Street": "18765 Cloud St.", "City": "Mesa",
              "State": "AZ", "Zip": 85669 }
        }
    ]
};
var output = $('#MyTemplate').parseTemplate(json);
$('#MyTemplateOutput').html(output);

Of course, in the real world you'll probably get the JSON data from some type of service (WCF, ASMX, REST, etc.). Here's a WCF service that returns a List of Customer objects and converts them to JSON. The service has the client script behavior enabled so that serialization from CLR objects to JSON objects occurs behind the scenes automatically.

[ServiceContract(Namespace = "http://www.thewahlingroup.com")]
[AspNetCompatibilityRequirements(RequirementsMode =
    AspNetCompatibilityRequirementsMode.Allowed)]
public class CustomerService
{
    [OperationContract]
    public List<Customer> GetCustomers()
    {
        return new List<Customer>
        {
            new Customer {FirstName="John", LastName="Doe",
                Address=new Address{Street="1234 Anywhere St.",
                    City="Phoenix", State="AZ", Zip=85044}},
            new Customer {FirstName="Jane", LastName="Doe",
                Address=new Address{Street="435 Main St.",
                    City="Tempe", State="AZ", Zip=85245}},
            new Customer {FirstName="Johnny", LastName="Doe",
                Address=new Address{Street="643 Chandler Blvd",
                    City="Chandler", State="AZ", Zip=85248}},
            new Customer {FirstName="Dave", LastName="Doe",
                Address=new Address{Street="18765 Cloud St.",
                    City="Mesa", State="AZ", Zip=85669}}
        };
    }
}

jQuery's ajax method can then be used to call the WCF service and retrieve the data (jQuery provides other methods, such as getJSON, that could be used too if desired):

$.ajax(
{
    type: "POST",
    url: "CustomerService.svc/GetCustomers",
    dataType: "json",
    data: {},
    contentType: "application/json; charset=utf-8",
    success: function(json)
    {
        var output = $('#MyTemplate').parseTemplate(json);
        $('#MyTemplateOutput').html(output);
        //Add hover capabilities
        $('tbody > tr').bind('mouseenter mouseleave', function()
        {
            $(this).toggleClass('hover');
        });
    }
});

The success callback binds the returned JSON to the template and uses jQuery's bind method to highlight rows as the user moves the mouse in and out of them.

You can see that the amount of custom JavaScript that has to be written is kept to a minimum by combining jQuery with the client-side template, which ultimately leads to easier maintenance down the road. This is just one of several different client-side template solutions out there; ASP.NET 4.0 will also include a client-side template solution once released. You can download the sample code here (http://i.cmpnet.com/ddj/images/article/2009/code/jQueryDataTemplates.zip).
Of Interest

JetBrains has released Version 1.0 of MPS (short for "Meta Programming System"), a language workbench and IDE for extending existing languages and creating custom Domain Specific Languages. By using MPS and DSLs created with its help, domain experts can solve domain-specific tasks easily, even if they're not familiar with programming. MPS is freely available, with a major part of its source code open and available under the Apache license, and can be downloaded from the JetBrains site. JetBrains' own bug-tracking system, code-named Charisma, is developed entirely with MPS. This issue tracker is a modern Web 2.0 application; to create it, a whole stack of web application languages was created: languages for HTML templates, controllers, database access, JavaScript, etc. MPS doesn't use any parsers — it works with the abstract syntax tree directly, so it doesn't require any parsing. Compiler construction knowledge might be useful, but you don't have to be an expert in this field in order to use MPS: it contains a predefined set of languages with which users can create their own languages. http://www.jetbrains.com/mps/?mps1pr

Intel has made available for free download Prototype Edition 3.0 of the Intel C++ STM Compiler. (STM is short for "Software Transactional Memory.") The Transactional Memory C++ language constructs that are included open the door for users to exercise the new language constructs for parallel programming, understand the transactional memory programming model, and provide feedback on the usefulness of these extensions with the Intel C++ STM Compiler Prototype Edition. This posting includes the Intel C++ STM Compiler Prototype Edition 2.0 and runtime libraries for Intel transactional memory language construct extensions. http://software.intel.com/en-us/articles/intel-c-stm-compiler-prototype-edition-20/

TeamDev has released Selenium Inspector, an open-source library that runs on top of Selenium, a tool designed to simplify automated testing of Web components, pages, and applications — especially those written using JSF. The Selenium Inspector API lets you create testing solutions for a variety of HTML rendering frameworks like JSF component libraries, Spring MVC, and Struts. Web developers can create object-oriented testing APIs for any Web UI library. The Java API for inspecting OpenFaces components is already included. Selenium Inspector provides an API similar to that of Selenium, but is simpler to use in many cases and provides a somewhat higher level of abstraction. It doesn't replace Selenium, but provides an additional API that you can use if you find it more appropriate for your actual needs. You can use both Selenium and Selenium Inspector APIs at the same time. http://seleniuminspector.org/

[Video: Three Key Challenges to Adding Parallelism to Your Applications. For more videos on this topic, go to www.ddj.com/go-parallel/]

Static analyzers try to find weaknesses in other programs that could be triggered accidentally or exploited by intruders. A report from the National Institute of Standards and Technology (NIST) entitled "Static Analysis Tool Exposition (SATE)," edited by Vadim Okun, Romain Gaucher, and Paul Black, documents NIST's Static Analysis Tool Exposition — an exercise by NIST and static analyzer vendors to improve the performance of these tools. The static analyzers (and languages) in the study included Aspect Security ASC 2.0 (Java), Checkmarx CxSuite 2.4.3 (Java), Flawfinder 1.27 (C), Fortify SCA 5.0.0.0267 (C, Java), Grammatech CodeSonar 3.0p0 (C), HP DevInspect 5.0.5612.0 (Java), SofCheck Inspector for Java 2.1.2 (Java), University of Maryland FindBugs 1.3.1 (Java), and Veracode SecurityReview (C, Java). According to NIST's Vadim Okun, SATE was a long-overdue idea. "Most modern software is too lengthy and complex to analyze by hand," says Okun. "Additionally, programs that would have been considered secure ten years ago may now be vulnerable to hackers. We're trying to focus on identifying what in a program's code might be exploitable." While the SATE 2008 process was not designed to compare the performance of participating tools, it was successful in understanding some of their capabilities across a wide variety of weaknesses. SATE demonstrated that results from multiple tools can be combined into a single database from which further analysis is possible. While the backtrace explanations were useful, the study concluded that the evaluation might have been more efficient and less error-prone with closer integration with the navigation and visualization capabilities of the tools. The SATE report is available at http://samate.nist.gov/docs/NIST_Special_Publication_500-279.pdf

Return to Table of Contents
Conversations
by Jonathan Erickson

Michael "Monty" Widenius was the creator of the MySQL database and is founder of Monty Program Ab. He recently spoke with Dr. Dobb's editor-in-chief Jonathan Erickson.

Dr. Dobb's: What's the Open Database Alliance?
Widenius: The Open Database Alliance is a vendor-neutral consortium of vendors and individuals commercially supporting or delivering services around MariaDB and MySQL. Open Database Alliance partners will support each other's open source initiatives, and resell each other's services. This makes it possible for customers to get all the services they require around their database issues through any vendor in the Alliance.

Dr. Dobb's: What's MariaDB?
Widenius: It's a community-developed branch of MySQL with bug fixes and new features developed by the MariaDB community, of which Monty Program Ab is an active member. We will keep MariaDB in sync with MySQL development to ensure all bug fixes and features in MySQL also exist in MariaDB. At this time, MariaDB 5.1 should be notably faster, have more features, and have fewer bugs than the corresponding MySQL 5.1 release.

Dr. Dobb's: Is SQL adequate for 21st century computing?
Widenius: Yes. SQL will be around for a long time because it's a very expressive language that is very easy to embed in web-based applications. As long as people are developing web pages with programming languages like PHP, Perl, Ruby, and Java, SQL will have its place.

Dr. Dobb's: What will be the biggest change in data storage in five years?
Widenius: SSD (solid-state drive) memory will force a paradigm shift in how data is stored and accessed; a lot of old proven database algorithms will have to be changed because there is no seek time anymore.

Dr. Dobb's: What's the most exciting development in DBMS technology today?
Widenius: On the software side, the usage of Memcached and Gearman to do inexpensive "cloud like" computing is of course interesting. We are also seeing dedicated inexpensive machines that provide Memcached interfaces, which will notably speed up and simplify any setup that uses Memcached (which is a standard component for most busy web sites).

Dr. Dobb's: Will operating systems ultimately be successful in converting their filesystems into SQL-managed organizations of data?
Widenius: I think that is a stupid idea. Most data people store is not really suitable for SQL. SQL will only notably slow things down when accessing things, and will create a lot more fragmentation compared to modern file systems, without providing anything really critical for the end user. Another problem is that SQL-managed data is very bad for applications that want to have their own access to part of the data (like another database server running on a SQL-managed filesystem).

Return to Table of Contents
Book Review
by Mike Riley

Even though Google App Engine has been available to developers for some time, deep technical books on applying this scalable cloud service have been sparse. Does O'Reilly's entry, under their Google Press imprint, fill this void? Read on to find out.

Like many web-connected developers, I have been aware of Google App Engine (GAE) since its invitation-only beta days, but never really took much interest in it. I am a big fan of the Python scripting language, but the fact that GAE uses Python as its preferred logic language somehow failed to grab me. One of the main reasons for this was that, at the same time as GAE's initial public beta, I was busy immersing myself in the Python-based Django framework and I wasn't about to confuse myself with an alternative approach to Python-centric web application development. Fortunately, GAE was constructed with enough flexibility to allow a framework like Django to live within its constructs, as detailed in an April 2008 article by Googler Damon Kohler (http://code.google.com/appengine/articles/django.html). Additionally, with the inclusion of Java support, GAE offers plenty of flexibility for the developer seeking a hosted cloud solution. Unfortunately, author Charles Severance failed to explore either of these important features in Using Google App Engine. The book is instead oriented toward first-time web programmers unfamiliar with even the most rudimentary aspects of web development. Nearly half the book is spent on the basics of HTML, CSS, HTTP, and basic Python syntax. Considering the book's brevity and cost, this expenditure left few pages solely dedicated to GAE.

Once the beginner tutorials of basic web page construction and delivery are out of the way, the second half of the book dives into a high-level overview of GAE, its use of templates (based on Django's template system, no less), handling cookies and session management, using the proprietary GAE Datastore for structured data storage, creating a GAE account, uploading and testing a GAE-friendly application, and understanding and fine-tuning the memory cache parameters. Four appendixes, one each dedicated to a target development OS (Windows XP, Windows Vista, Mac OS X, and Linux), literally repeat the same information with the name of the OS replaced and other minor differences. These were quite frankly a waste of paper; the author should have consolidated the OS variations into a simple grid or footnotes where appropriate. That would have left more space for explaining the inner workings, design optimizations, and best practices for developing best-of-breed GAE designs.

Besides the minimal amount of unique material in the book, one of its biggest failings for me was its failure to present a convincing argument to use GAE in the first place. The advantages mentioned by the author read like a Google advertisement from a Google fanboy. The author failed to share any well-known websites that run on GAE, interview others who are as enamored with GAE as he is, or provide a chapter or appendix addressing privacy and intellectual property concerns, service-level expectations, etc. While the book did a fair job of elevating my interest in GAE, it wasn't enough for me to consider placing any of my web apps into Google's cloud.

Overall, O'Reilly (and Google for that matter) missed a golden opportunity with this book to deeply explore the technical facets of GAE. Instead, they spent paper and ink on rehashing basic concepts that have been much better served in other O'Reilly titles. While this might be a helpful book for someone who has no web development experience whatsoever, yet aspires toward understanding MVC patterns and a broader grasp of the complexity associated with modern-day web applications, it's a major letdown for practiced developers seeking a deeper understanding of GAE in the real world. Perhaps O'Reilly will revisit this technology again under a "Mastering Google App Engine" title.

Return to Table of Contents
Effective Concurrency
By Herb Sutter

Dual- and quad-core computers are obviously here to stay for mainstream desktops and notebooks. But do we really need to think about "manycore" systems if we're building a typical mainstream application right now? I find that, to many developers, "manycore" systems still feel fairly remote, and not an immediate issue to think about as they're working on their current product.

This column is about why it's time right now for most of us to think about systems with lots of cores. In short: Software is the (only) gating factor; as that gate falls, hardware parallelism is coming more and sooner than many people yet believe.

Recap: What "Everybody Knows"
Figure 1 is the canonical "free lunch is over" slide showing major mainstream microprocessor trends over the past 40 years. These numbers come from Intel's product line, but every CPU vendor from servers (e.g., Sparc) to mobile devices (e.g., ARM) shows similar curves, just shifted slightly left or right. The key point is that Moore's Law is still generously delivering transistors at the rate of twice as many per inch or per dollar every couple of years. Of course, any exponential growth curve must end, and so eventually will Moore's Law, but it seems to have yet another decade or so of life left.

Mainstream microprocessor designers used to be able to use their growing transistor budgets to make single-threaded code faster by making the chips more complex, such as by adding out-of-order ("OoO") execution, pipelining, branch prediction, speculation, and other techniques. Unfortunately, those techniques have now been largely mined out. But CPU designers are still [...]

Figure 1: Canonical "free lunch is over" slide. Note Pentium vs. dual-core Itanium transistor counts.
But CPU designers still have growing transistor budgets to spend, and the main way the added complexity now makes single-threaded applications run faster is that the complexity is used to hide the latency of accessing glacially slow RAM — the "memory wall."

In general, how do you hide latency? Briefly, by adding concurrency: Pipelining, out-of-order execution, and most of the other tricks used inside complex CPUs inject various forms of concurrency within the chip itself, and that lets the CPU keep the pipeline to memory full and well-utilized and hide much of the latency of waiting for RAM. (That's a very brief summary. For more, see my machine architecture talk, available on Google video. [2])

So every chip needs to have a certain amount of concurrency available inside it to hide the memory wall. In 2006, the memory wall was higher than in 1997; so naturally, 2006 cores of any variety needed to contain more total concurrency than in 1997, in whatever form, just to avoid spending most of their time waiting for memory. If we just brought the 1997 core as-is into the 2006 world, running at 2006 clock speeds, we would find that it would spend most of its time doing something fairly unmotivating: just idling, waiting for memory.

But that doesn't mean a simpler 1997-style core can't make sense today. You just have to provide enough internal hardware concurrency to hide the memory wall. The squeezing-the-toothpaste-tube metaphor applies directly: When you squeeze to make one end smaller, some other part of the tube has to get bigger. If we take away some of a modern core's concurrency-providing complexity, such as removing out-of-order execution or some or all pipeline stages, we need to provide the missing concurrency in some other way. But how? A popular answer is: Through hardware threads. (Don't stop reading if you've been burned by hardware threads in the past. See the sidebar "Hardware threads are important, but only for simpler cores.")

Toward Simpler, Threaded Cores
What are hardware threads all about? Here's the idea: Each core still has just one basic processing unit (arithmetic unit, floating point unit, etc.) but can keep multiple threads of execution "hot" and ready to switch to quickly as others stall waiting for memory. The switching cost is just a few cycles; it's nothing remotely similar to the cost of an operating system-level context switch. For example, a core with four hardware threads can run the first thread until it encounters a memory operation that forces it to wait, and then keep doing useful work by immediately switching to the second thread and executing that until it also has to wait, and then switching to the third until it also waits, and then the fourth until it also waits — and by then hopefully the first or second is ready to run again and the core can stay busy. For more details, see [4].
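That scheduling idea is easy to model in miniature. The toy C++ program below is not how anyone programs hardware threads (the hardware does the switching invisibly); it merely simulates one issue slot shared by N threads, each of which must wait an assumed four ticks after every operation. All the numbers are made up for illustration.

#include <cstdio>
#include <vector>

// Toy model of one core with N hardware threads: the core issues at most
// one operation per tick, and a thread that issues must then wait
// MISS_LATENCY ticks (its simulated memory stall) before it can issue again.
struct HwThread {
    long ready_at = 0;   // tick when this thread's memory access completes
    long work_left = 64; // operations this thread still has to issue
};

int main() {
    const long MISS_LATENCY = 4; // assumed stall length, in ticks
    for (int n : {1, 2, 4, 8}) {
        std::vector<HwThread> threads(n);
        long tick = 0, busy = 0, remaining = 64L * n;
        while (remaining > 0) {
            for (HwThread& t : threads) {      // scan for any ready thread
                if (t.work_left > 0 && t.ready_at <= tick) {
                    --t.work_left;
                    --remaining;
                    t.ready_at = tick + 1 + MISS_LATENCY; // stall again
                    ++busy;                    // the issue slot did work
                    break;                     // one issue per tick
                }
            }
            ++tick;
        }
        std::printf("%d hardware threads: core busy %ld of %ld ticks (%.0f%%)\n",
                    n, busy, tick, 100.0 * busy / tick);
    }
}

With a 4-tick stall, utilization climbs from roughly 20% with one thread toward 100% with five or more, which is exactly the "as many as you need to hide the latency" rule of thumb that follows.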
The next question is, How many hardware threads should there be per core? The answer is: As many as you need to hide the latency no longer hidden by other means. In practice, popular answers are four and eight hardware threads per core. For example, Sun's Niagara 1 and Niagara 2 processors are based on simpler cores, and provide four and eight hardware threads per core, respectively. The UltraSPARC T2 boasts 8 cores of 8 threads each, or 64 hardware threads, as well as other functions including networking and I/O that make it a "system on a chip." [5] Intel's new line of Larrabee chips is expected to range from 8 to 80 (eighty) x86-compatible cores, each with four or more hardware threads, for a total of 32 to 320 or more hardware threads per CPU chip. [6] [7]
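From software's point of view, cores and hardware threads blur together into a single number: how many threads can actually run at once. Today's C++ can report that number directly; the sketch below uses C++11's std::thread, which postdates this column. The result is typically sockets x cores x hardware threads per core (64 on an UltraSPARC T2, for instance), and the standard permits 0 when the count cannot be determined.

#include <iostream>
#include <thread>

int main() {
    // Typically sockets x cores x hardware threads per core: 64 on an
    // UltraSPARC T2 (8 cores x 8 threads), 4 on a plain quad-core desktop.
    unsigned hw = std::thread::hardware_concurrency();
    std::cout << "Hardware threads visible to software: " << hw << '\n';
}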
Figure 3 shows a simplified view of possible CPU directions. The large cores are big, modern, complex cores with gobs of out-of-order execution, branch prediction, and so on.

The left side of Figure 3 shows one possible future: We could just use Moore's transistor generosity to ship more of the same — complex modern cores as we're used to in the mainstream today. Following that route gives us the projection we already saw in Figure 2.

But that's only one possible future, because there's more to the story. The right side of Figure 3 illustrates how chip vendors could swing the pendulum partway back and make moderately simpler chips, along the lines that Sun's Niagara and Intel's Larrabee processors are doing.

Figure 3: A few possible future directions.
application’s scalability to the amount of ical new Dell desktop in 2012. We’re close the parallel libraries and tooling now
hardware parallelism in the target hardware enough to 2011 and 2012 that if chip ven- becoming available, I think such a com-
that will be available during your applica- dors aren’t already planning such a jump to plete failure is very unlikely.
tion’s expected production or shelf lifetime. simpler, hardware-threaded cores, it’s not As soon as mainstream parallel applica-
As shown in Figure 4, that equates to the going to happen. They typically need three tions become available, we will see hard-
number of hardware threads you expect to years or so of lead time to see, or at least ware parallelism both more and sooner
have on your end users’ machines. anticipate, the availability of parallel soft- than most people expect. Fasten your seat
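One way to track that moving target, rather than freezing a thread count at design time, is to decompose the work into many more items than you will ever have threads and size the worker pool at run time. The sketch below (my example, not from the column) uses that pattern; the square-root loop is a stand-in for real work.

#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const std::size_t kItems = 100000;   // many small work items
    std::vector<double> result(kItems);
    std::atomic<std::size_t> next(0);    // shared work-item counter

    // Size the pool from the machine we actually landed on.
    unsigned hw = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < hw; ++t)
        pool.emplace_back([&] {
            // Each worker repeatedly claims the next unprocessed item.
            for (std::size_t i; (i = next.fetch_add(1)) < kItems; )
                result[i] = std::sqrt(static_cast<double>(i)); // stand-in work
        });
    for (auto& w : pool) w.join();
    std::printf("Processed %zu items on %u threads.\n", kItems, hw);
}

Because there are far more items than workers, the same binary balances itself whether it lands on 2 hardware threads or 64.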
Let’s say that YourCurrentApplication ware that will use the chips, so that they belts, and remember Figure 5.
1.0 will ship next year (mid-2010), and you can design and build and ship them in their
expect that it’ll be another 18 months until normal development cycle. References
you ship the 2.0 release (early 2012) and I don’t believe either the bottom line or [1] Montecito press release (Intel, July 2006)
probably another 18 months after that the top line is the exact truth, but as long www.intel.com/pressroom/archive/releases/
before most users will have upgraded (mid- as sufficient parallel-capable software 20060718comp.htm.
2013). Then you’d be interested in judging comes along, the truth will probably be [2] H. Sutter. “Machine Architecture: Things
what will be the likely mainstream hard- somewhere in between, especially if we Your Programming Language Never Told
ware target up to mid-2013. have processors that offer a mix of large- You” (Talk at NWCPP, September 2007).
If we stick with “just more of the same” and small-core chips, or that use some chip http://video.google.com/videoplay?docid=
as in Figure 2’s extrapolation, we’d expect real estate to bring GPUs or other devices 4714369049736584770
aggressive early hardware adopters to be on-die. That’s more hardware parallelism, [3] “Improving Performance by Disabling
running 16-core machines (possibly double and sooner, than most mainstream develop- Hyperthreading” (Novell Cool Solutions
that if they’re aggressive enough to run ers I’ve encountered expect. feature, October 2004). www.novell.com/
dual-CPU workstations with two sockets), Interestingly, though, we already noted coolsolutions/feature/637.html
and we’d likely expect most general main- two current examples: Sun’s Niagara, and [4] J. Stokes. “Introduction to Multi-
stream users to have 4-, 8- or maybe a smat- Intel’s Larrabee, already provide double- threading, Superthreading and Hyper-
tering of 16-core machines (accounting for digit parallelism in mainstream hardware threading” (Ars Technica, October 2002).
the time for new chips to be adopted in the via smaller cores with four or eight hard- http://arstechnica.com/old/content/2002/
marketplace). ware threads each. “Manycore” chips, or 10/hyperthreading.ars
But what if the gating factor, parallel- perhaps more correctly “manythread” [5] UltraSPARC T2 Processor (Sun).
ready software, goes away? Then CPU ven- chips, are just waiting to enter the main- www.sun.com/processors/UltraSPARC-T2/
dors would be free to take advantage of stream. Intel could have built a nice 100- datasheet.pdf
options like the one-time 16-fold hardware core part in 2006. The gating factor is the [6] L. Seiler et al. “Larrabee: A Many-Core
parallelism jump illustrated in Figure 3, software that can exploit the hardware par- x86 Architecture for Visual Computing”
and we get an envelope like that shown in allelism; that is, the gating factor is you (ACM Transactions on Graphics (27,3),
Figure 5. and me. Proceedings of ACM SIGGRAPH 2008,
Now, what amount of parallelism should August 2008). http://download.intel.com/
the application you’re working on now Summary technology/architecture-silicon/Siggraph_
have, if it ships next year and will be in the The pendulum has swung toward complex Larrabee_paper.pdf
market for three years? And what does that cores nearly far as it’s practical to go. [7] M. Abrash. “A First Look at the
answer imply for the scalability design and There’s a lot of performance and power Larrabee New Instructions” (Dr. Dobb’s,
testing you need to be doing now, and the incentive to ship simpler cores. But the April 2009). http://www.ddj.com/hpc-
hardware you want to be using at least part gating factor is software that can use them high-performance-computing/216402188.
of the time in your testing lab? (We can’t effectively; specifically, the availability of [8] H. Sutter. “Break Amdahl’s Law!” (Dr.
buy a machine with 32-core mainstream scalable parallel mainstream killer appli- Dobb’s Journal, February 2008). www.ddj
chip yet, but we can simulate one pretty cations. .com/cpp/205900309.
well by buying a machine with four eight- The only thing I can foresee that could
core chips, or eight quad-core chips… It’s prevent the widespread adoption of many- —Herb Sutter is a bestselling author and
no coincidence that in recent articles I’ve core mainstream systems in the next consultant on software development topics,
often shown performance data on a 24-core decade would be a complete failure to find and a software architect at Microsoft. He can
machine, which happens to be a four-socket and build some key parallel killer apps, be contacted at www.gotw.ca.
box with six cores per socket.) ones that large numbers of people want and
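For the testing-lab question, a cheap habit worth adopting now is to run the same kernel at 1, 2, 4, ... threads up to whatever your biggest box offers and watch where the speedup curve flattens. Here is a sketch, with an intentionally memory-bound summation kernel standing in for real work (such a kernel will flatten early, which is itself instructive to see):

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Time one run of a chunked parallel sum across nthreads workers.
double run_ms(unsigned nthreads, const std::vector<double>& data) {
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> ts;
    const std::size_t chunk = data.size() / nthreads;
    const auto t0 = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < nthreads; ++i)
        ts.emplace_back([&, i] {
            std::size_t b = i * chunk;
            std::size_t e = (i + 1 == nthreads) ? data.size() : b + chunk;
            partial[i] = std::accumulate(data.begin() + b, data.begin() + e, 0.0);
        });
    for (auto& t : ts) t.join();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    std::vector<double> data(1 << 24, 1.0);  // ~128 MB of doubles
    unsigned hw = std::max(1u, std::thread::hardware_concurrency());
    double base = 0.0;
    for (unsigned n = 1; n <= hw; n *= 2) {
        double ms = run_ms(n, data);
        if (n == 1) base = ms;
        std::printf("%3u threads: %8.1f ms  speedup %.2fx\n", n, ms, base / ms);
    }
}

On a 24-core, four-socket box this particular kernel will likely stop scaling well before 24 threads because it is bandwidth-bound; your own kernels will draw their own curves, and those curves are the scalability data you need.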
Note that I’m not predicting that we’ll that work better with lots of cores. Given
see 256-way hardware parallelism on a typ- our collective inventiveness, coupled with Return to Table of Contents