Literature by the Numbers

“Literature is the opposite of data,” wrote novelist Stephen Marche in the Los Angeles Review of Books in October 2012. He cited his favorite line from Shakespeare’s Macbeth: “Light thickens, and the crow makes wing to the rooky wood.” Marche went on to ask, “What is the difference between a crow and a rook? Nothing. What does it mean that light thickens? Who knows?” Although the words work, they make no sense as pure data, according to Marche.

There are many who would disagree with him. With the rise of digital technologies, the paramount role of human intuition and interpretation in humanistic knowledge is being challenged as never before, and the scientific method is tiptoeing into the English department. Some humanists are eagerly adopting these new tools, while others find them problematic. The rapid ascent of the digital humanities is spurring pitched debate over what it means for the profession, and whether the attempt to quantify something as elusive as human intuition is simply misguided.

Today, huge amounts of the world’s literature have been digitized and are accessible to scholars with the click of a mouse. Simple embellishments on keyword search can yield fascinating insights into this data. Take, for example, Google’s N-gram server, which debuted with a splash in late 2010. The server allows you to track the frequency of words or word combinations (“bigrams,” “trigrams,” or “N-grams”) in the Google Books database over time. You can see, for instance, how words change meaning. Until 1965, “black” was just a color, occurring about as often as “red” and quite a bit

