
Alex Walsh
Dr. Freymiller
CAS137H
October 29, 2013

The Paradigm Shift of Big Data

As humans, we are constantly perceiving the world around us and making decisions based upon those perceptions. Our desire for precision, along with our inclination to find a link between seemingly unrelated factors, has long been studied in psychology. These tendencies have shaped human cognition considerably; we search for exactitude and causality in everything from scientific research and experiments to case studies and patterns in behavior. Certainly, techniques such as the scientific method and the analysis of representative samples have allowed for momentous growth in the human knowledge base. However, the advent of the internet and increasingly cheap processing power has allowed for the collection and aggregation of data in quantities that were previously unthinkable. Through effective utilization of this big data, our global culture has shifted its focus to what happens as opposed to why it happens; we have moved from exactitude and causation to probability and correlation.

At first, the idea of our fixation on exactitude and causation is a difficult one to comprehend. Everything in the world is based on the fact that A happens because of B, right? The easiest way to demonstrate this fallacy is through a simple illustration. Just this year, Oxford professor Viktor Schönberger and journalist Kenneth Cukier of The Economist outlined numerous examples of our need to find causation in their meticulously researched book Big Data. They presented this logic:

Take the following three sentences: "Fred's parents arrived late. The caterers were expected soon. Fred was angry." When reading them we instantly intuit why Fred was angry, not because the caterers were to arrive soon, but because of his parents' tardiness.

Actually, we have no way of knowing this from the information supplied. Still, our minds cannot help creating what we assume are coherent, causal stories out of facts we are given (63-64).

Aside from this imaginary causation, our fervent desire for exactitude has been a driving force in academia, research, and development. "In a world of small data, reducing errors and ensuring high quality of data was a natural and essential impulse. Since we only collected a little information, we made sure that the figures we bothered to record were as accurate as possible" (Schönberger and Cukier 32). This mindset has been customary for as long as humans have been around. The thought was, if we aren't getting this exactly right, why are we doing it at all? Accordingly, advancements have transpired that now make it possible to measure physical dimensions down to the atomic scale. This sort of precision, however, tends to cloud our view of the big picture. The rigid framework of exactitude we live by (this problem will only be solved using these specific steps, this part can be manufactured in this way only, and so on) is less rigid than we may think, and embracing this elasticity will bring us closer to reality (Schönberger and Cukier 48). The utilization of mass data gathered and organized through the internet and computer processing is at the heart of the movement away from exactitude and causality, and it supports this plasticity of thought.

The development of the internet is the first half of our cultural shift involving big data. In 1969, a government-funded project called ARPANET was launched for solely scientific purposes. It was built so that massive calculations could be shared between different points on a network; no human communication or other practical use was ever intended. What ARPANET did do, though, was represent the first major use of packet switching.
A packet is a small segment of data sent from one computer or network device to another, containing the source, destination, size, and data ("Packet"). This packet switching, the transfer of small pieces of data to and from various points on a network, is arguably the beginning of the internet. In 1978, the advent of TCP/IP (Transmission Control Protocol/Internet Protocol) established the basic language of how modern communication is carried out across a network (Peter). By 1990, ARPANET had been decommissioned and the World Wide Web was beginning to take shape (Zakon). Between the early 1990s and now, the use of the internet in society has exploded. Online searches, shopping, social networking, and academic research are merely a few of the countless ways humans have taken advantage of this innovation. In specific relation to big data, the internet allows us to interact with and collect data from individual users across nearly the entire world. Before the internet, this sort of data collection was extremely difficult, costly, and ineffective. Schönberger and Cukier illustrate the beauty of harnessing the vastness of the web for the purpose of big data at the start of their book. In 2009, a new strain of flu virus named H1N1 turned up around the globe and caused fears of an epidemic along the lines of tuberculosis or smallpox. Coincidentally, it had been discovered a short time earlier that Google could predict the spread of the common winter flu in the United States, not just nationally, but down to specific regions and even states (1-2). The search engine aggregated its most common queries that had anything to do with the disease, such as "medicine for cough and fever," and mapped the likely locations of outbreaks using geographically tagged IP addresses. Thus, Google's system proved to be a more useful and timely indicator than government statistics with their natural reporting lags (2).
The fact that an internet-based corporation was essentially able to beat the United States Centers for Disease Control and Prevention at its own game is a testament to the meaningful impact the internet has had by facilitating the use of big data. There was no precision in the numbers Google came up with; they were merely estimates that highlighted a correlation. A higher volume of flu-related searches correlated with a higher incidence of H1N1 in a region. This deviation from exactitude and causation is a clear indication that we have shifted our mindset for the purpose of producing beneficial insights.

Although the internet has had a huge impact on this topic, it would be nowhere without the power of computer processing. The year 1954 marks the first time computers were sold on a large scale, with IBM selling about 450 units of its 650 magnetic drum calculator ("Timeline of Computer History"). However, personal computing did not become truly widespread until the mid-nineties, when Microsoft and Apple began the PC versus Mac war that continues to this day. Computing power was able to take hold and have a significant influence on modern society because it became cheap and ubiquitous. More and more people were able to enter the once luxury-priced computer market, and Moore's Law helped dictate this movement. Moore's Law is the prediction that the number of transistors (hence the processing power) that can be squeezed onto a silicon chip of a given size will double every 18 months. Stated by Gordon Moore (a cofounder of computer chip maker Intel and its former chairman) in 1965, it has proven to be amazingly accurate over the years ("Moore's Law"). With the ever-advancing world of computer processors come similar breakthroughs in the manufacturing technology that produces them, lowering the labor input and cost. Yet the question remains: so what? Lightning-quick computers and processors are what enable the aggregation and analysis of enormous data sets. Google Flu Trends tunes its predictions on hundreds of millions of mathematical modeling exercises using billions of data points (Schönberger and Cukier 28). The sheer quantity of this data would be fundamentally impossible for humans to handle without computer processing power.
The computers that are able to achieve such feats are a major contributor to the paradigmatic shift of our culture towards the inexactitude and correlation that come with mass data aggregation.

Basing important decisions on probability and correlation instead of precision and causation is becoming more common with each passing day, and real-world examples are everywhere. Wal-Mart uses its transaction history to identify odd sales trends. With the help of statistical analysis firm Teradata, the retailer identified an enormous spike (approximately 700%) in sales of strawberry Pop-Tarts and beer in the days leading up to a hurricane. In turn, Wal-Mart began heavily stocking the front of its stores in the path of a hurricane with these two items, and reaped the profit (Hays). By using individual transaction histories provided by discount cards, Target is able to predict with relative accuracy whether a woman is pregnant and can even estimate her due date. After analyzing trends in purchases of products such as unscented lotion and soap, supplements, and extra-large quantities of cotton balls among women known to be pregnant (through the baby registry), Target mailed coupons to women its algorithms flagged as possibly pregnant. As you can imagine, some people were quite perturbed. A 2012 New York Times Magazine article by staff writer Charles Duhigg details an astonishing story based around Target's new discovery:

About a year after the creation of the pregnancy-prediction model, a man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry, according to an employee who participated in the conversation. "My daughter got this in the mail!" he said. "She's still in high school, and you're sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?" The manager apologized and then called a few days later to apologize again.

On the phone, though, the father was somewhat abashed. "I had a talk with my daughter," he said. "It turns out there's been some activities in my house I haven't been completely aware of. She's due in August. I owe you an apology."

The potential of big data could not be any clearer than this. Vast quantities of numbers that are virtually useless to human eyes become a gold mine of profit when the right algorithms are applied. Everything from Amazon's book recommendations to Facebook's "People You May Know" is generated using this very same principle. Computer-driven statistical analysis of mass data is becoming more and more prevalent as society moves forward.

Our marked progression of technology, beginning with the internet and computers, has spurred a change in how we study the events surrounding us. Instead of relying on absolute precision and undeniable proof that one thing leads to another, we are starting to see the big picture; the probability and correlation of what happens are becoming more significant than why it happens. With the help of the internet and computer processing, big data has instigated this change in cultural attitude, and it will undoubtedly continue to influence our future world.

Works Cited

Duhigg, Charles. "How Companies Learn Your Secrets." The New York Times Magazine. The New York Times Company, 16 Feb. 2012. Web. 28 Oct. 2013.

Hays, Constance L. "What Wal-Mart Knows About Customers' Habits." The New York Times Company, 14 Nov. 2004. Web. 28 Oct. 2013.

"Moore's Law." Web Finance, Inc., 2013. Web. 28 Oct. 2013.

"Packet." Computer Hope. Computer Hope, 2013. Web. 28 Oct. 2013.

Peter, Ian. "So, Who Really Did Invent the Internet?" The Internet History Project, 2004. Web. 28 Oct. 2013.

Schönberger, Viktor, and Kenneth Cukier. Big Data. Boston: Houghton Mifflin Harcourt Publishing Company, 2013. Print.

"Timeline of Computer History." Computer History Museum, 2006. Web. 28 Oct. 2013.

Zakon, Robert Hobbes. "Hobbes' Internet Timeline 10.2." Zakon Group LLC, Open Conf, 30 Dec. 2011. Web. 28 Oct. 2013.