
Living on Pleasant Hill Time

Neither aught nor naught: Secret of Big Data

July 17, 2013

Preface
Fifteen years ago today, Bonnie and I entered Paris by train amid all the great excitement, congestion and gendarmes carrying Uzis. In honor of Bastille Day: the La Marseillaise scene from perhaps the most famous film of all time, Casablanca. Today, I am dealing with Big Data, forty years and counting.

July 14, 2013, Bastille Day (Liberté, Fraternité, Égalité)


In the mid-1970s, I processed several 9-track EBCDIC tapes from the US Agriculture Department with the HANES (Health and Nutrition Examination Survey) data. HANES I was the dietary-recall, longitudinal-history, stratified-sample study of people's food habits. I used PL/1 to process the tapes because I could specify guarded commands (the ON construct) and have them execute semi-automatically. In this way I could accommodate missing data easily and distinguish the Naught, or Null, value. This was handled differently from Zero, which was the default value of a Missing Value if one did not trap and process it differently. In the mid-80s Bonnie became a Nutrient Data Base Researcher for Campbell Soup and became a SAS expert at using the HANES II data. I left Data Analysis of food consumption to her until, during my 1996-99 financial systems employ interregnum, from Oct 1998 to March 1999, I converted a Campbell Soup DOS/BASIC-based PC app called MENUSCAN into an interactive Web app.
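The difference between a Missing Value and a true Zero can be sketched in a few lines. This is a minimal illustration in Python, not a reconstruction of the original PL/1; the sample fields and the names `parse_field` and `mean_of_present` are hypothetical, and a blank field stands in for whatever missing-value marker the tapes actually carried.

```python
from statistics import mean
from typing import List, Optional


def parse_field(raw: str) -> Optional[float]:
    """Return the numeric value of a field, or None when the field is blank."""
    text = raw.strip()
    if not text:
        return None  # Naught/Null: nothing was recorded
    return float(text)


def mean_of_present(values: List[Optional[float]]) -> Optional[float]:
    """Average only the values that were actually recorded."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# Hypothetical intake fields: two are blank (missing), one is a genuine zero.
raw_fields = ["120", "   ", "0", "80", ""]

parsed = [parse_field(f) for f in raw_fields]
zero_filled = [0.0 if v is None else v for v in parsed]

trapped_mean = mean_of_present(parsed)  # missing values trapped and skipped
default_mean = mean(zero_filled)        # missing values silently counted as zero

print(trapped_mean, default_mean)
```

Letting the blanks default to zero drags the average down, which is the kind of distortion that trapping the missing case is meant to avoid.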

The Secret of Big Data Analysis


And then, as now, the secret of analyzing this data is handling that which is "Neither aught nor naught." One might say "Neither aught nand naught," but nand is not an English word; it is a Boolean construct. Not (aught or naught) is the specific meaning (strictly, the Boolean NOR), so it is some non-zero number. This is, as my friend Clarke maintains, "separating the pepper from the fly specks." Only he is a bit more colorful in his language. The point is that handling the data requires understanding the full range of possible data values as well as no data at all.

Zero is not always the absolute value 0 but can be the middle point of any distribution of data. Median is the best example. Take the median 2011 income in the US, $26,364: just as many above as below that income. [And now I go to last week's news of Farm Bills and Food Stamps. Tsk!] Median is much more descriptive than Average, since the Uber Rich skew the Average up to $51,560. Imagine the 0.1% of the population (300,000 people) receiving the difference between Median and Average, viz., $25,000 × 300,000,000 = $7.5 × 10^12, or $7.50 Trillion. This equates to about $69 Million each on average for the Uber Rich (since they received about 90% of the increase in income). This is not wealth, but annual income. The Wealth is, well, Uber. Let them eat cake! How much is enough?
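The Median-versus-Average point can be checked with a short Python sketch. The ten-person sample is invented purely to show the skew; the $26,364, $51,560, 300,000,000 and 300,000 figures are the ones quoted in the text, and the variable names are illustrative.

```python
from statistics import mean, median

# Invented sample: nine modest incomes and one very large one.
sample = [18_000, 21_000, 23_000, 25_000, 26_000,
          27_000, 29_000, 32_000, 36_000, 2_000_000]

sample_median = median(sample)  # middle of the distribution, unmoved by the outlier
sample_mean = mean(sample)      # pulled far upward by the single large income

# Figures quoted in the text.
median_income = 26_364
average_income = 51_560
population = 300_000_000
top_group = 300_000  # 0.1% of the population

gap = average_income - median_income  # per-person difference, about $25,000
total = gap * population              # aggregate income represented by the gap
even_share = total / top_group        # an even split of that total across the top group

print(f"sample median {sample_median:,.0f}, sample mean {sample_mean:,.0f}")
print(f"gap ${gap:,} x {population:,} people = ${total:,}")
print(f"even share across {top_group:,} people: ${even_share:,.0f}")
```

With the unrounded gap of $25,196 the total is about $7.56 Trillion, in line with the rounded $7.50 Trillion. An even split of that total across 300,000 people comes to roughly $25 Million each; the larger per-person figure quoted above presumably draws on inputs beyond these four numbers.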

Copyright 2013, David M. Sherr

Annals of a Running Dog

