
Table of Contents


A Game Plan for Success in Data Analytics
What this book is NOT about
The Data Ninja Defined
Ninja Tip 1.
The Ideal Data Ninja Candidate
List of companies and industries that hire Data Ninjas
What's in it for you?
Dazzling facts about the current growth of data
Some More Data Analytics Facts and Data Trends
From Data to Wisdom and everything in between
The Different States of Data and Information
Ninja Tip 2.
Ninja Tip 3.
It's Your Business to Know Your Business
Ninja Tip 4.
Required Skillsets for the Data Ninja
Ninja Tip 5.
Ninja Tip 6.
Tools are Important, but Not the End-All-Be-All
Training for Data Ninja Candidates
Training of Some Sort is Crucial for Success
Ninja Tip 7.

The Future of Power BI
Ninja Tip 8.
Ninja Tip 9.
SQL Portability
Data Manipulation Language (DML)
Data Definition Language (DDL)
Data Control Language (DCL) and Others
Transaction Control Language (TCL)
Ninja Tip 10.
Data Warehousing Defined
Dimensional Model Pros and Cons
Why is Dimensional Modeling Beneficial to the Data Ninja?
Ninja Tip 11.
What Experts Say About the Mastery of Programming Skills
Work your way up the programming ladder
Practice Makes Perfect
Programming Success


Why is Predictive Analytics Important?
Ninja Tip 12.
Stuff to Blow Your Mind
Acronyms
Glossary of Definitions
Relevant Quotes Glossary
About the Author



A Game Plan for Success in Data Analytics

© 2015 by Fru N. All rights reserved.
No part of this book may be reproduced in any written, electronic, recording, or
photocopying form without written permission of the author, Fru Nde.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author nor the publishing company, its
dealers, or its distributors will be held liable for any damages caused or alleged to be
caused directly or indirectly by this book.
First Edition
Printed in the United States of America

In Africa, there is a proverb that says, "It takes a village to raise a child." I have
certainly been raised in good hands by a village: by my family, by my friends, and by anyone
whose path has crossed mine.
I wish to dedicate this book to everyone that is dear to me.
Even more so, I wish to dedicate this book to my mom, dad, and siblings who have
nurtured and stood steadfastly beside me every step of the way.
Thank you for your love, support and continued guidance.
Fru N.



The world today is awash with data. Companies have invested, and continue to invest, great
amounts of resources in collecting and storing vast amounts of data. With this data sitting on
companies' data farms, there is now a great need to employ Data Professionals (a.k.a.
Data Ninjas) who can come in and make sense of this data.
I recently had the privilege of being invited, along with some of my peers, to speak with a
group of students pursuing a Master's degree in Business Analytics and Big Data at the
University of Minnesota's Carlson School of Management.
The discussion was about analytics and the options for students in the Master's program
looking to make a career out of working with data. The discussions we had in the panel
session were very lively and engaging, and the enthusiasm the students expressed
reinforced my appreciation of how vibrant the field of Data Analytics is and how
important it is for those who want to get into the field to have a game plan for success.
Further conversations with peers about what it takes to be a data professional,
what skills are required to start, and what steps beginners can take to get into the field
eventually led to the writing of this book on those topics.
This book, as a whole, is NOT intended to be a technical book. Even though some
technical concepts will be discussed, you will not, for example, learn how to use Excel
by reading this book. Instead, resources will be provided at the end of each chapter to
help you follow up and do a more in-depth study of the proposed topics.
Within this book, I will make the case for why you should learn Excel. The same
approach applies to other concepts such as SQL, data warehousing, ETL and
programming. The resources listed at the end of each chapter are intended to help
you follow up with more in-depth study of the concepts on your own.


This book was created as an overview of some of the most common and basic tools
used in Data Analytics today. As such, it is geared toward new users who are looking
for high-level summaries of the concepts and, especially, users who are looking
for resources and pointers to get started. For more advanced readers, the resource links
included at the end of each chapter can still be valuable as a point
of reference or as an avenue to other resources that can help them polish up on
specific areas.
We hope the content of this book, along with the references and resource links
provided, will help and encourage newcomers to get a better understanding of some of
the skills required for Data Analytics. We also hope that the material will guide and
encourage you to take that next step in your career, whether by securing a new job,
making a career change, building new skills, or pursuing that promotion
you have been yearning for in the field of Data Analytics.
With a solid background in helping companies transform their data assets into
information, I have been fortunate over my career to work with several small, midsized
and Fortune 500 companies.
In such engagements, I have used tools ranging from simple MS Excel spreadsheets on
the one end to more advanced tools like SQL and programming on the other in
order to build complex data integration, data warehousing and data analytics solutions
for businesses. In all these experiences, I have witnessed both the good and bad
sides of working with data. But the common and most promising lesson I have learned
is that the processes and skills needed to get in the game are not that complicated.
With a bit of effort and dedication, anybody should be able to get in and excel in the
field of data analytics. My hope in this guide is to draw on the experience I have gained from
working in the trenches analyzing data to inform readers and provide a simple five-point
game plan for success.


This book offers a survey of the Data Analytics landscape (for beginners) and
presents a slice of the tools and the skillsets that can help individuals transform data
into information. This includes discussing concepts and tools like MS Excel, SQL,
Data Warehousing, Programming and Change.
Along the way, you will be provided with empowering resources (at the end of each
chapter) that can help you gain the skills necessary to take on the role of analyzing data.

What this book is NOT about

Being a good data professional is about using technology to solve business problems.
As you start and mature in the space of data analytics, you may find yourself talking to
individuals or working for companies who try to push you into learning technology X
because they think it is superior to Y, or vice versa. Or they may say technology A is
dead in favor of technology B.
Some might even say, "Why Excel?" or "Why SQL? My tool is better. Choose mine!"
Of course, the point of this book is not to sell you on any individual technology or
vendor, but to expose you to the fundamental industry trends and concepts as far as data
analytics is concerned. The specific technology tool you end up using really isn't the
A good analogy I have seen to describe this is the good old fishing parable with a
slight twist.
In this book, I will teach you about fishing, why you should learn how to fish for
yourself, and provide resources for places where you can go learn specific fishing
techniques. But I will not teach you how to use a particular fishing rod. That is up to you
to learn, using books, links, videos and all the other resources that have been provided
here or that are available online, at the bookstore, or in libraries.
Data Ninja is the term that will informally be used throughout this book to refer to any
individual (or group of individuals) who aspires to work intimately with data and make a
career in the analytics field as a Data Analyst, Report Writer, Business
Intelligence professional, Data Architect, Data Engineer, Data Scientist, etc.
The Data Ninja Defined
Because the possibilities for defining and classifying Data Professionals can be very
broad, and the required skillsets for each job role can vary widely, in this book the
term Data Ninja will refer to:
An entry-level, unspecialized individual who works within a structured environment
(usually within a company or team), employs a variety of tools, and performs a variety
of tasks related to collecting, organizing, and interpreting data to gain useful
Ninja Tip 1.
As a data ninja, you will be expected to play the role of the multi-skilled, multi-talented
individual who helps companies quickly and efficiently transform their data
assets into information.

The Ideal Data Ninja Candidate

The definition of the Data Ninja provided above is extremely important because it
guides the scope and breadth of the content covered in the ensuing chapters of
this book.
Data professionals span many different levels of expertise,
ranging from non-technical people in the business world to PhD-wielding Math and
CompSci geniuses, so a clear definition of the Data Ninja is needed to
keep the scope and content of this book narrow.
It is understandable that some readers, though Data Professionals themselves, may be
too advanced for the concepts presented herein. But we aim to stick to the basics and
target those readers who fit some or all of the following descriptions:
Are non-technical, but not scared of a few technical details here and there.
Have little or no experience with data.
Are excited about the prospects of working with data.
Are just getting into the data analytics field.
Are looking to land their first job or make a career change.
Are looking for opportunities to progress in their current careers.
Are curious and just looking to expand their fundamental knowledge about the
If one, some, or all of the above criteria fit you, then this book is for you, and
you are the ideal Data Ninja this book is targeted towards.
The purpose of the data analysis undertaken by Data Ninjas is usually to turn some form
of raw data into meaningful information. In this regard, Data Ninjas are like
the data interpreters of the company, and it is critical that they get the analysis right.
A company that employs Data Professionals who do not perform their jobs well
is like someone with the wrong interpreter on their side, a point that is
devastatingly exemplified by the interpreter who performed interpretive services for
dignitaries at Nelson Mandela's memorial service in South Africa, back in 2013.
Some have accused the interpreter of being a fraud and have described him as
"waving his hands around but there was no meaning" or even described the situation as
"childish hand gestures and clapping, it was as if he had never learnt a word of sign
language in his life".
I do not speak sign language, but if I did, I would be seriously offended by someone
who stood up and pretended to perform, or was blatantly incapable of performing, the necessary
interpretive services. The biggest losers in a situation like that are the honest listeners in
the crowd who depend on such individuals to get information about what the speaker is saying.

A sign language interpreter during a memorial in honor of Nelson Mandela.

Similar to language interpreters, data professionals are the ones at the forefront,
tasked with interpreting their company's data and turning it into consumable
information. It is critical that they do the job right and provide decision makers with the
information they need.
Having a good data interpreter working for a company helps ensure that reports of
daily sales numbers are correct, forecasts of website traffic volumes are reasonable,
the analysis of a marketing campaign effort is effective, and so on.
On the other hand, having a bad data interpreter who cannot
competently perform the task of turning data into information and answering the
company's analytical questions can lead to situations that are very damaging to the
company's bottom line. Sales numbers would be reported wrongly, customer sentiments
wouldn't be analyzed, financial forecasts would be off, and so on.
Having the wrong data interpreter on your side not only confuses the company's
decision makers, but might actively lead them astray, as was the case with the
interpreter at Nelson Mandela's memorial service, who left the crowd and millions of viewers
around the world baffled and bewildered by the subpar job he did.
"You can have data without information, but you cannot have information without
data." - Daniel Keys Moran

It's fair to assume that most readers want to master some data analytics skills in order
to get hired. So, before we get into the nuts and bolts of the Data Ninja game plan, we
will first take a look at the hiring outlook for Data Professionals.
Most reputable research firms in the industry are predicting a significant upsurge in
hiring of data ninja-type professionals as the amount of computing, web and digital data
grows. These in-demand data professionals are people trained in collecting,
storing, interpreting and making inferences based on data.
Because of the extremely high demand for these professionals, some have even gone
as far as calling the work of data professionals (a.k.a. Data Ninjas) "the sexiest job
of the 21st century."
As far back as 2009, IBM announced in a press release its plans to open a global
network of Analytics Solution Centers. The expectation was to use these worldwide
centers to retrain or hire up to 4,000 additional data professionals.
"ARMONK, N.Y. - 28 Apr 2009: IBM (NYSE: IBM) today announced a significant
expansion of its capabilities around business analytics with plans to open a network
of Analytics Solution Centers around the world, beginning with five in the second
quarter of 2009. These initial centers will be located in Tokyo, London, New York
City, Beijing and Washington, D.C. As part of this initiative, IBM will retrain or hire
as many as 4,000 new analytics consultants and professionals."
Furthermore, an article by Allison Stadd goes further to illustrate the need and
demand for data professionals.
"It's not just the skills needed, it's also the raw manpower. In a survey by Robert Half
Technology of 1,400 U.S.-based CIOs, 53% of the respondents whose companies are
actively gathering data said they lacked sufficient staff to access that data and extract
insights from it. Translation: you are sorely needed."
From the hiring outlook above, we see that the field for professionals who work with
data is booming, and the hiring numbers are very promising. In addition
to the hiring demand and labor shortage described above, job
postings for data professionals remain steadily high.
In a recent search I did on a job site, there were 102,305 job
positions listed with Data Analyst tags.

Job listings from

Also, a similar search on a career site came up with very impressive
numbers for data analytics related job openings. More than 40,000 jobs were listed in a
search of their database.

Data analyst job search from

These numbers all seem to indicate a very strong outlook in the jobs market for data
professionals. The numbers also underscore the point that data is the way of the future
and that there is great need for skilled professionals who aspire to make a career out of
working with data.
Aspiring data professionals usually seek to understand what industries or verticals
Data Ninjas work in. The simple answer to that is "all industries and verticals."

Data is everywhere, and today we see Data Ninjas working part-time, full-time and
on a contract basis in a wide variety of industries. These range from non-profit
organizations, government and education to healthcare, retail, high-tech, finance, e-commerce, and consumer products. See the list below.
List of companies and industries that hire Data Ninjas
Construction companies
Utility companies
Oil, gas and mining companies
Hospitals and healthcare organizations
Colleges and universities
Federal, provincial/state and municipal government departments
Transportation companies
Telecommunications companies
Insurance, finance and banking organizations
Management consulting companies
Manufacturing companies
Data Ninjas can be tasked with performing varying roles within different industries
or organizations, some of which include:
Web Data Analysis
Data Ninjas in this domain focus on analyzing website data and logs. This may even be
extended to include site polls, survey results, web traffic, web usage, and so on. The goal
of such analysis is to help companies better understand and develop strategies for
optimizing web usage and more.
Financial Data Analysis
Data Ninjas in this area focus on studying the company's financial statements and
analyzing the company's current and projected value or earnings. They have visibility into
many of the company's numbers, including sales, profit and loss, and operations, and may even
meet with company officials to gain better insight into the organization so that they
can predict future trends and identify potential opportunities.
CRM Data Analysis
Data Ninjas in this domain focus on analyzing the company's customers. In many
companies, customer data is stored in some sort of Customer Relationship
Management (CRM) system. The analyst's job is to mine that system to better
understand patterns and relationships within the available data. The result of such analysis
can help the company's Sales and Marketing departments spot under-served markets,
identify their best customers, fine-tune advertising campaigns or predict new markets.
Marketing / Sales Data Analysis
Data Ninjas in this domain typically focus on analyzing data sets used by sales and
marketing teams. Such analysis usually involves identifying, modeling, understanding and
predicting sales trends and outcomes while helping sales management understand
where salespeople can improve. The analyzed data can provide insights into how
the company's marketing efforts are performing, how effective campaigns are, and which
sales channels are most valuable.
Health Data Analysis
Data Ninjas in this domain typically focus on analyzing healthcare data. They might
work on multiple projects related to health services delivery, healthcare costs, quality
of care, insurance coverage, and access to care. The Health Data Analyst works with
healthcare information from a variety of sources, including medical and pharmacy
claims data, hospital inpatient/outpatient data, quality measurement data, and other
resources. They use software and statistics to provide data support, solve issues,
perform research and improve information quality and accuracy.
In addition to the five listed above, there are scores of other areas in which Data
Ninjas can specialize. But despite the different specializations, the common thread
that runs through all of them is the impact their analysis work can have
on the business.
Such impacts can range from finding out how to reduce costs, increase sales, or reduce
customer churn to determining what product to sell or how many people should be
scheduled to work the Saturday shift.
Most would agree that the vast majority of people would not do their jobs if they
weren't paid for it. Because of this, salary becomes a very important point
of concern on the minds of most professionals, including aspiring or practicing Data
Ninjas.
As with many other jobs out there, pay often depends heavily on skill level and
years of experience, along with a host of other factors. But the good news is that, in
most cases, individuals can rise on the pay scale to the far limits of where their talent
takes them.
Show me the money!
If you do a quick search on the internet, you will quickly find a lot of numbers
regarding the pay of Data Professionals. Some of the figures are lower than others, but
according to the talent firm DataJobs, national salary ranges for a few data-related job
titles are as follows:

Data analyst, Entry Level
Data analyst, Experienced
Data Scientist
Database administrator, Entry Level
Database Administrator, Experienced
Data Engineer, Junior
Data Engineer, Domain Expert
The salary figures presented above look very promising, especially given that, according
to the U.S. Census Bureau, the U.S. real (inflation-adjusted) median household income
was $51,939 in 2013. That number was a rise from $51,759 in 2012
(U.S. Census Bureau). So, the pay ranges for data-related roles are on average
significantly higher than the median income numbers reported by the U.S. Census
Bureau.
To further substantiate these pay and income numbers, the technology
consulting firm Robert Half Technology (RHT) published its 2015
salary ranges for data professionals.

Salary range for data professional

Again, these numbers look very lucrative, especially given that a quick analysis of the
salaries reveals an average net increase in pay of more than 5% for 2015 compared to
2014, and the average salary figures are substantially higher than the $51,939 median
income we saw reported by the U.S. Census Bureau earlier for 2013.
What's in it for you?
Someone looking at these pay numbers might rightly ask, "What does this all mean for
my pocketbook?" Or better still, "What's in it for me?" These are good questions to
ask, and the answer to them is remarkably simple.
With data growing exponentially, demand for data professionals
running very high, the supply of qualified candidates running short, and pay
increasing year after year, it becomes very clear that aspiring or practicing
Data Ninjas all across the country have a favorable career outlook in terms of wages
and job prospects for years to come.
And whether your goal is to keep the paychecks coming at a steady rate or you are
just looking to advance your career, the field of data analytics is proving to be the place
where you can confidently make that happen.
It would be hard to talk about anything related to Data Analytics without
acknowledging the exponential growth in data we are experiencing. The world today is
awash with data, and this data touches nearly all aspects of our lives.
The far-reaching tentacles of data today range from how we live our social lives
online to how we communicate with loved ones or business partners, how we bank our
money and how we get health care.
Companies of all forms, shapes and sizes are experiencing the effects of the
astounding rates at which data volumes are growing. Projections of about 40% growth a
year into the next decade are not uncommon to see.
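
To get a feel for what roughly 40% annual growth implies, the compounding arithmetic can be sketched in a few lines of Python. The starting volume of 1.0 and the ten-year horizon below are illustrative assumptions, not figures taken from any report.

```python
import math

def compound_growth(start, annual_rate, years):
    """Return the volume after `years` of growth at `annual_rate` per year."""
    return start * (1 + annual_rate) ** years

def doubling_time(annual_rate):
    """Return the number of years needed for the volume to double."""
    return math.log(2) / math.log(1 + annual_rate)

# Assumed inputs for illustration only: a normalized starting volume of 1.0
# and the roughly 40% yearly growth mentioned in the text.
growth_factor = compound_growth(1.0, 0.40, 10)
years_to_double = doubling_time(0.40)

print(f"Growth over 10 years: about {growth_factor:.1f}x")
print(f"Doubling time: about {years_to_double:.2f} years")
```

Under these assumptions, the volume grows nearly 29-fold in a decade and doubles roughly every two years.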

Data growth outlook. (Oracle)

The chart presented above is what most people usually reference when they talk
about the rapid rate of growth of data today. The projected data volumes by 2020 are
truly mind-blowing, especially when compared to where we are today.
To put this into perspective, if the data volumes of today were the size of two-door
sedans, the data volumes of 2020 and beyond would reach the size of
aircraft carriers or more. The difference is truly astonishing.
This point of comparison can be further substantiated if we look at some dazzling
facts about the data growth rates currently being experienced.
Dazzling facts about the current growth of data
It is expected that by 2020 the amount of digital information in existence will have
grown from 3.2 zettabytes today to 40 zettabytes.
The total amount of data being captured and stored by industry doubles every 1.2 years.
Every minute we send 204 million emails, generate 1.8 million Facebook likes,
send 278 thousand Tweets, and upload 200,000 photos to Facebook.
Google alone processes on average over 40 thousand search queries per second,
making it over 3.5 billion in a single day.
Around 100 hours of video are uploaded to YouTube every minute and it would
take you around 15 years to watch every video uploaded by users in one day.

570 new websites spring into existence every minute of every day.
Today's data centers occupy an area of land equal in size to almost 6,000 football fields.
The NSA is thought to analyze 1.6% of all global internet traffic, around 30
petabytes (30 million gigabytes), every day.
The number of bits of information stored in the digital universe is thought to have
exceeded the number of stars in the physical universe in 2007.
Retailers could increase their profit margins by more than 60% through the full
exploitation of big data analytics. (Marr)
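
A couple of the per-second and per-minute figures above can be sanity-checked with simple unit conversions. The short Python sketch below uses only the rates quoted in the list; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

# Google: about 40,000 search queries per second (figure quoted above).
queries_per_day = 40_000 * SECONDS_PER_DAY

# YouTube: about 100 hours of video uploaded per minute (figure quoted above).
hours_uploaded_per_day = 100 * MINUTES_PER_DAY
# Years of nonstop viewing needed to watch one day of uploads.
years_to_watch = hours_uploaded_per_day / 24 / 365

print(f"Queries per day: {queries_per_day:,}")
print(f"Years to watch one day of uploads: {years_to_watch:.1f}")
```

The results, about 3.46 billion queries a day and roughly 16 years of viewing, are in line with the rounded figures quoted above.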
Some More Data Analytics Facts and Data Trends
Business analytics is a $12.2 billion industry, according to Gartner Inc.
A McKinsey & Company report forecasts a shortage by 2018 of professionals
specializing in the field, people trained to distill data into meaningful information.
Every dollar that a company invests in business analytics earns $10.66, according
to Nucleus Research.
According to Forrester Research, 97 percent of companies with revenue of more
than $100 million are pursuing expertise in business analytics. (Kristal)
Numbers don't lie. These statistics on data growth rates are truly astounding, and with
more than 2 quintillion bytes of new data being generated every single day, we now see
companies, corporate managers and executives scrambling to hire individuals who
understand data and can work with it to derive competitive value. For the data ninja, all
this potentially translates into a high-demand position in the job
market, long-term job security and strong pay.
According to a 2011 report published by McKinsey & Company's Business Technology
Office, entitled "Big data: The next frontier for innovation, competition, and
productivity," data has swept into every industry and business function and is now an
important factor of production, alongside labor and capital.
This observation indicates significant need for Data Ninja-type talent in the coming
years and also emphasizes the great opportunities that this new era of abundant data
holds for companies, particularly in terms of being able to use data to gain efficiency
and tap into new business opportunities.
The United States alone faces a shortage of 140,000 to 190,000 people with deep
analytical skills as well as 1.5 million managers and analysts to analyze big data and
make decisions based on their findings.

Recently, research firm Gartner, in one of its publications, cited many industries and
companies as having a great need for more people skilled in managing and
analyzing data.
"By 2015, big data demand will reach 4.4 million jobs globally, but only one-third of
those jobs will be filled. The demand for big data is growing, and enterprises will
need to reassess their competencies and skills to respond to this opportunity. Jobs
that are filled will result in real financial and competitive benefits for organizations.
An important aspect of the challenge in filling these jobs lies in the fact that
enterprises need people with new skills: data management, analytics and business
expertise and nontraditional skills necessary for extracting the value of big data, as
well as artists and designers for data visualization."
Even though the statistics and articles presented by some research firms
explicitly refer to Big Data (a topic that is a bit more advanced than the
intended scope of this book), we nonetheless see the move to Big Data Analytics as
a natural career progression for any person working in the data field.
As we can see from Gartner's predictions above, the tremendous growth in new job
opportunities, coupled with the potential shortage of suitable Data Ninja-type
individuals, translates into healthier demand and salaries for those individuals who are
skilled and capable of crunching data.
Some of the most important responsibilities of Data Ninjas involve collecting,
sorting, and analyzing different sets of data to gain insights. These datasets being
analyzed can range from simple business metrics such as sales numbers to more exotic
datasets like user behavior and product performance.
From Data to Wisdom and everything in between
The ultimate goal of any analysis effort carried out by Data Ninjas is to transform
data into information. In its raw form, data is just that: data. And raw data is not
very useful unless it is synthesized and transformed into information that people and
organizations can actually consume and act on.
To get a good sense of what data analysis is about, we will borrow an existing
life cycle that articulates how data gets transformed into wisdom. This is the DIKW
pyramid (shown in the figure below).

DIKW pyramid. (Longlivetheux)

The DIKW pyramid is especially relevant because it visualizes the knowledge
hierarchy, showing how information is defined in terms of data, knowledge in terms of
information, and wisdom in terms of knowledge.
It is also important to observe that the lineage in the DIKW pyramid starts with Data
at the foundational level. Data Ninjas are vital in this lifecycle because they
perform the Data Analytics tasks that move us up the DIKW
hierarchy. As a result, some might argue that without the right data analytics work being
performed, wisdom might not be possible.
The Different States of Data and Information
Data and Information are one and the same thing. They just exist in different states. A
good way I have found to explain this is to consider water. The fundamental
compound of water (H2O) doesn't change, but it can exist in different states such as
liquid (water), gas (water vapor) and solid (ice).
Regardless of its state, water will still be water; but it would not be wise for a thirsty
person to quench their thirst by trying to drink water vapor or ice. They need to drink
water, and more importantly, they need to drink it in its liquid state.
This analogy closely parallels the way companies go about quenching their
decision-making thirst. In order to make insightful decisions, leaders in organizations
cannot simply consume raw data; that would be akin to a thirsty
person trying to drink ice or water vapor.
Instead, decision makers need to consume information in order to make insightful
decisions. And just as some energy has to be put into the process of transforming ice into liquid
water, some energy also has to be put into the process of transforming data into information
that is safe for consumption and decision making.
This energy is what you, the data ninja of your company, will be tasked to bring to
the table.


Data analysis is the primary function performed by Data Ninjas.
Data Analysis: Definition I
Data Analysis is the process of systematically applying statistical and/or logical
techniques to describe and illustrate, condense and recap, and evaluate data.
An alternate definition of the data analysis process is presented in a course description
by Johns Hopkins University on Coursera.
Data Analysis: Definition II
Data analysis is the process of finding the right data to answer your question,
understanding the processes underlying the data, discovering the important patterns in
the data, and then communicating your results to have the biggest possible impact.
Typically, Data Ninjas perform five simple steps as part of their Analytics work:
1. Formulate the question
2. Collect the data
3. Analyze data
4. Communicate results
5. Reiterate
All of these steps are critical to the process and none can be ignored, skipped or
overlooked.
Depending on the level of expertise, experience or specialization, the data ninja as
part of their job role can spend more time collecting and sorting the data than analyzing
or vice versa.
And the industry in which a Data Ninja works may dictate or influence the specific
type of data that they would collect and analyze.
Ninja Tip 2.
The data analysis work performed by a Data Ninja is not for the faint of heart. It
requires a creative problem solver who finds it rewarding to identify, investigate,
isolate, and resolve data issues.
The job responsibilities of a Data Ninja can vary widely by company, industry, or
skill level, but generally they include the following broad tasks:
Writing queries to retrieve data from a database and other data sources.
Scrubbing data to remove duplicates and other errors within the data.
Analyzing data to find insights or trends that can be used to improve their company's
key performance indicators (KPIs).
Preparing reports based on the analysis and presenting them to management.
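To make the "scrub data to remove duplicates" task concrete, here is a minimal sketch in Python. The customer records, field names, and the remove_duplicates helper are all hypothetical, invented purely for illustration; real data cleaning usually needs rules specific to the business.

```python
def remove_duplicates(rows, key_fields):
    """Keep the first occurrence of each record, comparing only key_fields.

    Values are trimmed and lower-cased before comparison so that
    'ANN@example.com' and 'ann@example.com ' count as the same value.
    """
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(str(row[field]).strip().lower() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Hypothetical customer list containing one near-duplicate record.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "ann lee ", "email": "ANN@example.com"},
    {"name": "Bob Ray", "email": "bob@example.com"},
]

clean = remove_duplicates(customers, ["name", "email"])
for record in clean:
    print(record["name"], record["email"])
```

The choice of which fields identify a duplicate (here name and email) is a business decision, which is one reason the next tip stresses understanding the business first.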
Ninja Tip 3.
There are many paths an aspiring Data Ninja can take, but understanding the business
you work with before doing any data analysis is extremely important because businesses
often have different needs and approaches to working with data.
It's Your Business to Know Your Business
A Data Ninja who works with social data at a company like Facebook has totally
different datasets and might employ totally different techniques to analyze data than, say,
an analyst working at a financial firm like Goldman Sachs, or an analyst at a health
insurance company like UnitedHealth Group.
So, it is important to understand the business you work with before doing any data
analysis, especially given that businesses often have different needs and approaches to
working with data.
In general, the process of analyzing data can be divided into exploratory data analysis
(EDA), where new features in the data are discovered, and confirmatory data analysis
(CDA), where existing hypotheses are tested against the data and either supported or rejected.
Exploratory Data Analysis (EDA) - Example
A Data Ninja at a national retail chain, as part of their exploration, can analyze and
plot on a graph their product sales by region or zip code. Without any prior knowledge
of what to expect, the exploratory exercise can offer insights, such as the revelation that a
particular product sells more in the East Coast stores than it does in the West Coast
stores, or that the sales of a particular product spike during severe snowstorms compared
to when the weather is average.
These are findings that can be profound and have the potential to influence the way the
company works, advertises or uses data. When done right, EDA can expose trends,
patterns, and relationships that are not readily apparent. The results from EDA
can help the company to improve their marketing efforts, change the way they target
their customers with ads, or open new lines of business that can in turn increase sales
revenue, reduce cost, and affect the bottom line positively.
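As a rough illustration of this kind of exploration, the sketch below totals unit sales by product and region in Python. The sales figures, product names, and regions are made up for the example and are not drawn from any real retailer.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, units sold).
sales = [
    ("East", "snow shovel", 120),
    ("West", "snow shovel", 15),
    ("East", "sunscreen", 40),
    ("West", "sunscreen", 95),
    ("East", "snow shovel", 80),
]

# Add up units for every (product, region) combination.
totals = defaultdict(int)
for region, product, units in sales:
    totals[(product, region)] += units

# Print a small summary table so regional differences stand out.
for (product, region), units in sorted(totals.items()):
    print(f"{product:12} {region:5} {units:4}")
```

Nothing here assumes in advance which region sells more; the summary simply makes the differences visible, which is the spirit of EDA.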

"Numbers have an important story to tell. They rely on you to give them a clear and
convincing voice." - Stephen Few
Confirmatory Data Analysis (CDA) - Example
A Data Ninja at an e-commerce company can hypothesize that customers who buy
Product X have a 60% likelihood of buying Product Y if an ad impression about
Product Y is shown to them at the time of their checkout.
This is a solid hypothesis that can be verified using data. In this case, the analyst can
set out to examine website traffic or navigation patterns to determine whether their
hypothesis is true, i.e. whether, based on the data, customers are more or less likely to
buy Product Y after prior exposure to impressions about the product.
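One common way to check a hypothesis like this is to compare the purchase rate of customers who saw the ad against those who did not. The sketch below uses a two-proportion z-test, which is our choice of technique for illustration and not something prescribed above; the counts are hypothetical.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Compare two conversion rates with a two-proportion z-test.

    Returns both rates, the z statistic, and a two-sided p-value
    based on the standard normal distribution.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Hypothetical counts: Product X buyers who saw the ad vs. those who did not,
# and how many in each group went on to buy Product Y.
rate_ad, rate_no_ad, z, p_value = two_proportion_z(300, 500, 200, 500)
print(f"with ad: {rate_ad:.0%}, without ad: {rate_no_ad:.0%}")
print(f"z = {z:.2f}, p-value = {p_value:.2g}")
```

A small p-value suggests the difference between the two groups is unlikely to be due to chance alone. It does not by itself prove that the ad caused the purchases, so a controlled experiment is usually the stronger follow-up.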
Ninja Tip 4.
As a data ninja, you must enjoy working with data, have great attention to detail and
enjoy looking at data sets to find anomalies, outliers, and patterns.
On a day to day basis, a Data Ninja will be required to employ a number of skills to get
the job done - such as their technical skills, business acumen, presentation skills,
database skills, analysis skills, and sometimes coding abilities.
These skills allow Data Ninjas to perform their duties of analyzing data with
competence, as well as help them overcome any new challenges that come up along the
way.
Below, we have broken down the skillset requirements of Data Ninjas into two
broad classifications.
1. Technical Skills
As a Data Ninja, your technical skills are absolutely essential to landing and keeping
your job. There is no getting around that. In the coming chapters of this book, we will
cover some of the technical skills (such as Excel, SQL, Data Warehousing, and
Programming) by presenting 5 Nuggets that are essential for professionals looking to
make a career working with data.
For the basics, some understanding of Excel and being able to work proficiently in it
will help (see Nugget on MS Excel). Also being able to understand data sources, data
structures, schemas, Data Warehousing, Structured Query Language (SQL), and some
programming concepts (if possible) would be extremely useful.
Some math and a general understanding of statistics and set theory would go a long
way to help. More advanced technical users can go on to master tools like Python,
MATLAB, and a statistical language (R, SAS, or SPSS). These advanced concepts are
recommended, but generally not required for most entry level candidates and thus not
covered in this book.
2. Soft Skills
In performing analysis work, defining the problem and narrowing the analysis down
often requires a lot of soft skills. When analyzing data for a company or client, it is
important to be able to balance your time, limit the endless "what-if?" scenarios, and
understand the priority of the needs at hand. Mastering all of these skills requires
good self-awareness and control.
Unlike hard technical skills (mentioned earlier), which comprise a person's technical
abilities to perform certain functional tasks, soft skills are interpersonal and broadly
applicable across job titles and industries. You have to get along well with the people you
work with, be dependable, be timely, be honest, be curious, and so on.
Interestingly, many soft skills are tied to an individual's personality rather than any
formal training and are thus considered more difficult to develop than the technical
skills. But with continued practice and perseverance, many data ninja professionals
should be able to develop and advance the soft skills required to perform the job.
Technical skills may get you the job, but soft skills will help you keep the job.

Skillsets for the Data Ninja. (Nde)

From the diagram above, we see that analytical, critical thinking, and math skills are
absolutely essential to perform at a high level as a Data Ninja. Generally speaking, the
analytical and math skills might fall under the technical skills category, while
communication and critical thinking skills might fall under the soft skills category.
The intersection of all these skills produces the ideal Data Ninja candidate, i.e.
someone who is acutely analytical, thinks critically and communicates effectively.
Required Skillsets for the Data Ninja
Not each and every single skillset is absolutely required to be a successful Data
Ninja. Depending on the job, some of the skills may or may not be a requirement. But
having them can be advantageous to excelling and thriving in your career.
For example, you can have a successful career as a Data Ninja without knowing a lot
of math and statistics, or how to write a line of programming, but knowing math,
statistics, or programming will simplify things when it comes to solving very complex
or advanced problems.

Ninja Tip 5.
It doesn't really matter how much you know about the analytics process or how much
effort you have put into an analytics project.
If you can't communicate your results in a clear and timely manner to decision makers,
then you can't impact the business bottom line.
Tools used by Data Ninjas can vary widely depending on the level of expertise, the
specific job's requirements, preferences at the company you work for, and much more. So,
trying to provide an exhaustive list of every single tool on the market as part of this
book would be next to impossible.
But below we have listed some of the general tools and concepts that are highly
recommended as the starting point for individuals looking to get into the data analytics
field.
Tools and Concepts for Data Ninja Candidates
MS Excel
SQL Server
Data Warehousing Concepts
Each of these tools and concepts will be covered in more depth in the ensuing
chapters.
Ninja Tip 6.
As the saying goes, "Anybody can buy a tool, but only a few special people can make
magic happen with the tools they have." As a Data Ninja, you should definitely stop
focusing on the tools at hand and instead focus on the magic you want to see happen.
Tools are Important, but Not the End-All-Be-All
By the time you go on to read the specific details in each chapter, my hope is to
continuously convey the all-important message that although software tools make
analysis easier, they are only as valuable as the information that you put in and analysis
that you conduct. As one of the popular sayings goes:
"Anybody can buy a tool. A few special people can make magic happen with it."
Data professionals should always have the mindset of striving to make the best of
whatever tool they have. As a Data Ninja, no matter what tool you have at hand, I would
encourage you to take a moment to challenge yourself. Learn a few new tricks with the
tools you work with. Then, let the tools serve as a medium to enhance and complement
the logic and reasoning skills that you already have instead of being a distraction to
the process.
There is a lot of training available for anyone interested in becoming a Data Ninja.
The training varies in scale and rigor, ranging from informal training by personal study to
more formalized training by pursuing scholarly curricula at an accredited academic
institution.
The more advanced the role you play in the job, the more advanced the training that may
be required. But, given that this book naturally dwells on entry level candidates, the
training required to start off might not be as rigorous or formal as people
might think.
Training for Data Ninja Candidates
To start off, candidates are typically encouraged (but not required) to have an
undergraduate degree in a field such as accounting, statistics, mathematics, computer
science or business. Most of these requirements for formal degrees can be waived if the
candidate has sufficient years of experience, or demonstrates strong competence in
performing the job-specific roles offered to them.
Different employers might have different practices and hiring requirements. As a
result, some employers might require their Data Ninja candidates to have a master's or
doctoral degree in an area closely related to fields such as mathematics, accounting,
statistics, computer science or business. But in most cases, these advanced degree
requirements are usually only for candidates looking to get advanced analytics roles or
leadership roles, and usually do not apply for entry level or beginner candidates.
Training helps individuals gain a systematic approach to problem-solving. Some
intuition, artistry and guesswork may be needed when analyzing data, but for the most
part, the process is very scientific, with very systematic and repeatable ways to go
about analyzing a data set.
Understanding this systematic approach to working with data makes things more
standardized and saves data professionals the burden of having to reinvent the wheel on
concepts that have already been mastered. This is the main value proposition we see
offered by many training programs.
Training of Some Sort is Crucial for Success

Training sometimes gets a bad reputation, because some people perceive it
as being expensive or taking a lot of time. No matter the circumstances, that should not
stop you from pursuing a rigorous training program that will give you the essential skills
to perform competently at your job as a Data Ninja. Without competence, you will not
have a job. Training, and the benefits from it, is a point that cannot be overemphasized.
Practice Makes Perfect when it comes to Data Ninja Training
Through books, videos, lessons and a myriad of online resources, it is possible to teach
yourself much of what is needed to be an exceptional data analytics ninja.
No matter how you start or what route you take for training, it is possible to become
much better and even extraordinary in the data analytics game by making a commitment
to learning and embracing new techniques to help solve business problems.
In addition to reading and studying up on the concepts presented in this book, to get
much better at working with data, you have to actually do it; and do it as often as you
can. This is where practicing more will make you more perfect at what you do.
Ninja Tip 7.
As a Data Ninja candidate looking to land your ideal job, there are many benefits to
aggressively pursuing a training path and a regimen of continuous learning, whether by
formal training or not.
Continuous learning is about the constant expansion of skills and skill-sets through
learning and increasing knowledge. As life changes, the need to adapt, both
professionally and personally, becomes as important as the changes themselves.
One of my first real, non-restaurant jobs was as a Data Analyst for a really large
insurance/healthcare corporation. I worked in the area that managed the marketing
database for the company.
For example, we could only market to certain zip codes (by law) and once I had to
input something like 10,000 zip codes into a database in about three days.
We would also do a lot of analysis on what groups were more responsive to our
marketing campaigns. So I'd end up throwing a ton of information into a spreadsheet
(Lotus 123 at the time; Excel is very similar) and then doing lots of analysis to
find the best performing groups (People between the ages of 55-65 who live in
non-urban areas of Florida might be an example).
Using this info, we'd go out and try to find more people (our target audience) who
mirrored the most responsive groups. Purchasing a mailing list or advertising in
certain publications whose readers are similar to these people (like AARP) would be
a way in which to target them.

You might be responsible for coming up with the stats for the target audience and
then going out and finding them. You'll also spend vast amounts of time in front of a
computer inputting and looking at data. You may also work with programmers (who
may be in India or somewhere else).
In order to do a good job, you'll need to be very detail oriented. You will need to
like to work with numbers. Logical/critical thinking skills required. You will need to
be ok with sitting in front of a computer looking at data for long periods of time.
There may not be much room for artistic expression.
Leann C.
December 7, 2010 at 5:32 am
In this section, we present a sample Data Ninja Job description. This sample is
available free on the website and we have gone through the job
description to highlight and call out the specific skills and requirements that could be of
importance to an aspiring data ninja.
As mentioned earlier, different companies and industries may have specific job
requirements tailored to the specific Data Analytics role that they are looking to fill, but
in general, there are some broad skills and concepts that can be found in most résumés.
Note: The goal of presenting this sample is to serve for educational purposes ONLY,
and to highlight some of the skills employers look for when finding Data Ninjas in the
real world.

Data analyst job posting (workable)

MS Excel is a spreadsheet application developed by Microsoft Corporation for
Windows and other operating systems. The product allows for easy calculations, graphing,
tabulation and pivoting of data.
A poll from shows Excel on top of the list as one of the most popular
analytical tools being used in the industry today.
Compared to other products in its class, Excel stands as a very powerful tool and
certainly has its place in the market as far as data manipulation and analytics are
concerned.
The ubiquity and versatility of the product give users the ability to manipulate,
cleanse, and merge data sets with relative ease. As an analytics platform for small
datasets, Excel has proven to be very generous and can seriously reward anyone who
takes the time to learn and play with its formulas and calculations.
MS Excel offers several functionalities that are useful in analyzing and working with
data. Here are the main ones:
Sort: Sorts rows and columns of data in either ascending or descending order.
Filter: Restricts the data shown to the rows that fit a certain criterion.
Conditional Formatting: Highlights a specific column, row, or block of cells with a
color that depends on the values in those cells.
Charts: Graphically display data in the form of a chart or graph that depicts the
rise or fall in values, e.g. a rise in profit. Graphs often convey data better than
the raw numerical values do.
Pivot Tables: One of the most powerful tools; it extracts the significance from a
detailed data set by allowing quick summarizations.
Tables: Properly arranged tables allow one to analyze data much faster.
What-If Analysis: Lets one try different values in cells to see how the changes
affect the outcome of a formula.
Analysis ToolPak: An Excel add-in program that can perform financial,
statistical and engineering data analysis.
Data Ribbon
As a data ninja, you might find the data ribbon in Excel to be one of the most useful of
all. This ribbon holds functions that can help anyone quickly perform a variety of
statistical and non-statistical calculations on their data.
Data Analysis ToolPak
In addition to the out-of-the-box functions present in the data tab, Excel also makes
available the Data Analysis ToolPak.
The Data Analysis ToolPak is especially relevant because it presents a powerful set
of tools used for statistical analysis and can help analysts figure out the variance,
correlation and covariance of data as well as other features.
If you need to develop complex statistical or engineering analyses, you can save steps
and time by using the Analysis ToolPak. You provide the data and parameters for each
analysis, and the tool uses the appropriate statistical or engineering macro functions to
calculate and display the results in an output table. Some tools generate charts in
addition to output tables.
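To show what statistics such as variance, covariance and correlation actually compute, here is a small sketch in Python. It uses the sample (n - 1) formulas as an assumption on our part; the conventions used by a given Excel tool or function may differ, and the ad spend and revenue figures are invented for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Average squared distance from the mean, using an n - 1 denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_covariance(xs, ys):
    """How two columns move together, using an n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance rescaled to fall between -1 and 1."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_variance(xs) * sample_variance(ys)
    )


# Hypothetical monthly figures (in thousands).
ad_spend = [10, 20, 30, 40, 50]
revenue = [25, 45, 65, 85, 105]

print("variance of ad spend:", sample_variance(ad_spend))
print("covariance:", sample_covariance(ad_spend, revenue))
print("correlation:", round(correlation(ad_spend, revenue), 4))
```

In this made-up data, revenue rises in a perfectly straight line with ad spend, so the correlation comes out at 1; real data will rarely be this tidy.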
Excel allows data to be imported into a workbook from a wide variety of data sources
for analysis.
Example data sources for Analytics in Excel
From the web
From files: Excel, CSV, XML, Text or Folder that contains files with metadata and
From databases: SQL Server, Windows Azure SQL Database, Access, Oracle,
IBM DB2, MySQL, PostgreSQL and Teradata.

From other data sources: SharePoint List, OData feed, Windows Azure
Marketplace, Hadoop Distributed File System - HDFS, Windows Azure Blob
storage, Windows Azure Table storage, Active Directory and Facebook.
As a data ninja, you may be required to spend a lot of time summarizing data, because
people prefer looking at summaries.
Pivoting is an incredibly powerful tool that makes it easy to tabulate and summarize
data in exciting ways. Though there are many products on the market with pivoting
functionality, Excel's ubiquity makes its pivot table one of the most widely used in the
industry.
The good thing about the pivot functionality in Excel (or any other pivot tool for that
matter) is that it allows analysts to quickly change how data is summarized with very
little to no effort.

Microsoft Excel screenshot. (Excel)

Whether you want to summarize daily sales data for your company by line of business
(LOB), or sum employees' total working hours for the week, pivot tables will let you do
that with relative ease.
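For readers curious about what a pivot table does behind the scenes, the sketch below reproduces the idea in Python: pick one field for the rows, one for the columns, and sum a value for every combination. The records, field names, and the pivot helper are hypothetical and only meant to illustrate the concept, not how Excel implements it.

```python
from collections import defaultdict

# Hypothetical daily sales records by line of business (LOB).
records = [
    {"day": "Mon", "lob": "Retail", "sales": 100},
    {"day": "Mon", "lob": "Online", "sales": 250},
    {"day": "Tue", "lob": "Retail", "sales": 150},
    {"day": "Tue", "lob": "Online", "sales": 200},
    {"day": "Mon", "lob": "Retail", "sales": 50},
]


def pivot(rows, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for record in rows:
        table[record[row_field]][record[col_field]] += record[value_field]
    return {row_key: dict(cols) for row_key, cols in table.items()}


# Summarize by line of business, then flip the layout with one change.
by_lob = pivot(records, "lob", "day", "sales")
by_day = pivot(records, "day", "lob", "sales")
print(by_lob)
print(by_day)
```

Swapping the row and column fields is all it takes to change the summary, which mirrors how quickly a pivot table can be rearranged.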


Another extremely important feature within Excel is that of presentation. The results
of any data analysis usually need to be presented to users and decision makers for
consumption. Microsoft Excel excels at this.

The sky is the limit for using Microsoft Excel for dashboarding and
presentation of data that tells a consistent and coherent story. (Excel)
Some of the common presentation and visualization functions within Excel include:
Pie Charts
KPIs (Key Performance Indicators)
Drill Up and Drill Down
Background Color and Background Images
In recent years, Microsoft has been investing in a number of integrated
components for data collection, analysis and visualization. These products are currently
being distributed under the solution named Power BI. Some of the products within the
Power BI ecosystem include:
Power Pivot. Power Pivot provides end-user accessible, in-memory data modeling
for large data sets. Power Pivot was introduced as an add-in to Excel 2010, and
has since been fully integrated as an out-of-the-box feature in Excel 2013.
Power View. Power View is a complementary technology to Power Pivot, enabling
advanced visualizations for data models created in Power Pivot. Power View
delivers interactive visualizations, including animated visuals and maps powered
by Bing Maps. Originally Power View was available only as a SharePoint feature,
but has since been integrated directly into Excel 2013.
Power Map. Power Map, previously known by the development name GeoFlow, is
an add-in to Excel 2013 that provides more compelling Bing Maps-powered
visualizations, extending Power View's capabilities with 3D map visualizations.
Power Query. Power Query, previously known by the development name Data
Explorer, is an add-in to Excel 2013 that provides a more fluid, open data
discovery environment than is provided by Power Pivot alone.
The Future of Power BI
By leveraging the popularity and versatility of Excel, Microsoft has worked on
providing users with new capabilities for analyzing and working with data. It is truly
exciting, to say the least, and will potentially redefine the way data analysis is done
within organizations.
Power BI seems to hold a lot of promise, and because of this Microsoft is staking a
great deal on it.
Microsoft is not content to let Excel define the company's reputation among the
world's data analysts. That's the message the company sent on Tuesday when it
announced that its PowerBI product is now free. According to a company executive, the
move could expand Microsoft's reach in the business intelligence space by 10 times.
If you're familiar with PowerBI, you might understand why Microsoft is pitching this
as such a big deal. It's a self-service data analysis tool that's based on natural language
queries and advanced visualization options. It already offers live connections to a
handful of popular cloud services, such as, Marketo and GitHub. It is
delivered as a cloud service, although there's a downloadable tool that lets users work
with data on their laptops and publish the reports to a cloud dashboard.

Excel is a very versatile tool that plays a pivotal role in the data analysis process for
most companies. But Excel doesn't come without its drawbacks. As such, it becomes
very important to have an understanding of what Excel is and is not.
Let's face it: We all have seen a crazy Microsoft Excel spreadsheet or encountered
one of its dreaded "Not Responding" messages. Unfortunately, the flexibility and ease
of Excel make it the ideal candidate for inappropriate use and widespread abuse.
Modern Excel 2013 and the latest Power BI add-ins do sizzle in demonstrations, but
there are analyses for which it simply does not make sense to use Excel today.
Collaboration: Excel is inherently designed for personal use and for single-user access
at a time. Spreadsheets tend to be shared via email, which causes duplicate copies or
inconsistent data.
Maintenance: With data coming from different sources, it can become very difficult
to maintain spreadsheets manually, especially if the data changes frequently.
Data Integrity: With multiple users having the capability to make copies of any
spreadsheet with ease, it becomes very difficult to control where the single
version of truth lies, potentially causing serious data integrity issues.
Accessibility: Being a desktop-based application, anytime or anywhere access is
not possible unless you carry a mobile device such as a laptop.
Data Security: Excel data is usually downloaded and made locally available on
user machines. This makes the spreadsheets not only error prone but also
susceptible to theft and data loss.
Scalability: There are times when spreadsheets get so big that it doesn't make any
sense to hold the data in Excel. The new Power Pivot side of Excel compresses and
handles more data than traditional Excel alone. However, capacity will still be
constrained by the user's CPU power, since all the processing is done by your local
machine.
Ninja Tip 8.
Choosing between Excel and some other solutions for your data analytics purposes is
not an either/or proposition, nor is it a zero sum game.
Excel has capabilities that are proving to be very valuable for certain types of data
analysis; and the point of this book is to make that clear and encourage you as a Data
Ninja to go out, explore the tool further and use it for those use cases that work well
for you.

In this nugget, I have presented a good suite of solutions that come with Excel which
can help companies and their analysts do wonders with data. I have also tried to present
some of the drawbacks that can come with building enterprise-wide solutions on Excel.
But I also realize there are those who would rather focus all the attention on the
negatives rather than the positives of Excel or any other product for that matter.
Excel is Extremely Good, but has its Limitations
As mentioned earlier, the point of this book is not to get into a debate over whether
Excel is good or bad as an analytical tool, but instead to appreciate that Excel has
potential (whether you like it or not) which may or may not be useful for your needs.
When it comes to data analytics, Microsoft Excel should not be seen as the panacea,
because it simply isn't, and no single tool is for that matter.
Excel offers a lot of functionality to help companies and data ninja professionals in
their analytical journey. But Excel doesn't solve (and shouldn't be expected to solve) all
problems faced by companies today.
Nonetheless, it has a vital and pivotal role to play within the data analytics ecosystem,
and must not be ignored.
GCF Learn Free
GCF Learn Free will teach you how to create formulas. They will guide you through the
basics of creating formulas for any kind of spreadsheet. They also give you
opportunities to practice with real-world scenarios.
Chandoo has a straightforward approach to learning Excel. They cover the basics and
will move you along into advanced knowledge. They go over everything from formulas,
charts, VBA, dashboards, and more.
Five Minute Lessons
Just like the name suggests, this site offers 5-minute lessons to bring you to the next level
in Excel. These are short lessons, but they teach important skills that everyone should know in
Excel.
Excel Exposure

Excel Exposure touts the Microsoft Most Valuable Professional badge on their landing page. They
have tons of resources and video tutorials that can take you from beginner to advanced
user. Their lessons include videos, infographics, and workbooks.
Trump Excel
An effort to learn and share amazing tricks on Excel spreadsheets. Trump Excel
focuses more on tips and tricks rather than walking through the basics. Trump Excel
assumes the user already has a basic knowledge but wants to learn some new things.
Even if you are a beginner, you will pick up some interesting tricks here that you might
not find elsewhere or on your own.
Excel Tip of The Month
Isaac Gottlieb created this site to give monthly tips. This is another site that focuses on
tips and tricks rather than guiding you through from start to finish. This is a
straightforward site with great tips. Once you get started in Excel, this site will continue to give
you tips.
Excel Tips
You guessed it, more tips and tricks. Excel is such an immersive program that it is
difficult to master every single function it has. That isn't meant to discourage you
but rather encourage you to learn everything you can.
Peltier is a blog that focuses on Excel charts. They have pages on pages of Excel charts
and how-to guides. If you are struggling with charts in Excel, this is the place for you.
Excel Central
Excel Central offers videos, eBooks, and file downloads. They sell courses and books,
but they will let you view the first 8 chapters in their courses for free to see if it is right
for you. Not a very fancy website, but they do have some good essential information.
Howcast
Howcast provides short-form instructional video and text content. They do not
specifically concentrate on Excel, but their videos are a great way to get started.

PC World: Use Microsoft Excel for Everything

This is an excellent article that enlightens you on the many uses of Excel.
Microsoft has put together an excellent resource for learning Excel. You know you can
trust that they know what they are talking about since they created the program.
There is just no escaping Lynda's vast categories of tutorials. Lynda is a paid service
that offers a huge list of different kinds of tutorials and videos, a great resource for
anyone who wishes to expand their knowledge on pretty much any subject.
Learn to extend Excel using add-ins and scale using SharePoint and Office 365.
Power BI, including Power Pivot, Power Query, Power View, and Power Map,
extends the capabilities of native Excel tremendously. So, don't ignore it.
Excel is here and is promising to be around for a while, so learn it. Master the
formulas and techniques for acquiring, analyzing and presenting data from different
data sources.

As companies embrace data for better and faster decision making, the database
environments they use have become increasingly complex. The need for mastery of a
computer language aimed at accessing, manipulating, and querying data stored in
relational databases has become extremely important. This is where SQL comes in.
Structured Query Language (SQL), pronounced "sequel" or "ess-queue-ell", is the
primary language used to request information from a database, and it is everywhere.

For example, a database-driven dynamic web page takes user input from forms and
clicks and uses it to compose a SQL query that retrieves information from the database
required to generate the next web page.
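As a rough sketch of that flow, the Python example below takes a value a user might submit through a web form and uses it in a SQL query. It relies on Python's built-in sqlite3 module with an in-memory database; the products table, its contents, and the find_products helper are hypothetical. The user's value is passed as a query parameter (the ? placeholder) rather than pasted into the SQL text, which is the safer way to compose such queries.

```python
import sqlite3

# In-memory database with a small, hypothetical product catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO products (name, category) VALUES (?, ?)",
    [
        ("Snow shovel", "outdoor"),
        ("Sunscreen", "health"),
        ("Ice scraper", "outdoor"),
    ],
)


def find_products(connection, category):
    """Return product names in a category, ordered alphabetically.

    The category value is bound to the ? placeholder instead of being
    inserted into the SQL string directly.
    """
    cursor = connection.execute(
        "SELECT name FROM products WHERE category = ? ORDER BY name",
        (category,),
    )
    return [row[0] for row in cursor.fetchall()]


# Pretend this value arrived from a form on a web page.
user_input = "outdoor"
results = find_products(conn, user_input)
print(results)
```

A real web application would then use the returned rows to build the next page shown to the user.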
Even more astounding is the fact that all Android phones and iPhones have easy
access to a SQL database called SQLite, and many applications on your phone use it.
Today, many of the applications that run our banks, hospitals, universities,
governments, and small businesses, and just about every computer, eventually touch
something running SQL.
This ubiquity makes SQL an incredibly powerful tool and it has proven itself over the
years to be a very successful and solid data analytics technology worth mastering.
SQL is especially beneficial because it provides some standardization to the way
data in databases can be queried. It is tremendously flexible, powerful, and very
accessible, which makes it simple to master. Some of the key benefits of using SQL to
store, manage and analyze data over other approaches are listed below:

1. You can query and make updates to data in a database.
2. You can look up data from a database relatively rapidly.
3. You can relate data from two different tables together using JOINs.
4. You can create meaningful reports from data in a database.
5. Your data has a built-in structure to it.
6. In a well-designed (normalized) database, information of a given type is stored only once.
7. SQL databases can handle very large data sets (compared to Excel spreadsheets).
8. SQL databases are concurrent, i.e. multiple users can use them at the same time
without corrupting the data.
9. SQL databases scale well (well beyond the data volumes that can be handled
in simple Excel spreadsheets).
With SQL, you can build databases, enter data into the database, manipulate data, and
query the database data with relative ease.
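
As a rough illustration of several of these points at once (building tables, entering data, relating two tables with a JOIN, and producing a small report), here is a sketch using Python's built-in sqlite3 module. The customers and orders tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Build two related tables and enter some sample data.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);
""")

# Relate the two tables with a JOIN and summarize spending per customer.
report = conn.execute("""
    SELECT c.name, COUNT(o.order_id) AS order_count, SUM(o.amount) AS total_spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY total_spent DESC
""").fetchall()

for name, order_count, total_spent in report:
    print(name, order_count, total_spent)
```

Each customer's name is stored once in customers, and the orders table refers to it by customer_id, which is the "stored only once" idea from the list.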
Ninja Tip 9.
There are many database products such as Microsoft SQL Server, Oracle, Netezza,
Teradata, MySQL, etc. which support SQL.
At the core of SQL is the Relational Database Management System (RDBMS).

RDBMS Defined
A relational database management system (RDBMS) is a program that lets you create,
update, and administer a relational database.
In an RDBMS, data is structured in database tables, fields, and records. Tables within
the RDBMS might be related by common fields for easy cross-table querying.
An RDBMS also provides relational operations (in the form of SQL) to manipulate and
store data in the database tables.
The RDBMS is the basis for SQL in all modern database systems such as MS SQL Server,
IBM DB2, Oracle, and MySQL.
Data is stored in a set of tables. Each RDBMS table consists of database table
rows. Each database table row consists of one or more database table fields.
Rows represent records and columns represent record attributes or fields.
Each row in the table is usually identified by a primary key that uniquely identifies
the record to other systems.
An RDBMS uses several design patterns to reduce duplication of data in the database.
Every row in the same table has exactly the same number of columns (even though
some of the column values might be NULL).
An RDBMS has a data manipulation language (SQL) that is used for querying and
manipulating data in the RDBMS.
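
The points about primary keys and NULL values can be sketched with Python's built-in sqlite3 module. The employees table and its rows are hypothetical; the behavior shown (a duplicate key being rejected, a missing value stored as NULL) is what I would expect from most RDBMS products, though error names differ between them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)"
)
conn.execute("INSERT INTO employees VALUES (1, 'Ada', '555-0100')")
# Every row has the same three columns; an unknown phone is stored as NULL.
conn.execute("INSERT INTO employees VALUES (2, 'Grace', NULL)")

# The primary key uniquely identifies a row, so reusing key 1 is refused.
try:
    conn.execute("INSERT INTO employees VALUES (1, 'Duplicate', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT employee_id, name, phone FROM employees ORDER BY employee_id"
).fetchall()
print(duplicate_rejected, rows)
```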
ACID properties are used to keep transactions reliable. The acronym refers to the four
key properties of a transaction: Atomicity, Consistency, Isolation, and Durability.

Atomicity: All changes to data are performed as if they are a single operation.
Consistency: Data is in a consistent state when a transaction starts and when it ends.
Isolation: The intermediate state of a transaction is invisible to other transactions.
Durability: After a transaction successfully completes, changes to data persist and
are not undone, even in the event of a system failure.
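
Atomicity is easiest to appreciate with a money transfer. The sketch below uses Python's built-in sqlite3 module with an invented accounts table; the transfer function and the rule that balances may not go negative are assumptions made for the example. If the second update fails, the first one is undone as well, so the data is never left half-changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, sender, receiver, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))


# Alice cannot cover 500, so the credit to Bob is rolled back too.
too_large_ok = transfer(conn, "alice", "bob", 500)
after_failed = balances(conn)

small_ok = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(too_large_ok, after_failed, small_ok, after_success)
```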
There are many vendors supplying RDBMS products in the market, some of which
are proprietary and some of which are open source. A few examples of these RDBMS
systems include: PostgreSQL, SQLite, MySQL, Microsoft SQL Server, Oracle, Teradata,
Netezza, and Sybase.


Although (in theory) SQL is standardized, in practice it is not. There are many
vendors in the market and each of them has its own variation and flavor of the
language. In general, SQL written for one RDBMS system, such as Sybase, may not
work for another RDBMS system, such as MySQL or PostgreSQL, because the syntax differs.
SQL Portability
SQL database platforms tend to implement the SQL standard in different ways. For
example, the SQL date and time data types are sometimes omitted in favor of
proprietary solutions.
PostgreSQL notoriously contains a number of custom data types; for instance, it
provides an entire range of data types that define geometric objects (e.g. box and line).
These geometric object types are not necessarily available in other database systems, so
the database developer who uses those types may be locked in to PostgreSQL. This
situation would arise if converting the geometric object types to another type usable by
a different database would consume too much time or be altogether impossible.
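
SQLite offers a small, easy-to-test example of the date and time point. As far as I know, SQLite has no dedicated date storage type, so a value placed in a column declared as DATE is kept as text and handled through SQLite's own date functions. The events table below is made up, and the sketch uses Python's built-in sqlite3 module; other products would store and manipulate the same value differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, happened_on DATE)")
conn.execute("INSERT INTO events VALUES ('launch', '2015-01-31')")

# typeof() reports how SQLite actually stored the value;
# date(..., '+1 day') is SQLite-specific syntax for date arithmetic.
storage_class, next_day = conn.execute(
    "SELECT typeof(happened_on), date(happened_on, '+1 day') FROM events"
).fetchone()
print(storage_class, next_day)
```

A query written this way would need to be rewritten before it could run on a platform with a different date type or different date functions, which is the portability problem in miniature.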
An argument typically made against complaints about SQL's lack of portability is
that the SQL standard, despite being long and complex, is not completely defined and, in
some cases, is ambiguous.
Today, SQL is the premier language used for querying and working with relational
data. This is accomplished by writing SQL query statements.
SQL statements are divided into two main categories: DML (Data Manipulation
Language) and DDL (Data Definition Language). Below, we provide a high-level
overview of these two categories and how to leverage them for your data analytics needs.

SQL commands. (TechnologyCrowds)

Data Manipulation Language (DML)
DML statements manipulate data. As such, they are usually used for inserting data into
database tables, retrieving existing data, deleting data from existing tables, and
modifying existing data. DML never modifies the schema of the database (table features,
relationships, etc.).
Example SQL DML:
SELECT, INSERT, UPDATE, and DELETE statements
The processes performed by DML statements are what a data ninja may be tasked
with performing on a day-to-day basis. That is why it is recommended that readers be
proficient in, or at least familiar with, the concepts of writing SQL DML statements.
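
Here is a short sketch of the four everyday DML operations, run through Python's built-in sqlite3 module. The inventory table and its rows are invented for the example. Note that the table's structure stays the same throughout; only the rows change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table is created up front; everything after this line is DML.
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")

# INSERT: add new rows.
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("pens", 10), ("paper", 5), ("ink", 0)],
)
# UPDATE: modify existing rows.
conn.execute("UPDATE inventory SET quantity = quantity + 20 WHERE item = 'paper'")
# DELETE: remove rows that match a condition.
conn.execute("DELETE FROM inventory WHERE quantity = 0")
# SELECT: retrieve what is left.
remaining = conn.execute(
    "SELECT item, quantity FROM inventory ORDER BY item"
).fetchall()
print(remaining)
```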
Data Definition Language (DDL)
DDL statements are used to build and modify the structure of objects in a database.
These database objects include views, schemas, tables, indexes, etc. Some examples of
DDL statements:
Example SQL DDL:
CREATE - create objects in the database.
ALTER - alter the structure of the database.
DROP - delete objects from the database.
TRUNCATE - remove all records from a table.
RENAME - rename an object.
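
The DDL commands can be tried out with Python's built-in sqlite3 module, as in the sketch below. The table names are hypothetical, and the syntax is SQLite's: renaming is done with ALTER TABLE ... RENAME TO, and as far as I know SQLite has no TRUNCATE statement, so it is left out here. Other products spell some of these commands differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(connection):
    """List user tables by reading SQLite's catalog table."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in connection.execute(query)]


def column_names(connection, table):
    """List column names; PRAGMA table_info is SQLite-specific."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


# CREATE: add objects to the database.
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE scratch (note TEXT)")
# ALTER: change the structure of an existing table.
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")
# RENAME: give an object a new name (SQLite's form of the command).
conn.execute("ALTER TABLE staff RENAME TO employees")
# DROP: delete an object from the database.
conn.execute("DROP TABLE scratch")

tables_after = table_names(conn)
employee_columns = column_names(conn, "employees")
print(tables_after, employee_columns)
```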
Data Control Language (DCL) and Others
In addition to DML and DDL, there are more advanced topics in SQL used for
interacting with the RDBMS. DCL (Data Control Language) and TCL (Transaction
Control Language) are used to manage transaction integrity and security around the data.
Data Control Language (DCL) is used to create roles, permissions, and referential
integrity, and to control access to the database by securing it.
Example SQL DCL:
GRANT - give users access privileges to database.
REVOKE - withdraw access privileges given with the GRANT command.
EXECUTE AS - used for impersonation, to run as a particular user.
Transaction Control Language (TCL)
Transaction Control Language (TCL) statements are used to manage the changes made
by DML statements. They allow statements to be grouped together into logical
transactions.
Example SQL TCL:
COMMIT - save work done.
SAVEPOINT - identify a point in a transaction to which you can later roll back.
ROLLBACK - restore the database to its state as of the last COMMIT.
SET TRANSACTION - change transaction options such as isolation level and
which rollback segment to use.
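
The sketch below shows COMMIT, SAVEPOINT, and ROLLBACK working together, using Python's built-in sqlite3 module. The audit_log table and the savepoint name are invented, and the connection is opened with isolation_level=None so that the transaction statements are issued by hand. SET TRANSACTION is not shown because, to my knowledge, SQLite does not offer it.

```python
import sqlite3

# isolation_level=None stops the module from opening transactions for us,
# so BEGIN and COMMIT below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE audit_log (entry TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO audit_log VALUES ('step 1')")

# Mark a point we can return to without abandoning the whole transaction.
conn.execute("SAVEPOINT before_step_2")
conn.execute("INSERT INTO audit_log VALUES ('step 2')")

# Undo only the work done since the savepoint.
conn.execute("ROLLBACK TO SAVEPOINT before_step_2")

# Make the remaining work permanent.
conn.execute("COMMIT")

entries = [row[0] for row in conn.execute("SELECT entry FROM audit_log")]
print(entries)
```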

Commercial RDBMS systems such as Microsoft's SQL Server, Oracle DB, MySQL
and IBM's DB2 are complex applications that call for specialized knowledge and
training. As a result, some organizations hire dedicated database administrators (DBAs)
to manage and administer their RDBMS environments.
The DBA role, which usually sits within the Information Technology department, is
charged with the creation, maintenance, backup, querying, tuning, user rights
assignment, and security of an organization's databases.
Installation, configuration and upgrading of Microsoft SQL Server/MySQL/Oracle
server software and related products.
Establish and maintain sound backup and recovery policies and procedures.
Take care of the Database design and implementation.
Implement and maintain database security (create and maintain users and roles,
assign privileges).
Database tuning and performance monitoring.
Application tuning and performance monitoring.
Set up and maintain documentation and standards.
Plan growth and changes (capacity planning).
Do general technical troubleshooting and give consultation to development teams.
Administrative DBA: Works on maintaining the server and keeping it running.
Concerned with backups, security, patches, replication, etc., that is, with the
things that concern the actual server software.
Development DBA: Works on building queries, stored procedures, etc. that meet
business needs. This is the equivalent of the programmer. (Many data ninjas and
analysts would fall into this category.)
Architect DBA: Designs schemas and builds tables, foreign keys, primary keys, etc.
They work to build a structure that meets the business needs in general. The
designs they produce are used by developers and development DBAs to implement
Ninja Tip 10.

As the data ninja of your organization, you might not be explicitly responsible for
playing the role of a DBA, but it still might be worthwhile for you to have some
rudimentary understanding of the concepts of the DBA in general. This will make you
more versatile, and hence more marketable in the industry.
There are a number of key reasons why it is important for data ninjas to master
RDBMS systems and to be proficient in SQL. The top three are:
1. Ubiquitous: SQL is a ubiquitous standard for accessing data within databases, and
many of the current programming languages have a way to access SQL databases.
2. Easy to learn: SQL is widely popular, accepted, and utilized within
the industry. It is easy to find experts who know the subject and have years of
experience using it. It's also relatively easy for newcomers to pick up the
syntax without requiring much training.
3. Tried and tested: Finally, SQL is very closely tied to the relational model, which
has been thoroughly explored with regard to optimization and scalability. Even
though SQL solutions still require manual tweaking (index creation, query
structure, etc.), the platform has been around for a while now and is well tested in
production environments.
A lot of the data in existence today is stored in RDBMS databases and SQL is the
premier interface used to access and manipulate this data.
Your smartphone stores its contact database in a relational database. Your online
banking information and all your financial history, statements, personal data and so
forth, are all stored in a relational database of some sort.
SQL is the primary language used to access and analyze this information. So, as a
data ninja who aspires to work intimately with data, it is paramount to master SQL.
Now that you have read this nugget on SQL and have got your feet wet, I would
recommend that you continue to read and practice your skills. The more you read the
more you are going to learn, and pretty soon you will be fluent in writing SQL
statements to interrogate data.
I have put together the best learning resources that I have come across to help you on
your journey. Below you will find paid online courses whose providers have spent years
developing their videos and courses to really immerse you in the material. If you are
not ready to shell out some cash, there are also free courses that have excellent
resources available.
First, we will look at some of the premium resources that are available to purchase.
These resources typically have the most to offer and will carry you further than some of
the free resources.
Lynda.com
Lynda.com is a very popular online education company. They offer thousands of
different courses for creative software and business skills. Their MySQL courses
cover several skill levels, from beginner to advanced. They offer a 10-day trial
period to get started, a perfect way to see if their services are right for you. At the
end of the trial you can choose between different levels of payment, from $25 a month
on a month-to-month basis to $375 annually.
Infinite Skills
Infinite Skills was recently purchased by O'Reilly Media. They offer 142 training
videos on MySQL. They also have downloadable practice files that help further your
skills beyond the videos. I find their website to be a little confusing and counterintuitive,
but they do offer good videos that will help you. They charge a $25 month-to-month fee
that includes a mobile app.
Learn Now Online
Learn Now Online has a wide range of topics from programming and mobile
development to SQL. They have a nice set of online videos and options to choose from.
In terms of cost, they are a little more affordable with options starting at $49 annually.
Paid premium courses are not for everyone. Maybe you want to dive in a little deeper
before you decide to pay to further your skills. Below are some great free online
resources.
Udemy is an online marketplace where experts can create their own courses which can
then be offered to the public for free. Each course has a different author and has user
reviews so you can decide which course will be best for you.
Learn Code the Hard Way
Learn Code the Hard Way offers books on various subjects. They are currently working
on an SQL book, but have posted the book online for free while they work on it. You
can view the book by chapters. Don't let the name fool you: it is very approachable, with
easy-to-understand topics. I am assuming that once the book is completed it will be
available to purchase from their website.
SQL Server Central is a resource for the Microsoft SQL Server community. It has many
DBAs, developers, and users, and plenty of useful and valuable information. This is one
to keep bookmarked as you continue your career in SQL.
SQL Fiddle
SQL Fiddle allows you to select a database, build a schema, populate the schema, and
run queries against it. SQL Fiddle is a great resource for practicing different syntax of
SQL and testing your queries.
Database Journal
Database Journal is a script library that offers a huge database covering an array of
subjects. They have articles, news, and tutorials, all offered for free. Feel free to post
questions and comments on their forum. They update their databases frequently and have
topics that date from 2010 up to 2015.
SQL-Tutorial offers problems and solutions. They have resources for novice users and
those who feel they already have a grasp on SQL but want to learn more. They will help
you to program queries. The information is presented as an Ebook you can read through.
1KeyData SQL
1KeyData SQL is a very nice resource to help you with SQL. They have common SQL
commands, functions, constraints, and tables available to access whenever you may
need them. They also offer some video tutorials and quizzes to help you along.
The Schemaverse
The Schemaverse is a space-based strategy game implemented entirely within a
PostgreSQL database. Play against other players using raw SQL commands to command
your fleet. This is a fun way to keep your skills sharp.

SQL Zoo is a step-by-step tutorial with live interpreters, allowing access to tables using
any of the Oracle, SQL Server, MySQL, and PostgreSQL engines. Once you feel ready, they
also have online quizzes to help assess your skills.
Tutorials Point
Tutorials Point has tons of free online tutorials and reference manuals. They also offer
premium services for a fee; if you decide to pay, you will get premium support and
instructor help. If you do not wish to pay, their free resources are excellent and will
help you from the installation of MySQL all the way through to importing databases.
SQLCourse is an interactive online SQL training resource that offers free training. They
will get you started with the basics and move you along to more advanced topics. The
site is funded by advertisements so you will have to scroll past some ads while you are
reading but they offer great material all about SQL.
W3Schools offers free material to view but they also offer premium services such as
certificates. In order to receive a certificate you must pay the premium price of $95 and
pass an online test. It has a built in interpreter in the browser so you can try different
queries and see the outcomes.
Sol Tutorials
Sol Tutorials GalaXQL is an interactive SQL tutorial. This is another fun tutorial: take
the journey into outer space while writing SQL code. The site was created by Jari
Komppa, is totally nonprofit, and carries very few ads. An enjoyable resource all around.
These are all great online resources you can use to help yourself along on your journey
to mastering SQL, but sometimes you need to give your eyes a rest from the screen and
turn to a physical book.
Holding a book and turning the pages has always been my favorite way to learn. There
is something about highlighting and underlining key points in the book that just seems to
help me remember.
I have collected some of my favorite titles and created a small list below. These titles
can be found anywhere that sells computer reference titles.

SQL in 10 Minutes
Author: Ben Forta
Published by: Sams Publishing
ISBN-13: 978-0672336072
Learning SQL
Author: Alan Beaulieu
Published by: O'Reilly Media
ISBN-13: 978-0596520830
SQL Cookbook
Author: Anthony Molinaro
Published by: O'Reilly Media
ISBN-13: 978-0596009762
SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (3rd
Edition)
Author: John Viescas
Published by: Addison-Wesley Professional
ISBN-13: 978-0321992475
Head First SQL
Author: Lynn Beighley
Published by: O'Reilly Media
ISBN-13: 978-0596526849
It is important to keep learning and using what you've learned. You will only get
more proficient as time goes on, so keep fine-tuning your skills and never give up.
Gain familiarity and proficiency in using SQL to interrogate data in databases.
Understand Relational database Management Systems (RDBMS), the players in the
market and how they work to store and manage data at the fundamental level.
Go beyond SQL into the NoSQL (Not Only SQL) world. Gain familiarity or at
least some elementary understanding of NoSQL databases and also understand why
the trend is moving in that direction.


In the first Nugget, I addressed the importance for a data ninja to leverage Excel for
their analytical needs. The second Nugget was about leveraging RDBMS systems to go
beyond Excel for analysis. In this Nugget, we will explore the intricacies of working
with huge datasets and the need to adequately store them in data warehouses.
Numbers have an important story to tell. They rely on you to give them a clear and
convincing voice.
-Stephen Few
As mentioned in the nugget for MS Excel, using Excel for big data projects can pose
some serious challenges, especially relating to the privacy, data redundancy, and
concurrency issues that arise when users retain their own personal copies of sensitive
corporate data on their personal computers and laptops.
Because of these challenges with using Excel spreadsheets alone, companies often find
themselves needing other more robust, enterprise-scale solutions to help out.
The solution that often comes to the rescue when companies are challenged with the
need to move and store huge volumes of data falls broadly into the category of Data
Warehousing (DW).
Many organizations today have a data warehouse of one form or another. A data
warehouse serves many purposes within organizations, but at a basic level a data
warehouse is defined as a massive database typically housed on a cluster of servers, or
a mini or mainframe computer serving as a centralized repository of all data generated
by all departments and units of a large organization.
Brief History of Data Warehousing
The term was coined by W. H. Inmon, a prominent figure in the field of data warehousing.
The DW consolidates data from a variety of sources in one centralized location and
is typically designed to support Business Intelligence processes, along with strategic
and tactical decision making.
Data Warehousing Defined
Data warehousing allows a company or organization to create a consolidated view of
its enterprise data, optimized for reporting and analysis. Basically, a data warehouse is
an aggregated, sometimes summarized copy of transaction and non-transaction data
specifically structured for dynamic queries and fast, efficient business analytics.
With all of the company's data available in one location, i.e. the Data Warehouse,
companies can provide data consumers with a coherent picture of the business at a point
in time.
In data warehousing, data and information are extracted from heterogeneous
production data sources as they are generated, or in periodic stages and loaded to the
Data Warehouse. This approach makes it simpler and more efficient to run queries over
data that originally came from different sources.
The diagram below captures the complete architecture of an end-to-end data solution
within a company and it shows the pivotal role played by the Data Warehouse.

Data Warehouse ETL Architecture. (serra)

From the illustration above, we see data coming in from various data sources,
including CRM, ERP, and any other data sources the company may have.
Data from these sources gets cleansed in staging areas and is eventually stored
in the DW. From the DW, end-user applications and tools such as Excel, the Microsoft
stack (SQL Server, SSIS, SSRS, and SSAS), Oracle OBIEE, Scribe, Informatica, DataStage,
Tableau, MicroStrategy, QlikView, etc. can all access and consume the data directly
or push it down other streams for consumption.
The Data Warehouse proves to be especially relevant because data and information
are extracted from heterogeneous production data sources as they are generated, or in
periodic stages, making it simpler and more efficient to run queries over data that
originally came from different sources.
The Data Warehouse therefore becomes an ideal go-to source for Data Ninjas
looking to do analysis work because they are more than likely going to find a vast
majority of the data needed for analysis in one single source rather than having to
connect to many disparate sources.
Point to Note
The task of building a data warehouse can take several years and usually involves a
team of highly specialized data professionals whose task is to build both the data
warehouse model and the integration points to get data from source systems around the
company such as CRM, ERP, etc.
For the purposes of this book, we see Data Ninjas as people who are not involved with
the actual building of the Data Warehouse, but depend on the data that is present in the
DW for their analytical needs.

As a data ninja tasked with analyzing the company's data, the job requirements would
typically involve handling and working with data coming from or going into a data
warehouse.
In this regard, we see that the data warehouse offers a number of key benefits to the
analytics process as a whole. Below, we provide a list of a few of these benefits.
Benefits of the Data Warehouse
Standardizes data across an organization.
Consolidation of Data from Multiple Sources.
Timely Access to Data.
Availability of Historical Data for Analysis.
Having one version of the truth, so each department will produce results that are in
line with all the other departments, providing consistency.
Enhanced Data Quality and Consistency.
Reduction in manual reconciliations.
Dimensional modeling is a crucial part of the Data Warehouse process. Given that
most data warehouses today follow the dimensional model pattern, an understanding of
the concept of dimensional modeling is extremely important when performing analysis.
In dimensional modeling, all data is contained in two types of tables, called the Fact
Table and the Dimension Table. The Fact Table contains the measurements, metrics, or
facts of business processes, while the Dimension Tables contain the context of those
measurements.
Dimensional modeling differs from normalized modeling (which is more
focused on reducing and eliminating data redundancy) in that it is designed to enable
analysis and querying through massive and unpredictable queries. Processing such
queries is one of the things a normalized relational model is ill-equipped to handle.
Dimensional Model Pros and Cons
Pros:
Data retrieval performance.
Good for analysis: slice and dice, roll up, and drill down.
Easy for administrators to maintain and interpret.

Cons:
Data loading time is increased.
Flexibility is reduced in case of business change or dimension change.
Storage increases due to denormalization. The same information might be duplicated
considerably during storage.
Example of Dimensional Modeling
Dimensional Model (Data-Warehouses)
The example above shows a dimensional model of company sales
information. In the model presented, Units Sold is a Fact, and Location, Date, and
Product are Dimensions.
An analyst working with the model can analyze the sales (fact) across the
different dimensions of Product, Channel, Customer, Order, Store, and Time.
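
To give a feel for how such a model is queried, here is a small sketch built with Python's sqlite3 module. The table names (fact_sales, dim_product, dim_location, dim_date), their columns, and the sample rows are all assumptions made for illustration and do not come from the diagram. The fact table holds Units Sold, and each dimension table supplies the context for slicing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- The fact table stores the measurement plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER, location_id INTEGER, date_id INTEGER, units_sold INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_location VALUES (1, 'Boston'), (2, 'Denver');
    INSERT INTO dim_date VALUES (1, 2015, 1), (2, 2015, 2);
    INSERT INTO fact_sales VALUES (1, 1, 1, 10), (1, 2, 1, 5), (2, 1, 2, 7), (1, 1, 2, 3);
""")

# Roll up: total units sold per product.
units_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

# Slice and dice: units sold per city per month.
units_by_city_month = conn.execute("""
    SELECT l.city, d.month, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_location AS l ON l.location_id = f.location_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY l.city, d.month
    ORDER BY l.city, d.month
""").fetchall()

print(units_by_product)
print(units_by_city_month)
```

Both questions are answered from the same fact table simply by joining to a different set of dimensions, which is what makes the pattern convenient for analysis.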
Why is Dimensional Modeling Beneficial to the Data Ninja?
Ease of Use
Dimensional modeling is very beneficial to the Data Ninja because it is a
well-established DW design approach and is understandable by the business:
information is grouped into coherent business categories or dimensions that make sense
to business people. Usually, dimensional models are completely based on business
terms, so the business knows what each fact, dimension, or attribute means.
Query Performance
In addition to their ease of use, Query Performance is the second reason dimensional
modeling is of great use to Data Ninjas. Denormalized dimension hierarchies have a
significant impact on query performance and can easily be optimized for better Query
Performance. Dimensional models are also very extensible, allowing for new attributes
to easily be added to the Dimensional Tables without affecting facts in the Fact Table.
Data Warehouse: The queryable source of data in the enterprise.
Data Mart: A logical subset of the complete data warehouse.
Operational Data Store (ODS): The point of data integration for operational
systems.
Metadata Database: All of the information in the data warehouse environment that
is not the actual data itself. It is centrally maintained and stored.

Entity Relationship Models (ER): Entity-relationship modeling is a logical design
technique that shows the relationship between data.
Dimensional Models: Dimensional modeling is the name of a logical design
technique used for data warehouses. Every dimensional model is composed of a
fact table and a set of dimension tables.
Facts: A fact is an event that happened or that has been measured, usually captured
as a number, e.g. a single sale of a product to a consumer or the total amount of
sale in a specific month is a fact.
Dimensions: A dimension relates to facts and contains attributes that can be used to
add qualitative information to the numeric information contained in facts. E.g. A
dimension can be a list of products or customers, or time space that can be used to
analyze the fact.
OLAP: (Online Analytical Processing) Online analysis of transactional data.
OLAP tools enable users to analyze different dimensions of multidimensional data.
OLTP: Online Transaction Processing. This is a class of information systems that
facilitate and manage transaction-oriented applications, typically for data entry and
retrieval transaction processing.
ETL: This is the acronym for the Extract, Transform, and Load process. ETL
processes retrieve data from operational systems and pre-process it for further
analysis by reporting and analytics tools. ETL processes are also
responsible for feeding a data warehouse with data.
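
The ETL idea can be sketched in a few lines of Python using only the standard library. Everything here is a simplified assumption: the RAW_EXPORT text stands in for a file from an operational system, the cleaning rules are invented, and an in-memory SQLite database plays the role of the warehouse. Real ETL tools do far more, but the three steps are the same.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational system, with messy values.
RAW_EXPORT = """customer,country,amount
 Ada ,us,19.99
Grace,US,5.01
Linus,fi,not-a-number
"""


def extract(text):
    """Extract: read the raw rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, standardize country codes, drop bad amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        cleaned.append(
            (row["customer"].strip(), row["country"].strip().upper(), amount)
        )
    return cleaned


def load(connection, rows):
    """Load: write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(RAW_EXPORT)))
loaded = warehouse.execute(
    "SELECT customer, country, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```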
Ninja Tip 11.
As a data ninja, the details and intricacies of data warehousing might not be directly
required in performing your day-to-day job, which is analyzing data.
But having some familiarity with these concepts, the tools, and their functions is highly
recommended.
Such understanding will not only put you ahead of your peers in terms of marketability,
but will also give you the full picture of data through its life cycle, from the
creation layer, ETL layer, and storage layer to the eventual consumption, which is what
might be of most importance to you.
Like many other concepts discussed in this book, the Data Warehouse is at the
forefront of a company's data ecosystem and influences the way businesses acquire,
store, analyze, and consume data for business planning and decision making.
As a Data Ninja who will be tasked with working with the company's data, you will
undoubtedly encounter the data warehouse as part of the process of working with data.
You might use it either as a source or as a destination of datasets used for analysis.
Hence, an understanding of data warehouse system architecture will be important in
your responsibilities as an effective data analyst.
Data Warehouses can be a bit daunting, but after reading this chapter, I hope you feel
a little more at ease. Since Data Warehouses are crucial to most enterprises it is
important to fully understand how they work and how you can harness their full
potential. Below are some resources to help you further your knowledge and
understanding of the concepts.
The Data Warehousing Information Center
The Data Warehousing Center is a vast collection of essays and articles about
everything data warehousing. They geared the site towards someone who is just now
getting started with data warehouses.
Tutorials Point
I've listed Tutorials Point elsewhere because they offer so many great resources. Again,
they offer some great tools for learning more about data warehouses. Before you start
with them you should have a basic understanding of database concepts such as schemas,
ER models, and Structured Query Language.
Learning Data Modeling
Learning Data Modeling focuses on the concepts of data warehouses. It is geared
towards the novice user and offers guides and picture graphs to help you understand
their concepts. The site is relatively small but has some good articles.
Another site that you will see throughout my book, 1KeyData, has an impressive
collection of tutorials and resources. It is not aimed directly at beginners but rather
tries to bring those with a basic level of understanding to a higher level. You
will learn about the tools needed to implement a data warehouse, the steps needed to
fulfill your needs, and the concepts that cover data warehouses.
Why Learning Data Warehousing Still Matters
This is a great article about why you should learn about data warehouses. It does not
offer guides or tutorials but will keep you motivated if you start to lose faith in the
importance of learning about data warehouses.
Data Warehousing: Academic Tutorials
Academic Tutorials touts the slogan "Quick and Easy Learning". They want you to
get in and get out as quickly as possible while packing in the most information possible.
Their tutorials go over pretty much everything you need to know about data warehouses.
The site is not very user friendly and is a bit scattered, but if you can navigate through
the site you will be richly rewarded.
The great Lynda makes another appearance. Lynda.com is a great resource for learning
anything you need to know. This is a premium site that charges a fee for its courses,
but it is worth the money. Their support is superb and their courses are great. There is
nothing here that you couldn't find for free with a bit of searching, but if you want a
one-stop-shop site then Lynda.com is the place for you.
Data Modeling 101
Data Modeling 101 has some great picture graphs that clearly lay out the information you
need. The resources provided are limited, but if you read through their pages you will
pick up some good lessons and well-thought-out information.
A lot of people shy away from Wikipedia, but when it comes to tech related information
it is usually a good place to start. They can lay out what it is in plain English without
relying on techno jargon. They also have some good links that will help you find good
information. It is definitely worth checking out.
Along with the web-based resources, I highly recommend Ralph Kimball's book, The
Data Warehouse Toolkit. It is a leading authoritative guide on data warehousing. In its
second edition, The Data Warehouse Toolkit has developed into the most comprehensive
collection on dimensional modeling for data warehousing. A must-read if you want to
fully understand data warehouses.
The Data Warehouse Toolkit

Author: Ralph Kimball

Published by: Wiley
ISBN-13: 978-0471200246
In all, I know data warehousing can seem obscure but it is important and the more you
know the more important you will be. Data warehouses are crucial to most enterprises
and mastering how they work and how to best use them will definitely further your
career as a Data Ninja.
Data warehousing concepts are vital to learn as they provide the full picture of
data through its life cycle: creation, movement, storage and consumption.
A dimensional model may be used for any reporting or querying of data.
The data warehouse provides an environment separate from the operational
systems and is designed entirely for decision support, analytical reporting, ad hoc queries, and data mining.

Computers are a critical component of all our lives. Most things we interact with in the
world today are run directly or indirectly by computer systems. As a result, it has
become more crucial than ever for everyone (young and old) to learn programming or at
least understand the concepts.
Bill Gates and Mark Zuckerberg recently donated ten million dollars to a
non-profit that believes that every student in every school should have the
opportunity to learn computer programming, and that computer science should be
a part of the core curriculum.
Coding is not a goal. It's a tool for solving problems. Learning to program teaches
computational thinking, and computational thinking teaches people how to tackle large
problems by breaking them down into a sequence of smaller, more manageable problems.
You Can Play God
When you program, you are a creator. You go from a blank text file to a working
program with nothing to limit you but your imagination (and maybe some issues like
how long your program takes to run). Programming is like having access to the absolute
best set of Legos in the world in almost unlimited quantities. Even better, you can get all
of your building materials completely for free (once you own a computer) on the
internet. Amazing!
It's also great fun to see someone using something that you made. Your ability to
improve your life and the lives of your friends and family is limited only by your ideas
once you can take full control of your computer. Moreover, your work can be extremely
high quality because the limiting factor is not manual dexterity or other non-mental
attributes. If you can understand a programming technique, you can implement and use it.
In general, a programming language is defined as the vocabulary and set of grammatical
rules for instructing a computer to perform specific tasks.
Programming is the process of designing, writing, testing, debugging, and maintaining
the source code of computer programs. This code can be written in a variety of
computer programming languages. Some of these languages include Java, C, and Python.
Computer code is a collection of typed words that the computer can clearly understand.
Just as a human translator might translate from the English language to Spanish, the
computer interprets these words as ones and zeros. We as humans use programming
languages, instead of writing directly in ones and zeros, so we can easily write and
understand the computer code and can organize it. We can think of the different lines of
our code as being individual instructions that we give to the computer. The computer
follows these instructions explicitly to execute our written code.
Programming is highly detailed work, and it usually involves fluency in several
languages. Projects can be short and require only a few days of coding, or they can be
very long, involving upward of a year to write.
Many reasons can be given as to why it is important to learn programming. But the
most important of them all is the attitude embodied by most programmers. Programmers
primarily use their skills to decompose and solve complex and challenging problems.
It is often said that some people, when faced with a challenging situation, throw their
hands up in surrender and run away. Others, when faced with similarly challenging
problems, will set about trying to break down the problem into subsets and work on it
until they understand what is going on. The latter are those who make for good
programmers. They solve challenging problems and they like doing it.
What Experts Say About the Mastery of Programming Skills
Coding isn't particularly easy to learn but that's exactly why it's so valuable. Even if
you have no plans to become a software developer, spend a few weeks or months
learning to code and I can guarantee it will sharpen your ability to troubleshoot and
solve problems.
(DIY Genius)
A deep understanding of programming, in particular the notions of successive
decomposition as a mode of analysis and debugging of trial solutions, results in
significant educational benefits in many domains of discourse, including those
unrelated to computers and information technology per se.
(Seymour Papert, in "Mindstorms")
It has often been said that a person does not really understand something until he
teaches it to someone else. Actually a person does not really understand something
until after teaching it to a computer, i.e., express it as an algorithm.
(Donald Knuth, in "American Mathematical Monthly," 81)
Computers are not sycophants and won't make enthusiastic noises to ensure their
promotion or camouflage what they don't know. What you get is what you said.
(James P. Hogan in "Mind Matters")
I think everybody in this country should learn how to program a computer because
it teaches you how to think.
(Steve Jobs)
When you learn to read, you can then read to learn. And it's the same thing with
coding: if you learn to code, you can then code to learn.
(Mitch Resnick)
Work your way up the programming ladder
As you work with data and mature within the data analytics space, you may well
progress from working with small data in spreadsheets to crunching Big Data with
tools like Hadoop and MapReduce, and then maybe on to being a Data Scientist. In such
roles, the need for programming becomes even more pressing.
But, when we talk about programming, it does not have to be fancy or complicated. It
can be as simple as creating simple routines, or scripts, or workflows to automate
mundane tasks, such as moving files, searching folders, merging data sets, creating new
datasets, de-duplicating datasets, standardizing datasets, etc.
So start small and work your way up the ladder by continuously practicing and
developing your skills.
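
To make the idea of a small automation script concrete, here is a minimal sketch in Python of two of the mundane tasks mentioned above: standardizing a dataset and de-duplicating it. The sample data, column names and function names are made up for illustration; a real script would read from your own files.

```python
import csv
import io


def standardize(record):
    """Trim whitespace and lower-case every text value in one record."""
    return {key: value.strip().lower() for key, value in record.items()}


def deduplicate(records):
    """Keep the first occurrence of each record, preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Hypothetical sample data; in practice this would come from a CSV file on disk.
raw = io.StringIO("name,city\nAda, London\nada,london \nGrace,New York\n")
rows = [standardize(row) for row in csv.DictReader(raw)]
clean = deduplicate(rows)
print(clean)
```

Standardizing before de-duplicating matters here: the first two rows only look identical once stray spaces and capitalization have been cleaned up.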

As a novice to programming, you can start simple. The journey of a thousand miles
begins with the first step.
The most common question asked by anybody new to computer programming is "What
language is the best to start with?" Many people will tell you to jump straight into it by
learning a more advanced language such as C++ or Java, others will tell you to start
with a more dated language such as C. In my personal opinion, the best programming
language to begin learning is Visual Basic .NET. VB.NET is a really good language to
learn for a beginner because it requires no previous experience in programming. The
syntax used in VB.NET is simple and very easy to understand. Learning Visual Basic
will give you a basic understanding of how computer programming works and is also
really entertaining! Although VB.NET is a good place to start, I would not recommend
using it for too long. More advanced languages have a more advanced syntax and
spending all of your time using VB.NET could make it harder to move onto the more
advanced languages in the future.
Although every programming language has a different syntax, most programming
languages are similar. The first language that you learn will be the hardest language that
you learn because the concept will be new to you. After learning your first language,
you will have an understanding of how computer programming works and that will help
you a lot when it comes to learning other languages. If you choose a language such as
C++ with a more complicated syntax then it is going to be very confusing and hard for
you to understand if you do not have any prior experience. The first language that you
choose to learn is completely your choice, but we strongly recommend that you begin
with VB.NET.
As the excerpt from the article above articulates, the first step to
programming may entail choosing a language and then writing a simple Hello, World!
program. It's that simple.
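
For illustration, here is what a Hello, World! program might look like. The excerpt above recommends VB.NET; this sketch uses Python instead, purely as an example language, and the function name is our own choice.

```python
def greeting():
    """Return the traditional first-program message."""
    return "Hello, World!"


# Running the file prints the message to the screen.
print(greeting())
```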
From there you can progress to understanding more complex concepts, such as
language syntax, operators, variables and assignments, data types, flow controls, arrays
and iterators, etc.
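
A short sketch may help tie these concepts together. The Python snippet below uses made-up sales figures and variable names to touch on variables and assignment, data types, operators, flow control, and iterating over an array (a list, in Python terms).

```python
# Variables and assignment, using a few basic data types.
region = "West"                        # string
monthly_sales = [120, 95, 143, 80]     # list, Python's everyday array
target = 100                           # integer

# Operators: arithmetic on the values above.
total = sum(monthly_sales)
average = total / len(monthly_sales)   # division produces a float

# Flow control: iterate over the list and test each value.
months_on_target = 0
for amount in monthly_sales:
    if amount >= target:
        months_on_target += 1

print(region, total, average, months_on_target)
```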
With the simpler concepts mastered, depending on the programming language, you
can then progress to other concepts such as classes, objects, methods, instances and
instantiation. Eventually you can move on to more advanced concepts like threads,
concurrency, etc.
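
As a rough illustration of classes, objects, methods and instantiation, consider the Python sketch below. The Dataset class and its contents are hypothetical and exist only to show how the terms fit together.

```python
class Dataset:
    """A class: a blueprint for a named collection of numeric values."""

    def __init__(self, name, values):
        # Each instance carries its own name and its own copy of the values.
        self.name = name
        self.values = list(values)

    def mean(self):
        """A method: behavior that every instance of the class can perform."""
        return sum(self.values) / len(self.values)


# Instantiation: each call creates a separate object (instance) of the class.
sales = Dataset("sales", [10, 20, 30])
returns = Dataset("returns", [1, 2])
print(sales.name, sales.mean())
print(returns.name, returns.mean())
```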
Practice Makes Perfect

I must acknowledge that getting into the programming game can prove to be a
challenge and poses a serious learning curve for non-programmers. But, I would
encourage anyone looking to take that step to not be intimidated by the process.
The one important thing I've come to realize is that when learning to program, as with
any other thing we learn in life, we don't start off by being experts.
It takes practice, courage, determination and then some more practice in order to
succeed. I wish I could say it otherwise, but there is simply no way of getting around the
practice part of it. So, go out and start practicing.
Programming Success
o Get interested in programming, and do some because it is fun. Make sure that it
keeps being enough fun so that you will be willing to put in your ten years/10,000
hours.
o Program. The best kind of learning is learning by doing. To put it more
technically, "the maximal level of performance for individuals in a given domain is
not attained automatically as a function of extended experience, but the level of
performance can be increased even by highly experienced individuals as a result of
deliberate efforts to improve." (p. 366) and "the most effective learning requires a
well-defined task with an appropriate difficulty level for the particular individual,
informative feedback, and opportunities for repetition and corrections of errors."
(p. 20-21) The book Cognition in Practice: Mind, Mathematics, and Culture in
Everyday Life is an interesting reference for this viewpoint.
o Talk with other programmers; read other programs. This is more important than
any book or training course.
o If you want, put in four years at a college (or more at a graduate school). This will
give you access to some jobs that require credentials, and it will give you a deeper
understanding of the field, but if you don't enjoy school, you can (with some
dedication) get similar experience on your own or on the job. In any case, book
learning alone won't be enough. "Computer science education cannot make
anybody an expert programmer any more than studying brushes and pigment can
make somebody an expert painter" says Eric Raymond, author of The New
Hacker's Dictionary. One of the best programmers I ever hired had only a High
School degree; he's produced a lot of great software, has his own news group, and
made enough in stock options to buy his own nightclub.
o Work on projects with other programmers. Be the best programmer on some
projects; be the worst on some others. When you're the best, you get to test your
abilities to lead a project, and to inspire others with your vision. When you're the
worst, you learn what the masters do, and you learn what they don't like to do
(because they make you do it for them).

o Work on projects after other programmers. Understand a program written by
someone else. See what it takes to understand and fix it when the original
programmers are not around. Think about how to design your programs to make it
easier for those who will maintain them after you.
o Learn at least a half dozen programming languages. Include one language that
emphasizes class abstractions (like Java or C++), one that emphasizes functional
abstraction (like Lisp or ML or Haskell), one that supports syntactic abstraction
(like Lisp), one that supports declarative specifications (like Prolog or C++
templates), and one that emphasizes parallelism (like Clojure or Go).
Given that this book is ultimately about informing you of the importance of learning
programming and being able to use it in your role as a data ninja when analyzing data,
we've pulled together survey results on some of the popular programming languages in
the industry to show how they stack up in terms of popularity.

Which programming tools to use for data analysis. (kdnuggets)

Programming requires a very rich and diverse set of skills to master, and there are
many programming concepts out there one can potentially learn. But you probably do not
have to set out with the aim of learning every single programming concept there is to
learn; that simply isn't possible and wouldn't even make sense.
The main thing that would be needed is an understanding of the concepts and how to
apply those concepts to solve business problems as opposed to getting locked down
in a particular programming language or syntax.
Once you have mastered the fundamental concepts of programming, you can then
apply them to solve specific business problems regardless of which language or
vendor product you use.

Comparing programming to some physical tasks, programming does not require some
innate talent or skill, like gymnastics or painting or singing. You don't have to be strong
or coordinated or graceful or have perfect pitch. Programming does, however, require
care and craftsmanship, like carpentry or metalworking. If you've ever taken a shop
class, you may remember that some students seemed to be able to turn out beautiful
projects effortlessly, while other students were all thumbs and made the exact mistakes
that the teacher told them not to make. What distinguished the successful students was
not that they were better or smarter, but just that they paid more attention to what was
going on and were more careful and deliberate about what they were doing.
The point of learning to program or learning programming concepts is not necessarily
to label yourself as a programmer. It's about learning how to tackle and solve
complex problems.
It's understandable that some readers with less technical inclinations would be
scared of being called a Programmer; instead they might prefer Analyst or some
other title of their choosing. But Mitchel Resnick of the MIT Media Lab put it rightly when
he said that coding is a gateway to broader learning.
Once you have mastered programming, it can provide you with the means to think
creatively, reason systematically and work collaboratively to solve countless other
problems. This is an exciting proposition, and that is why learning to code or read code
is highly recommended, and will help you not only excel, but thrive as a data ninja.
Good coders are a special breed of persistent problem-solvers who are addicted to the
small victories that come along a long path of trial and error. Learning how to program
is very rewarding, but it can also be a frustrating and solitary experience. If you can, get
a buddy to work with you along the way. Becoming really good at programming, like
anything else, is a matter of sticking with it, trying things out and getting experience as
you go.
Programming is a lifelong endeavor. It is like learning a language. And like learning
any (spoken) language for the first time, it is crucial you use it frequently to remain
fluent.
The resources and discussions provided in this nugget will help you get started with a
solid career in programming. But it is imperative that you practice your new skills every
day to keep them sharp.


Codecademy is arguably the most well-known website on this list for learning to
program online. They offer courses in Web Fundamentals, PHP, JavaScript, jQuery,
Python, Ruby, and APIs. You can track your progress and learn interactively. The site is
well put together and is a great place to start.
EdX connects students with the highest quality education, through their institutional
partners. They have a huge catalog of different categories. They offer courses in almost
anything you can think of. This is a great resource for anyone trying to expand their
knowledge in different sciences.
Code Avengers
Code Avengers is elegantly designed to make the learning process fun. Every course
they offer is strategically designed to entertain and delight while also educating. Code
Avengers offers small mini games after each lesson to help keep your mind relaxed and
focused. It is very easy to lose track of time while studying these courses because they
are so entertaining. If you have trouble staying focused, Code Avengers is perfect for you.
ilovecoding strives to turn beginners into confident developers who can solve any
programming task. They offer video tutorials in JavaScript, Angular JS, jQuery, Node
JS, and HTML5/ CSS. Their courses are designed to be completed within one or two
Code School
Code School is geared towards those who already have a good understanding of
programming and want to go more in-depth. Not only do they offer programming
courses, they also go over concepts like the industry's best practices to keep you ahead
of the curve. They offer courses in Ruby, JavaScript, HTML/CSS, and iOS.
Bento offers free tutorials, and for a fee they will help you along your path to learn
programming. Bento picks the best free tutorials and guides you through them. An
interesting site, worth checking out, but you will probably want to also visit some other
sites to get a full experience.

Khan Academy
The Khan Academy offers a vast array of courses that cover everything from coding
and calculus to computer science. People from all around the world use Khan Academy,
and together they form an astounding online community.
Coursera is a huge online institution that offers courses from some of the top
universities. Coursera is available in five different languages: English,
Spanish, French, Italian, and Chinese. This is a great way to get a top-of-the-line
education for free.
Google University
Google has put together a catalog of online resources to learn an array of things. Google
University has just recently gone from their beta stage to being live to the public and is
still in its early years. This means, it is still growing and will continue to grow. Google
is a very reliable company, but since this is not their main area of focus, beginners might
find this resource a little unapproachable.
Udacity is designed to model universities, with online videos led by industry leaders.
They have some giants helping them make their videos: Google, AT&T, Facebook,
Salesforce, and Cloudera, just to name a few. Udacity offers a Nanodegree and
credentials to give you something to put on your resume.
Code Combat
A lot like Code Avengers, Code Combat is designed to teach while the user plays a fun
game. It is geared towards beginners and instantly throws you into a game where you
are writing code. The people at Code Combat really tried to make this game addicting
so you forget you are learning to code while you are playing. It can seem like it's for
children, but just because you are an adult doesn't mean you can't have fun while
learning too.
The Odin Project
The Odin Project is still in its beta release but everything seems to be running
immaculately. The people who started The Odin Project felt there was a hole in the
market and they wanted to create something that would fill that hole. They offer
resources in web development, Ruby programming, Ruby on Rails, HTML5/CSS3,
JavaScript, jQuery, and offer discussions on how to get hired as a web developer.
Quackit offers free web tutorials, codes, templates, and tools. It's a very friendly site
with resources in HTML, CSS, coding, databases, web hosting, and XML. Each
category offers many tutorials for you to read through.
Saylor Academy
Saylor Academy is a non-profit online academy. They have designed their site after
universities. They name their courses just as a college would (e.g. arth110). Once you
sign up it will feel like you are taking an online class at your local college, however
they are not accredited and you will not receive a diploma. They offer courses in arts
and sciences; this is a great resource for anyone who wants to further their knowledge in
any subject.
Learn Python the Hard Way
I've listed the Learn the Hard Way platform elsewhere, but again, don't let the name scare
you off. Their online books are very approachable but force you to type every lesson
yourself so you cannot take any short cuts. The only hard part is the self-discipline. The
cost of the online book is $29.
Code Mentor
Code Mentor is a unique experience that connects you with real people who specialize
in coding. Codementor connects you with experts for instant problem solving,
technical advice, pair programming, and code review. Every mentor sets their own rate
so the prices vary from mentor to mentor.
Career Foundry
Career Foundry is geared towards beginners with no previous experience. They only
offer two courses, web developer and user experience designer.
BaseRails is a project-based learning site. Learn Ruby on Rails along with other web
technologies. Rather than simply teaching you code, they will guide you through building
a specific site, such as, review sites, market places, data collection, and classified type
Coder Camps
Coder Camps is a very intricate site that offers very good courses. Coder Camps offers
courses in .NET, JavaScript, iOS, and HTML/CSS. You can apply for a scholarship to
help pay for your tuition. Tuition is steep, from $9,000 to $12,000, though they offer
a deferred payment option where you make a $1,000 down payment and then pay off the rest
after completion. It's hard to recommend a site with such a high fee when there are so
many great learning resources online at no cost. If you have the money then this site
will offer you a lot more than the free sites, but don't worry if you don't have the money;
there are some great free sites that I've already listed.
Code School
Code School is a little less expensive than Coder Camps but with some great options.
They offer courses in Ruby, JavaScript, HTML/CSS, iOS, Git, and many more. Their fee
is $29 monthly or $290 annually.
Hack Reactor
Hack Reactor is an online immersive coding program. They have built their program off
classroom based learning. They have signups for their classes and if you miss the signup
you will have to wait for the next one. You will be a student in a class with an instructor
who can help you one-on-one if needed.
The courses at Treehouse are great for the novice programmer. Unlike most of the other
sites listed here Treehouse is project-oriented and will help you prepare for most
projects you have planned.
BLOC is an online boot camp with programs in web development, mobile development,
and design. BLOC has designed a structured program that is immersive but still can fit
into your busy schedule. They offer a very structured track of courses that vary from 40
hours, 30 hours, and 15 hours.

Learnable offers over 500 video tutorials with unlimited online access to all of them.
They offer help in HTML/CSS, JavaScript, PHP, Ruby, Design/UX, Mobile OS, and
Workflow. When you subscribe to their service you'll also gain access to a huge eBook
library available online or download them and put them on a tablet device to read them
Thinkful is another mentor based site that connects you with experts in your field of
study and offers one-on-one help. These mentor sites are great if you are working on a
project and get stuck. They can look at your code and help you debug the problem.
Below is a list of resources that are aimed towards kids. These resources understand
how younger brains work and have designed their courses to be engaging and fun for
kids to learn how to code.
A simple name with extraordinary results. The two Partovi brothers created this nonprofit organization to encourage school students to learn computer science. This is a
very well thought out site that aims to reach the minorities inside the programming
CS Unplugged
CS Unplugged takes a different approach to teaching programming. Rather than having
online courses they offer a free PDF book that is a collection of games, puzzles, cards,
and activities that students can use without a computer to learn computational thinking. It
won't take someone from beginner to master coder but it will get kids to start thinking in
a way that will set them up for success later in life. You can also purchase a physical
book from their partner website for $20.
Programming is becoming essential for everyone in the modern world. There are so
many resources to learn programming or sharpen your skills, and we hope you find the
ones that have been included in this nugget to be valuable. Learning to program
ultimately is like learning a language, and you must use it every day to stay fluent.
Get beyond the fear of reading or writing code and start getting familiar with
the concepts. Programming syntax and languages may change, but the concept of
using programming techniques to solve particular business problems does not change
for the most part.
Remember: just as we learn to read so that we can read to learn, so must you learn to
code so that you can code to learn.
The choice is yours to make. Start simple! Pick a language that suits your needs
and practice with it in order to build competence. There's no easy way to get
around the practicing part.

Looking back 20 years or so, cassette tapes, Walkmans, and floppy disks were the
norm, and every cool kid on the block wanted to own one. But these once "cool"
technologies of the 1980s and 1990s bear almost no resemblance to what we have today.
In the same way, our jobs and organizations of today probably bear little resemblance
to those of that time. Or, let's play that forward and look ahead 20 years from now. For
one, we can guarantee that things will have changed and will not be the same as they are
today.
In that scenario, new gadgets will have sprung into existence,
companies will have upgraded their platforms and tools, and the way we do business
or interact with each other will have changed.
Dealing with all of this change can be daunting. Yet being able to do so is vital to
your career as a successful Data Ninja.
Making predictions about the future can be hard, especially when changes happen
almost on a daily basis. In the industry, the Gartner Hype Cycle is a widely cited
source of predictions about which technologies will thrive or flop in the coming years.
Mind you, it's only a prediction - not a declaration.

In the more than 10 years the Hype Cycle has been published, Gartner has added a
comprehensive range of hype cycles covering technology applications like Ecommerce,
CRM, ERP and Business Intelligence. (Many of their predictions are only available to
subscribers, but Gartner does share some of the broader hype cycles through its blog and
press releases.)
Gartner's Hype Cycle for Emerging Technologies Maps the Journey to Digital Business

Gartner's Hype Cycle (gartner)

In order to come up with their hype-cycle, Gartner examined more than 2,000
technologies on their maturity, their business benefits and their future orientation.
The Hype Cycle that results from this evaluation is especially worth reviewing
because it offers a cross-sectoral perspective on the technologies and
trends that senior executives, CIOs, innovators, and technology planners should
consider when preparing a strategic technology portfolio.
In the recent hype cycle they released (shown in the figure above), three big
predictions stand out, all promising to be among the most important strategic technology
trends in the coming years:
Computing Everywhere
The Internet of Things (IoT)
Big Data

Predictive analytics is one area of analytics that holds a lot of promise and may even
prove to be the next frontier in Data Analytics. Open source tools and solutions like
Spark, Mahout, and R are a few of the top contenders in this area.
Proprietary predictive analytics solutions like AzureML, Angoss Predictive
Analytics, RapidMiner, SAS Analytics, IBM Analytics and SAP Analytics offer more
options companies can choose from.
Why is Predictive Analytics Important?
The ability to predict the future and influence it is a lucrative opportunity and companies
such as IBM and SAP are great examples of organizations that adopt this initiative. IBM
uses predictive analytics software to increase profitability, prevent fraud, and even
measure the social media impact of marketing campaigns. SAP allows customers to act
on big data and offers insights on new opportunities and any hidden risks. Predictive
analytics also extends beyond these two companies to various industries some of which
are listed below.
Predictive analytics is a very important capability companies are looking to have
now. With Predictive analytics, organizations can use historical performance data to
extrapolate and make predictions about the future. If the predictions are right, the
company can then take actions to influence the outcomes in their favor.
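
As a very simplified sketch of what extrapolating from historical performance data can mean, the Python snippet below fits a straight trend line to a few past values and extends it one period ahead. The figures are invented, and a straight line is only one of many possible models; real predictive analytics tools use far richer techniques.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales for four past periods.
periods = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]

slope, intercept = fit_line(periods, sales)

# Extrapolate the fitted trend to the next, not yet observed, period.
forecast = slope * 5 + intercept
print(forecast)
```

The forecast is only as good as the assumption that the past trend continues, which is why the predictions need to be checked against what actually happens.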
Today, we see companies rushing to get ahead of the curve and incorporate predictive
analytics models as part of their data analytics processes. Gartner substantiates this
point by making the claim that by 2016, "70% of the Most Profitable Companies Will
Manage Their Business Processes Using Real-Time Predictive Analytics or Extreme
Collaboration." (Gartner)
Examples of the use of predictive analytics are legion. Retailers such as supermarket
chains use the concept to analyze current and historical sales data. Using predictive
analytics, they can see and identify patterns in customer behavior and use these patterns
of behavior to predict what products customers are most likely to buy.
Banks and financial institutions use predictive analytics to forecast the likelihood of a
customer defaulting on loans, and health insurance companies use predictive analytics
methods to screen members and establish which claims are most likely to be bogus or
even fraudulent.
All these examples of predictive analytics help to reinforce the fact that
Predictive Analytics is no longer science fiction; it is something that is actually
happening, and more than likely will be coming to a company near you.
Recently in the news, there was a story of data analytics teams at Target Corporation
being able to figure out, through predictive analytics, that a
shopper at one of their stores was pregnant before the father of the shopper even knew.
How Target Figured out a Teen Girl was Pregnant before the Father
Every time you go shopping, you share intimate details about your consumption patterns
with retailers. And many of those retailers are studying those details to figure out what
you like, what you need, and which coupons are most likely to make you happy. Target,
for example, has figured out how to data-mine its way into your womb, to figure out
whether you have a baby on the way long before you need to start buying diapers.
How Did They Do It?
[The analyst who implemented the solution] ran test after test, analyzing the data, and
before long some useful patterns emerged. Lotions, for example. Lots of people buy
lotion, but one of Pole's colleagues noticed that women on the baby registry were
buying larger quantities of unscented lotion around the beginning of their second
trimester. Another analyst noted that sometime in the first 20 weeks, pregnant women
loaded up on supplements like calcium, magnesium and zinc. Many shoppers purchase
soap and cotton balls, but when someone suddenly starts buying lots of scent-free soap
and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it
signals they could be getting close to their delivery date.
Even More of this Target Story
As Pole's computers crawled through the data, he was able to identify about 25
products that, when analyzed together, allowed him to assign each shopper a
pregnancy prediction score. More important, he could also estimate her due date
to within a small window, so Target could send coupons timed to very specific stages
of her pregnancy.
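To make the idea of a prediction score concrete, here is a minimal sketch in Python. It is not Target's model, which the article does not describe in detail. The product names, the weights, and the baseline value are all invented for illustration. The sketch simply adds up a weight for each signal product found in a shopper's basket and squashes the total into a score between 0 and 1.

```python
import math

# Hypothetical signal products and weights, invented for illustration only.
# These are NOT Target's real products, weights, or model.
SIGNAL_WEIGHTS = {
    "unscented_lotion": 1.2,
    "calcium_supplement": 0.9,
    "magnesium_supplement": 0.8,
    "zinc_supplement": 0.8,
    "scent_free_soap": 0.7,
    "large_cotton_balls": 0.6,
}

# Hypothetical baseline: with no signal products the score stays low.
BASELINE = -3.0


def prediction_score(basket):
    """Return a score between 0 and 1 for a set of purchased products."""
    total = BASELINE + sum(
        weight for product, weight in SIGNAL_WEIGHTS.items() if product in basket
    )
    # The logistic function maps any number onto the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total))


everyday = {"scent_free_soap", "bread"}
many_signals = set(SIGNAL_WEIGHTS)  # a basket containing every signal product

print("everyday basket:", round(prediction_score(everyday), 2))
print("many signals:   ", round(prediction_score(many_signals), 2))
```

A single signal product barely moves the score, while many signals together push it high. That is the sense in which the products only become informative "when analyzed together."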
This solution developed by Target underscores the enormous benefits of predictive
analytics solutions, and this is not all theory or fun and games.
Solutions like this produce real results that positively influence the bottom line.
For example, the Forbes article that carried the Target story goes further to call out
the direct dollar figures that were attributed to the predictive analytics solution, and
let's just say it was in the B's (as in billions of dollars)!

Dollar Value of Target's Predictive Analytics

Duhigg suggests that Target's gangbusters revenue growth ($44 billion in 2002,
when Pole was hired, to $67 billion in 2010) is attributable to Pole's helping the
retail giant corner the baby-on-board market, citing company president Gregg
Steinhafel boasting to investors about the company's heightened focus on items and
categories that appeal to specific guest segments such as mom and baby.
From the Target example, we see about 23 billion dollars of sales growth from 2002
to 2010 ($67 billion minus $44 billion), growth that the article attributes to the
predictive analytics solution.
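The arithmetic behind that figure is easy to check. The short sketch below uses only the two revenue numbers quoted above. The percentage and the compound annual growth rate are derived values added for illustration, not figures from the article.

```python
# Revenue figures quoted in the excerpt, in billions of dollars.
revenue_2002 = 44
revenue_2010 = 67
years = 2010 - 2002

# Absolute growth over the period.
growth = revenue_2010 - revenue_2002

# Growth as a percentage of the starting revenue.
growth_pct = growth / revenue_2002 * 100

# Compound annual growth rate: the steady yearly rate that would turn
# the 2002 figure into the 2010 figure over the same number of years.
cagr = (revenue_2010 / revenue_2002) ** (1 / years) - 1

print(f"Growth: ${growth} billion over {years} years")
print(f"Total growth: {growth_pct:.1f}%")
print(f"Compound annual growth rate: {cagr * 100:.1f}%")
```

Keep in mind that this is growth in total company revenue. The excerpt attributes it to the analytics work, but the numbers alone cannot show how much of it came from any single solution.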
That is a lot of money that could potentially turn the fortunes of many companies
around. More importantly, such positive results can make you, the Data Ninja who
made it all happen, the number one superhero of the company.
And that is why predictive analytics, and the potential value it can deliver, is a trend
worth paying attention to going forward.
I cannot predict the future of data analytics, but if history is any guide, I can say
for sure that things will change and be different from what we are used to today. Change
is inevitable, and it comes in two main flavors: evolutions and revolutions.
Most of us are accustomed to evolutionary changes. They are slow and less noticeable,
but they still happen. For example, going from SQL Server 2000 to SQL Server 2008 or
SQL Server 2014 was an evolutionary process. New features were added to the base
product, and all SQL Server professionals had to do was mature and update their skill
sets along the way.
Going from MS Excel 2000 to Excel 2013 was evolutionary as well, with new features
added incrementally along the way, including the Power BI stack that consists of
Power Pivot, Power View, Power Query, Power Map, and so on.
The changes we see in SQL Server or Excel are evolutionary by nature: they come
often, move us forward, and we can deal with them through upgrades and by reading a
few new books.

Revolutionary Change. (HTB)

Revolutionary changes, on the other hand, are the ones that come much less often, and
when they do, they fundamentally change the affected industries and rock them from top
to bottom.
Think about inventions such as paper currency, the light bulb, the automobile,
antibiotics, transistors, the microprocessor and more. Without these inventions, we
probably would not be where we are today as a society.
Even in the world of working with data, we see these kinds of changes on the horizon
of how we collect, store and process data.
Today in the mainstream, we think in terms of databases, with tables, rows and
columns when storing data. Tomorrow, that concept of databases as we know it may
change, replaced by entirely new constructs forcing us all to eschew what we
previously knew and adopt the new ones.
Tools like Graph databases, NoSQL, Hadoop, Spark, and many other products are all
standing on the cutting edge, promising to fundamentally change the field of data
analytics as we know it.
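To see how different these constructs can feel in practice, consider the small Python sketch below. It stores the same made-up purchase data twice: once in a relational table with fixed columns (using the sqlite3 module from the standard library), and once as schema-free nested documents of the kind many NoSQL stores work with. The table, the field names, and the data are all hypothetical.

```python
import json
import sqlite3

# Relational style: a table with fixed columns, one row per purchase.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (shopper TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("s1", "lotion", 2), ("s1", "soap", 1), ("s2", "soap", 3)],
)
rows = conn.execute(
    "SELECT shopper, SUM(qty) FROM purchases GROUP BY shopper ORDER BY shopper"
).fetchall()

# Document style: one nested record per shopper and no fixed schema.
# The second record carries an extra field that the first one lacks.
documents = [
    {"shopper": "s1", "purchases": [{"product": "lotion", "qty": 2},
                                    {"product": "soap", "qty": 1}]},
    {"shopper": "s2", "purchases": [{"product": "soap", "qty": 3}],
     "loyalty_tier": "gold"},
]
totals = [(d["shopper"], sum(p["qty"] for p in d["purchases"])) for d in documents]

print("Relational totals:", rows)
print("Document totals:  ", totals)
print("One document:     ", json.dumps(documents[1]))
```

Both approaches answer the same question here. The difference is in how the data is shaped and how easily that shape can vary from record to record.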
Whether or not these technologies will actually deliver on their promises, and on the
hype surrounding them, is not a prediction I'm comfortable making.
But one way or the other, things will change, including the tools and techniques we
currently use for data analysis.
Change is constantly around us. Sometimes it is so minute and consistent that we do
not notice it, while other times it is so severe and sudden that it bowls us over.
As a saying commonly attributed to Charles Darwin puts it:
It is not the strongest of the species that survives, nor the most intelligent that
survives. It is the one that is most adaptable to change.
--attributed to Charles Darwin
The field of data analytics is constantly changing and no one (including me) can make
a prediction of what tools or processes will be used 5, 10 or 20 years from now. At the
very most, all we can do is speculate.
But the inevitability of change should not paralyze us or prevent us from being
effective Data Ninjas. What it should do is make us prepared and ready to adapt to
whatever new technologies, tools or practices the future throws at us.
As a Data Ninja, if you can learn to stay relatively unaffected by change, handle
new technological developments with confidence, and adapt to any curve balls that
come your way, then you will stand the test of time and have an easier time growing
your career in the profession.
Ninja Tip 12.
In his classic work The Art of War, author and military strategist Sun Tzu wrote about
the importance of observing signs of the enemy.
In it, he wrote that movement among the trees of a forest indicates an advancing enemy
brigade, and that dust rising in a high column indicates the approach of chariots.
In the same way, it's important that you find and pay attention to such vital signs in
your career. Doing so will help you know when, and when not, to make critical
career decisions.
The tech world is a fast-paced, ever-changing, living organism. New advancements can
make a whole field obsolete overnight. Startup companies are popping up everywhere,
offering a vast array of products, services, and technologies. It is crucial to stay current
and on top of how things are changing, to keep yourself competitive and relevant.
How Technology Is Transforming Our Brains
Published, 2013
Top 5 Reasons why software professionals need social skills, too
Published, 2011

10 highly valued soft skills for IT pros
Published, 2013
7 Simple Ways to Stay Current on Technology
Published, 2012
8 Ways to Advance Your Career by Staying Relevant
Published, 2012
6 Ways to stay Current in Your Field and Advance
Published, 2010
As you can see, some of these articles are a few years old, which in the tech world
could mean they are obsolete. However, I have made an effort to select only articles
that remain relevant.
To stay informed and be able to evolve in the field of data analytics, you need to keep
reading and learning new concepts and techniques. Below is a list of my favorite Tech
blogs and websites that can be a valuable source of news and updates for you to learn
and stay informed on technologies, especially relating to general trends within the
industry. I have tried to only include sites that have a stable revenue and user base to
ensure that they will be around for years to come.
ZDNet was founded in 1991 and acquired by CNET in 2000. ZDNet publishes product
reviews, software downloads, news, analysis, and guides.
Gigaom was created in 2006 by Om Malik. It devotes its efforts to finding the newest
and best in tech, with news and analysis on Web 2.0, startups, gaming, social media,
and everything else tech. With over 6.5 million unique visitors every month, Gigaom is
trying to humanize technology and make it approachable for everyone.

Mashable reports on the importance of digital innovation. With over 42 million unique
visitors monthly, Mashable is truly a powerhouse. It reports on everything tech, from
social media, entertainment, news, and startups to anything techies are talking about.
Wired is a full-color monthly magazine based in the United States. It reports on
emerging technologies, economics, and politics. The magazine is full of interesting,
thought-provoking articles that will inspire and amaze. Its website offers free articles
and news, and covers just about everything a tech-savvy person could care about.
Subscribe to the magazine and you'll learn something new every time you pick it up.
TechCrunch is one of my favorite tech sites to visit on a daily basis. Like Wired above,
it covers almost everything you'll need to stay current in the world. It does not limit
itself to tech; it also delves into politics and worldwide news. Founded in 2005, it
has grown immensely.
This is the most technical and data-oriented of all the resources listed in this nugget.
DataTau is like Hacker News for data science. The interface is simple, little more than
a list of articles for big data professionals and data scientists, but the quality of the
content that can be found there is great.
This list only scratches the surface of tech-related blogs and sites. If you visit any one
from the list above, you will discover more blogs and affiliates that expand your
knowledge. Keep searching and exploring for new technologies and new skills while
adding to the wealth of skills you've gained from reading this book.
Change requires flexibility. The better able you are to adapt to change, the greater
your chances of being successful.
Enjoying success requires the ability to adapt. Only by being open to change
will you have a true opportunity to get the most from your talent. --Nolan
Stay curious and adapt. Change is the only thing that will remain constant.

As we have seen throughout, data volumes continue to grow at mind-blowing rates, the
demand for data-crunching professionals is through the roof, the pay for quality talent
is astonishingly lucrative, and the entry barriers for beginners are remarkably low. So,
what are you waiting for? The choice is yours to get in and get started.
Many companies increasingly depend on their Data Analysts to crunch numbers, but
they depend even more on the Data Ninja-type professionals like you, who can go
beyond the basic aspects of crunching numbers and understand the subtle nuances of the
These are Ninjas who can see the big picture and perform analysis or make
predictions in ways that positively affect the bottom line of their companies.
Stuff to Blow Your Mind
To conclude, I will leave you with a few real-world stories of high-performing
companies that are making the best of the data boom.
The excerpts presented below describe companies that are leveraging the tremendous
power of data analytics and are positively affecting their bottom lines in the process.
The point of doing this is to have you picture yourself as the Data Ninja in charge (or
one of the Data Ninjas in charge) who helped make that happen. Then, in that
situation, consider what it would mean for your career, your ambitions, your goals
and, above all, your pocketbook and take-home pay.
IBM's work has revealed genetic traits of cancer survivors and tracked the source of an
E. coli outbreak. It recently created a visualization to help the influential Washington,
D.C.-based think tank, Institute for the Study of War, map terrorist behavior in and
around Baghdad during a campaign to free imprisoned Al Qaeda members.
By analyzing the behavior patterns of its digital and mobile users in 3 million locations
worldwide, along with the unique climate data in each locale, the Weather Company
has become an advertising powerhouse, letting shampoo brands, for example, target
users in a humid climate with a new anti-frizz product. It's no surprise that more than
half of the Weather Company's ad revenue is now generated from its digital operations.
Evolv's data scientists have uncovered: People with two social media accounts perform
much higher than those with more or less, and in many careers, such as call-center work,
employees with criminal backgrounds perform better than those with squeaky-clean
records. Evolv's sales grew a whopping 150% from Q3 2012 to Q3 2013.

Over the past year, General Electric has taken the lead in tying together what Chairman
Jeff Immelt calls "the physical and analytical worlds." Translation: GE's many machines
(everything from power plants to locomotives to hospital equipment) now pump out
data about how they're operating. GE's analytics team crunches it, then rejiggers
machines to be more efficient. Even tiny improvements are substantial, given the scale:
By GE's estimates, data can boost productivity in the U.S. by 1.5%, which over a
20-year period could save enough cash to raise average national incomes by as much as

Acronyms
ACID: Atomicity, Consistency, Isolation and Durability
BI: Business Intelligence
CDA: Confirmatory Data Analysis
CRM: Customer Relationship Management
CSV: Comma-separated Values
DBA: Database Administrator
DCL: Data Control Language
DDL: Data Definition Language
DIKW: Data Information Knowledge Wisdom
DML: Data Manipulation Language
DW: Data Warehouse
EDA: Exploratory Data Analysis
ERM: Entity Relationship Models
ETL: Extract Transform Load
HDFS: Hadoop Distributed File System
KPI: Key Performance Indicator
LOB: Line of Business
NoSQL: Not Only SQL
ODS: Operational Data Store
OLAP: Online Analytical Processing
OLTP: Online Transaction Processing
RDBMS: Relational Database Management System
SAS: Statistical Analysis Software
SQL: Structured Query Language
TCL: Transaction Control Language
XML: Extensible Markup Language

Glossary of Definitions
Data Ninja
A Data Ninja is an entry-level, unspecialized, entrepreneurial individual who works
within a structured environment (usually within a company or team), employs a
variety of tools, and performs a variety of tasks related to collecting, organizing, and
interpreting data to gain useful information.
Data Analysis
Data analysis is the process of finding the right data to answer your question,
understanding the processes underlying the data, discovering the important patterns in
the data, and then communicating your results to have the biggest possible impact.
Big Data
Big data is an evolving term that describes any voluminous amount of structured,
semi-structured and unstructured data that has the potential to be mined for information.
Although big data doesn't refer to any specific quantity, the term is often used when
speaking about petabytes and exabytes of data.
DIKW Pyramid
The DIKW Pyramid, also known as the "DIKW Hierarchy", is a model used
for representing structural and functional relationships between data, information,
knowledge, and wisdom. In the model, information is defined in terms of data,
knowledge in terms of information, and wisdom in terms of knowledge.
Database
A database (abbreviated DB) is a collection of information that is organized so that it
can easily be accessed, managed, and updated. A database basically helps to organize a
collection of information in such a way that a computer program can quickly select
desired pieces of data. Many datasets in companies are stored and manipulated in
Data Warehousing
In computing, a data warehouse (DW or DWH), also known as an enterprise data
warehouse (EDW), is a system used for reporting and data analysis. DWs are central
repositories of integrated data from one or more disparate sources.
Dimensional Modeling
Dimensional modeling (DM) names a set of techniques and concepts used in data
warehouse design. Many dimensional models typically consist of fact tables and lookup
Predictive Analytics
Predictive analytics is the practice of extracting information from existing data sets in
order to determine patterns and predict future possibilities and trends. It doesn't dictate
the future, but it forecasts what might happen in the future with an acceptable level of
reliability, and includes what-if scenarios and risk assessment.
Programming
Programming is the process of taking an algorithm and encoding it into a notation, a
programming language, so that it can be executed by a computer. Programming involves
activities such as analysis, developing understanding, generating algorithms,
verifying the requirements of algorithms (including their correctness and resource
consumption), and implementing the algorithms (commonly referred to as coding) in a
target programming language. Programming can be done in many different programming
languages (such as C, FORTRAN, JavaScript, Lisp, Python, Ruby, Smalltalk, etc.).

Relevant Quotes Glossary

* "A point of view can be a dangerous luxury when substituted for insight and
understanding." --Marshall McLuhan, Canadian Communications Professor
* "In God we trust, all others must bring data." --W. Edwards Deming
* "I never guess. It is a capital mistake to theorize before one has data. Insensibly one
begins to twist facts to suit theories, instead of theories to suit facts." --Sir Arthur Conan
Doyle, Author of Sherlock Holmes stories
* "He uses statistics as a drunken man uses lamp posts, for support rather than for
illumination." --Andrew Lang, Scottish Writer
* "Facts do not cease to exist because they are ignored." --Aldous Huxley
* A reminder to be careful in your analysis and not stretch to get the results you'd like:
"If you torture the data long enough, it will confess." --Ronald Coase, Economist
* "Errors using inadequate data are much less than those using no data at all." --Charles
* "Once we know something, we find it hard to imagine what it was like not to know it."
--Chip & Dan Heath, Authors of Made to Stick, Switch
* "If the Statistics are boring, you've got the wrong numbers." --Edward Tufte
* "Data are just summaries of thousands of stories; tell a few of those stories to help
make the data meaningful." --Chip & Dan Heath, Authors of Made to Stick, Switch
* "Numbers have an important story to tell. They rely on you to give them a clear and
convincing voice." --Stephen Few
* "The goal is to turn data into information, and information into insight." --Carly Fiorina,
Former CEO of HP
* "There is a magic in graphs. The profile of a curve reveals in a flash a whole situation:
the life history of an epidemic, a panic, or an era of prosperity. The curve informs the
mind, awakens the imagination, convinces." --Henry D. Hubbard, 1939
* "People take good care of data that is important to them. Data that is loved tends to
survive." --Kurt Bollacker, Data Scientist, Freeba
* "If we have data, let's look at data. If all we have are opinions, let's go with mine." --Jim
Barksdale, former Netscape CEO

(Information Governance Initiative), IGI. "" 20 March 2015. "IS THE BIGGEST RISK OF BIG DATA
THE INABILITY TO EXTRACT VALUE?". Article. 25 April 2015.
Bain. "" n.d.
"The Value of Big Data: How Analytics Differentiates Winners.". Article. 24 March
Big Data Salary, "Inside Big Data Salary". n.d.
Table. 25 March 2015.
"" n.d. "How Do You Build a Culture of Innovation? - Yale Insights". Article.
25 April 2015.
Carroll, Jim. "Chapter 7 - Creating an Innovation Culture." Carroll, Jim. "What I
Learned from Frogs in Texas: Saving Your Skin with Forward Thinking Innovation.".
Mississauga, Ont.: Oblio, 2004. 81. Print.
"" 10 December 2013. Figure 1: A sign language interpreter
during a memorial service at FNB Stadium in honor of Nelson Mandela in Soweto, near
Johannesburg. Image. 20 April 2015.
"" n.d. "Program or Perish: Why Everyone Should Learn
to Code.". Report. 11 April 2015. "" n.d. "Why
Learn to Program?" . Article. 11 April 2015.
Datanami. "" 14 October
2014. "Top Three Things Not To Do in Excel.". 11 April 2015.
Data-Warehouses. "" n.d.
"Dimensional Model.". Diagram. 25 March 2015.
Dice. 20 April 2015. Search. 20 April 2015.
"" n.d. "Meet the Youngest Video Game Programmer : DNews.". Article. 11
April 2015.
Doyle, Martin. ""What Is the Difference Between Data and Information?"." n.d.
Business 2 Community. Article. 25 March 2015.
"" n.d. "EarSketch." . quote. 11 April 2015.

Eridon, Corey. "" 11 June 2014. "The Problem With Predictive Analytics." Article.
20 April 2015.
Eskimo. "" n.d. "Skills Needed
in Programming.". Document. 11 April 2015.
Excel, Microsoft. Pivot Functions within MS Excel. 2015. Screenshot.
"" 10 February 2014. "The World's Top 10 Most
Innovative Companies in Big Data.". Article. 24 April 2015.
Forbes. "" n.d. "How Target Figured Out A Teen
Girl Was Pregnant Before Her Father Did.". Article. 19 April 2015.
Franks, Bill. "" 9 November 2011. "Analytics Gone Wrong: Dire Consequences for Kids".
Article. 24 April 2015.
Gartner. "Gartner Reveals Top Predictions for IT Organizations and Users for 2013 and
Beyond". 2013. Press Release. 25 March 2015.
. "Gartner Says by 2016, 70 Percent of the Most Profitable Companies Will Manage
Their Business Processes Using Real-Time Predictive Analytics or Extreme
Collaboration." Analysis. 2015. Report.
2014. chart. 2015.
"" n.d. "Analysis: The Exploding Demand for Computer Science
Education, and Why America Needs to Keep up". Chart. 11 April 2015.
Gigaom. "" n.d. "Why Becoming a Data Scientist Might Be Easier than You
Think.". Quote. 11 April 2015.
. "" 2015. "Microsoft Throws down the Gauntlet in Business Intelligence.".
11 April 2015.
GlassDoor. 20 April 2015. Search. 20 April 2015.
HowTo. "" n.d. "How To Start
Programming.". Document. 11 April 2015.

HTB. 2015.
IBM. 25 March 2015.
Article. 20 April 2015.
"" n.d.
"Data Analysis." Responsible Conduct of Research (RCR). Government Resource. 25
March 2015.
InformationBuilders. "" n.d.
"Data Warehousing (Data Warehouse) Solutions | Information Builders.". Article. 11
April 2015.
Jain, Piyanka. ""5 Steps To Transition Your Career To Analytics: Step 1 - Identify Your
Ideal Job."." Forbes Magazine (2015, Jan 5). Article.
"" n.d. "Poll Results: Top Languages for Analytics/data
Mining Programming.". Poll. 11 April 2015.
Kearney, A. T. "Big Data and the Creative Destructive of Today's Business Models.".
n.d. Table. 26 March 2015.
Kimball. ""
2 August 1997. "A Dimensional Modeling Manifesto - Kimball Group.". Article. 11
April 2015.
Kristal, Murat. ""Mining Mountains of Data is Key for Canadian Businesses"." 12
September 2012. The Globe and Mail. Article. 25 March 2015.
2015. "SQL Standardization | Online Learning.". Website. 20 April 2015.
Leek, Jeff. PhD. "" 2013. Johns Hopkins
Bloomberg School of Public Health: Data Analysis. Coursera Course. Webpage. 25
March 2015.
"" n.d. "Programmer 101: Teach Yourself How to Code.". instructions. 11 april
Longlivetheux. DIKW Pyramid, Wikipedia. 5 January 2015. DIKW Pyramid. 25 March
Marr, Bernard. "Big Data: Using Smart Big Data, Analytics and Metrics to Make Better
Decisions and Improve Performance." Marr, Bernard. Big Data: Using Smart Big Data,

Analytics and Metrics to Make Better Decisions and Improve Performance. n.d., 2015.
McGee, Marianne Kolbasuk. "" 23 January 2013. "Prison Time for Health Data Theft." Data Breach
Today. Article. 25 March 2015.
McKinsey. "Big data: The Next Frontier for Innovation, Competition, and Productivity."
Business Technology. 2011. Report.
"" n.d. "Manufacturing the Talent Shortage.". Document. 11 April 2015.
nde. "Data Warehouse ETL architecture." n.d. Chart.
NetworkWorld. "" n.d. "What's Better for Your
Big Data Application, SQL or NoSQL?". 11 April 2015.
NewYorker. "" n.d. "Do We Really Need to Learn to Code?". Quote. 11 April 2015.
NGGS. Nextgen Global Solutions. 2015. Chart. 25 March 2015.
Norvig. 2015. 2015.
"" May 2011. "Data Mining/Analytic Tools Used.". Poll. 11 April 2015.
Office, MS. "" n.d. "Use
the Analysis ToolPak to Perform Complex Data Analysis.". Support. 11 April 2015.
Oracle. "Big Data and the Creative Destructive of Today's Business Models". 2012.
Chart. 25 March 2015.
Robert Half Technology, 2015 Salary Guide. 2015. Table. 25 March
"The Business Impact of Data Breach.". Article. 26 March 2015.
james. 2013. 2015.
Somerville, Richard. Interview. American climate scientist. 2011. Quote.
Stadd, Allison. "Data Analysts: What you'll make and where you'll make it." 26
November 2014.
Web Article. 25 March 2015. "" 2015. "Become a Data

Analyst: Education and Career Roadmap.". Chart. 26 March 2015.

"" 9 September 2013. "Why Is
Predictive Analytics Important?" Business 2 Community. Article. 15 April 2015.
TechnologyCrowds. n.d. chart. 26 march 2015.
U.S. Census Bureau, Income and Poverty in the United States. U.S. Department of
Commerce Economics and Statistics Administration. 2013. Census. 27 March 2015.
n.d. "Career Opportunities: Data Analyst". 24 March 2015.
Workable. ""Data Analyst Job Description. Ready to Post and Easy to Customize."."
n.d. Job Description
Resources. 26 March 2015.
"" n.d. "Secure Removal
of Data or Disposal of Computing Devices.". Article. 24 April 2015.

About the Author

Fru Nde is a Data Professional who is passionate about the ROI that companies
can realize by effectively using their data assets.
As a practicing Data Ninja, Fru uses his Ninja Skills to help companies make sense of
data, be it moving, storing, or analyzing data.
He has been battle tested, with extensive experience both consulting for and working
full-time with corporations across the globe, including several Fortune 500 companies in
many different core industries: Retail, Banking, Health Services, and Food &
Fru is also the founder of NextGen Global (NGG), LLC, a boutique solutions firm that
conducts research and provides consulting and advisory services to organizations large
and small.
NGG's mandate and number one value is to provide solid solutions, techniques and
services that enable companies to utilize all of their available assets (people, process
and tools) to run, grow and transform their businesses.