
Data Isn't Everything

 Posted by Vincent Granville on November 20, 2016 at 5:30pm


Guest blog by Dr. Prashanth Southekal. Prashanth is an experienced technology professional
who understands what it takes to run efficient technology-based solutions, processes, and
organizations. He brings over 20 years of experience in Information Management from
companies such as SAP AG, Accenture, Deloitte, P&G, and General Electric. Prashanth has
published two books on Information Management and is currently working on his third,
"Data for Business Performance".

The current wave of excitement about data and data-related technologies might lead one to
think that data is a panacea for poor organizational performance. Yet despite all the attention
on data, including millions of dollars spent on data management, business intelligence, and
analytics projects, many organizations still struggle to gain value from the investment.
According to a survey by the Economist, 73% of respondents said they trust their intuition
over data when it comes to decision-making. Many enterprises believe that having a Chief Data
Officer (CDO) makes them a data-driven enterprise, but a recent Gartner report found that only
50% of CDOs are successful in their posts.
While data definitely has the potential to improve organizational performance, it has
limitations too. Hence, it is prudent to know some of the limitations of data, or rather the
situations where even quality data doesn't add much value to the enterprise.

 Data is normally obscured and can be biased

Most data analyzed in enterprises is structured data stored in databases. This data is
transformed from its unstructured natural format into a structured format after the raw data
is gathered, curated, and finally stored. The structured format is driven either by the
application (including the database) or by individuals' predispositions and experience. For
example, in activity-based costing (ABC) analysis, if the application (and the database) can
capture only the start and end time of an activity, but not the actual effort spent on it, then
reporting and analytics on activity effort will never be possible. So the data context is either
pre-determined or distorted. This means the "raw data" that is captured, curated, and stored
is not only obscure, but can also be biased. According to leading statistician Nate Silver,
"There is no such thing as unbiased data. Bias is the natural state of all data".
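
To make the ABC example concrete, here is a minimal Python sketch (hypothetical field names and values, not from any real system) showing why a schema that stores only start and end timestamps can never yield the actual effort:

```python
from datetime import datetime

# Hypothetical activity record: the application captures only start and end times.
activity = {
    "task": "invoice approval",
    "start": datetime(2016, 11, 1, 9, 0),
    "end": datetime(2016, 11, 1, 17, 0),
}

# The only thing this schema can ever yield is elapsed time (8 hours here),
# which includes meetings, breaks, and idle time. The true effort spent on the
# activity was never captured, so no downstream analytics can recover it.
elapsed_hours = (activity["end"] - activity["start"]).total_seconds() / 3600
print(f"Elapsed: {elapsed_hours:.1f} h (actual effort: not in the data)")
```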

 Data doesn’t always translate into actions and results

Even if the data is good quality and unbiased, translating it into insights, strategies,
decisions, and actions depends on the organizational structure, proper training, and the
empowerment of staff to act, among other factors. And while data relies on logic, decisions
are often made based on emotion, so pure logic alone can never drive actions from insights.
The image in this post is a real example from one of the main streets in Calgary, AB,
Canada, where the gasoline price per liter at Shell (87.9c) is about 6 cents less than the
price at Esso (93.9c). Although the competitor's price is right in front of the attendant at the
Esso station, he is unable to change his own prices because he needs approval from his
manager. Nothing is more frustrating than having the most timely and accurate data and
insights but still being unable to act on them. Data and insights create no value unless they
are acted upon. As Thomas Edison put it, "the value of an idea lies in the using of it";
Benjamin Franklin said, "never confuse motion with action".

 Relevancy of data is invariably a function of time and space

Quality data today may not be relevant at a future time or in a different space or
jurisdiction, primarily due to changing business needs and government regulations. For
example, within an enterprise, shipments from plant A could be scheduled by delivery priority
while shipments from plant B are scheduled by customer type; for plant B, delivery priority
data is irrelevant or unnecessary. However, the relevancy of data is often misunderstood, and
many organizations spend a lot of time and effort managing data that is unnecessary. This is a
perennial problem in information management; researchers Martha Feldman and James
March reported as far back as 1981 that managers often ask for data and information that
they don't use.

 Data has the potential to cause analysis-paralysis

Presently we have the capabilities to generate, capture, and process huge amounts of data.
According to Eric Schmidt, Chairman of Google, every two days we create as much data as
we did from the dawn of civilization up until 2003. According to IBM, every day we create
2.5 quintillion bytes of data (1 quintillion = 1 followed by 18 zeros). This abundance creates
more challenges in getting to quality data and ultimately deriving meaningful information.
According to William McKnight, author of Strategies for Gaining a Competitive Advantage
with Data, "It's not just spitting out information for the sake of it, it's actually trying to
connect the dots between previous transactions, current transactions, and potential future
transactions." In a survey by Oracle, over 300 C-level executives said their organizations are
collecting and managing 85% more business data today than they were two years ago;
however, 47% of them said their organizations cannot interpret and translate that information
into actionable insights. While data is important, it is the right data that matters.
Not everything that can be counted counts, and not everything that counts can be counted.
-- Albert Einstein, Physicist
 Stakeholders’ perceptions typically precede metadata ontology

A data entity might be consumed in different ways by different stakeholders. For example,
while a telephone number field might be used by a sales agent to call customers, a tax
analyst might use the area code within the telephone number to look up tax rates by
jurisdiction. The actual use of the telephone number field thus goes beyond its intended use,
which makes building a metadata ontology challenging. In addition, in most cases the
boundaries between data and information are not clear: what is data to one person might be
information to someone else, and vice versa. To a crude oil commodities trader, for example,
slight changes in the sea of values coming from the exchanges might act as information for
taking action, but to anyone else they would look like raw, meaningless data.
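
A minimal sketch of this kind of secondary use, assuming North American ten-digit numbers and a purely illustrative tax-rate lookup:

```python
# Hypothetical jurisdiction lookup keyed by area code (illustrative rates only).
TAX_RATE_BY_AREA_CODE = {"403": 0.05, "212": 0.08875}

def area_code(phone: str) -> str:
    """Extract the area code from a 10-digit North American number."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-10:-7]  # tolerate an optional leading country code

phone = "+1 (403) 555-0199"                          # stored so the sales agent can call
print(area_code(phone))                              # "403"
print(TAX_RATE_BY_AREA_CODE.get(area_code(phone)))   # repurposed by the tax analyst
```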

 Data management is expensive and time consuming

While businesses strive for quality data to derive insights, getting and managing quality data
is expensive. Data is created, stored, processed, shared, aggregated, cleansed, replicated
(to DR sites), and archived, and all of these activities take time and money. According to
research done by Dr. Howard Rubin of MIT, 92% of the cost of running a business in the
financial services sector is related to data. And once data quality is improved, it must be
governed across the entire data lifecycle, as data quality is estimated to degrade at about 7%
per annum. So if organizations need quality data, the data management initiative should be
treated as a continuous improvement effort at the enterprise level (not at the LoB or function
level), not as a one-off project. Data management is a marathon, not a sprint.
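
As a rough worked example of what a 7% annual degradation rate implies (a simple compounding assumption layered on the figure above, not part of the cited research):

```python
# If data quality erodes at roughly 7% per year and nothing is done about it,
# compounding quickly eats into an initially clean data set.
quality = 1.0  # start with 100% of records fit for purpose
for year in range(1, 6):
    quality *= 1 - 0.07
    print(f"Year {year}: ~{quality:.0%} of records still fit for purpose")
# After five years only about 70% remains, which is why quality has to be
# governed continuously rather than fixed once in a one-off project.
```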

 Data might distort innovation

Data sheds light on past events, which no one can change. Seth Kahan,
author of "Getting Change Right", uses the analogy of driving a car for data-driven decision
making. He says, "Making decisions just on data is like driving your car only by looking in the
rear view mirror. During tough times, leaders tend to depend upon the past to make their
decisions as they want to be certain about what they are doing. The more certainty an
organization wants, the more they go backwards. But the past only shows where you've
been, not where you are going or should be going." According to Lara Lee and Daniel Sobol
of Harvard Business School, "Data reveals what people do, but not why they do it.
Understanding the why is critical to innovation."

 Data is never real-time

Though many companies talk about performing real-time analytics on data, data is never
real time; the term "real-time analytics" is an oxymoron. Why? There is always a time lag
between data origination and capture. This lag can be a few microseconds in
plant/SCADA/PoS systems, or it can be months before the data is formatted, cleansed,
validated, curated, and committed to the databases of IT/OLTP systems. On top of this, data
is consolidated from diverse systems and aggregated before analytics operations are
performed on the BI/OLAP data set, so the time lag is extended further, from microseconds
to months or even years. Even though analytics on aggregated data is quite different from
analytics on streaming data (i.e. data originating from social media, IoT, and sensors), there
is still a time lag between data origination and analysis in both cases: with aggregated data
the lag might be days, weeks, or months before you can analyze it, while with streaming
data it could be minutes or hours. Inherently, data is historical. Finally, even if you manage
to get data in real time, the analysis will be on a single record. To perform trend,
prescriptive, or predictive analytics you need a significant history of records, which means
meaningful data analytics can never be real time.
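
A minimal sketch, with hypothetical timestamps, of how the origination-to-analysis lag can be made visible regardless of the pipeline:

```python
from datetime import datetime

# Every record conceptually carries two timestamps: when the event actually
# happened, and when it finally became available for analysis.
event_time    = datetime(2016, 11, 20, 10, 0, 0)   # data origination
analysis_time = datetime(2016, 11, 22, 14, 30, 0)  # after cleansing, loading, aggregation

lag = analysis_time - event_time
print(f"Origination-to-analysis lag: {lag}")  # over 2 days: 'real time' in name only
```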

 Data has no relevance for first time events

In today's uncertain and volatile business environment, businesses are forced to take many
actions for the first time, and naturally there is no data available for these first-time events.
For example, a company in the US might try to enter a new market in Asia, say India. There
is no data to tell this company whether its products or services will work in the Indian
market. Wal-Mart, for instance, did a thorough analysis of the South Korean market, yet it
still exited the country in 2006: while Wal-Mart marketed items like electronics, South
Koreans preferred to spend their money on food and drinks. Another example of the lack of
data for first-time events is when an enterprise decides to outsource its IT services. An IT
service provider might have a great track record of delivering services from India, but
analyzing data from that provider's engagements with other enterprises does not help your
own outsourcing decision much, because the service model for each business enterprise is
highly contextual and unique. Basically, there is no reliable data for first-time business
ventures, and businesses have to rely on intuition, consultation with people familiar with the
environment, and computer "what-if" simulations.
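
Since no historical data exists for such ventures, a "what-if" simulation with explicitly assumed parameters is often the best available substitute; the sketch below uses purely illustrative numbers:

```python
import random

# Purely illustrative assumptions for a first-time market entry: there is no
# historical data, so every parameter here is a judgment call, not a fact.
N_RUNS       = 10_000
DEMAND_MEAN  = 50_000    # assumed annual unit demand
DEMAND_STDEV = 20_000
UNIT_MARGIN  = 3.0       # assumed contribution per unit
FIXED_COST   = 120_000   # assumed annual cost of entering the market

profitable_runs = 0
for _ in range(N_RUNS):
    demand = max(0.0, random.gauss(DEMAND_MEAN, DEMAND_STDEV))
    profit = demand * UNIT_MARGIN - FIXED_COST
    profitable_runs += profit > 0

print(f"Simulated scenarios that break even: {profitable_runs / N_RUNS:.0%}")
```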

 Data can mislead decision making

Data can be used to mislead decision making in three main ways: KPIs, graphs, and sample
selection.
First, incomplete KPIs are a common source of being misled. Analytics works most effectively
at an enterprise level, which means analytics in a business enterprise needs to be anchored
by a core set of enterprise-wide KPIs that are independent of any single LoB. This is
important because LoBs within an enterprise often have conflicting goals, and KPIs built on
data at the LoB level can give a distorted picture of the performance of the enterprise. For
example, the marketing LoB might present a KPI that shows a percentage increase in
customer loyalty. While this KPI might be a positive indicator of performance for the
marketing LoB, the increased campaign costs behind it will adversely affect the financial
KPIs. So data in KPIs might not give a complete and accurate picture of the performance of
the enterprise, and data and KPIs can be used to mislead the organization if the enterprise
doesn't have a core set of enterprise-wide KPIs.
The second source of being misled by data is graphs. Graphs are sometimes deliberately
designed to mislead, for example when the designer wants to give readers the impression of
better performance or results than is actually the case. In other cases, the reader may be
misled by a poor choice of chart, where scales, axes, infographics, and so on distort the
picture.
The third common source of misleading comes from the data source or sample selection.
"Tell me what you want to hear, and I will provide data appropriately" is a common joke in
business transformation projects; data and statistics are often used to twist an argument in
one's favor. A 2009 investigative survey at the University of Edinburgh, UK, found that 33.7
percent of scientists surveyed admitted to questionable research practices, including
modifying results to improve outcomes, subjective data interpretation, withholding analytical
details, and dropping observations because of gut feelings. There are also numerous cases
where data is manipulated for business gain. For example, in 2007 Colgate was ordered by
the Advertising Standards Authority (ASA) of the UK to abandon its claim that "More than 80
percent of Dentists recommend Colgate". The claim, which was based on surveys of dentists
and hygienists carried out by the manufacturer, was found to be misrepresentative because
it allowed participants to select one or more toothpaste brands. The ASA stated that the
claim "… would be understood by readers to mean that 80 percent of dentists recommend
Colgate over and above other brands, and the remaining 20 percent would recommend
different brands." Upon further analysis, the ASA found that another competitor's brand was
recommended almost as often as Colgate by the dentists surveyed. The ASA concluded that
the claim was misleading because it implied that 80 percent of dentists recommend Colgate
toothpaste in preference to all other brands. The ASA also noted that the scripts used for the
survey told participants that the research was being carried out by an independent research
company, which was not the case.
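
To see how a multi-select survey question can produce this kind of ambiguity, here is a small simulation with made-up preferences, showing several brands each "recommended" by a majority at the same time:

```python
import random

# Made-up example: each dentist may recommend more than one brand, so the
# per-brand percentages are free to add up to far more than 100%.
BRANDS = ["Colgate", "CompetitorA", "CompetitorB"]
dentists = []
for _ in range(1_000):
    k = random.randint(1, len(BRANDS))            # every dentist picks at least one brand
    dentists.append(set(random.sample(BRANDS, k)))

for brand in BRANDS:
    share = sum(brand in recs for recs in dentists) / len(dentists)
    print(f"{brand}: recommended by {share:.0%} of dentists")
# Each brand ends up "recommended" by well over half of the dentists, which is
# exactly the ambiguity the ASA objected to in the Colgate claim.
```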

So What is the Advice?

In today's interconnected world, nobody makes decisions in a vacuum. Data very much
matters to business; it is the fuel on which today's enterprises run. But as explained above,
there are also cases where investing time and effort in building a data-driven enterprise
might be futile or even harmful. When data management initiatives are pursued, the effort is
likely to be ineffective when:

 There is no senior management commitment to valuing data as a shared enterprise asset (rather than a LoB- or function-level asset);
 There is no enterprise-wide vision for running and sustaining a data management initiative;
 The insights from data cannot be translated into actions quickly;
 The relevancy of data changes constantly depending on time, space, and stakeholder preferences;
 There is a need for unbiased details of business processes and activities.

Thank you very much for your time. As always, I am sharing my thoughts to learn from my
network. Let me know your thoughts and feel free to share this article in your network if you
deem fit.
Regards!
Prashanth Southekal, PhD, PMP
