
The Current State of Hadoop in the



2015 IIA and SAS Institute Inc. All Rights Reserved.


Executive Summary
Owning, acquiring, analyzing and managing data have suddenly moved from an operational task
required by IT to a top corporate priority where information is viewed as a strategic asset. As business
plans increasingly call for reliance on Big Data, users of all stripes are catching on and becoming more
proficient at making use of data through analytical tools.

Many organizations today unequivocally view data as strategic to their business operations and
growth, including those that haven't yet figured out how best to extract value from data.
Hadoop has emerged as a popular technology for consideration, but knowing exactly where and how
Hadoop can be leveraged within the modern enterprise data architecture is an open question.

The attraction of the low-cost, high-availability storage and processing power of Hadoop has drawn
many organizations to give this new technology consideration, either by way of limited scope
evaluations and pilots or small deployments. And yet, Hadoop may not be a panacea for every Big
Data initiative. Even its most enthusiastic champions highlight some challenges that could be slowing
down Hadoop's broader adoption.

Market sentiment about the possibilities of Hadoop is high and continues to grow. The total market
size is in the eye of the beholder, but combining the views of top market observers seems to indicate
that only 1,000-1,500 organizations globally are actually running Hadoop in production, and this
includes the early adopters whose entire businesses are based on data. Generally, Hadoop adoption is
modest to date, with most enterprises at knowledge collection, evaluation or piloting stages.

Key drivers for Hadoop adoption include low-cost data storage coupled with a distributed processing
environment that's ideal for experimentation with large, unstructured data sets that have not been
accessed by organizations in the past.

A key finding from this study is that many more organizations view Hadoop as playing a strategic role
in their future growth, but are still struggling to implement it due to complexity, skills gaps, unfriendly
analytics interfaces for business users, and the decision of whether to implement a commercial version or to
manage it themselves.

This report begins with a summary of the most recent research on Hadoop adoption rates, market
size and growth, and the most common use cases of Hadoop today. The second section presents key
experiences drawn from qualitative interviews conducted among organizations in various stages of
Hadoop deployment. The study concludes with a set of recommendations for organizations who are
considering Hadoop.


1 Current State of the Hadoop Market

The tsunami of coverage and commentary from vendors, analysts, media and industry pundits on
Hadoop in recent years has been nothing short of staggering. While Hadoop itself has been around
for a decade, excitement around its potential has been growing in large measure due to the interest
in Big Data, the catch-all term for the data exhaust pouring out of the ever-expanding universe
of mobile devices, sensors and social services connected to the Web and talking to each other
(commonly called the Internet of Things).

Hadoop Adoption - Mirage or Real?

As with any emergent technology, adoption carries a range of interpretations when it comes to
Hadoop's footprint in the enterprise. Late in 2014, Forrester Research declared bullishly that Hadoop
adoption and innovation is moving forward at a fast pace, playing a critical role in today's data
economy.1 Around the same time, Gartner took a more conservative view, noting that Hadoop had a
growing number of pilots, but no dramatic growth in substantial projects.2 One industry observer
states, "The actual installed base of Hadoop clusters remains a lot smaller than many might expect
given the amount of innovation that is going on around the platform."3

For organizations deciding whether to Hadoop or not to Hadoop, this climate of uncertainty and the
modest adoption rates are even more confusing when compared against glowing reports of business
momentum from commercial Hadoop vendors. The three big commercial Hadoop distributors
(Hortonworks, Cloudera, and MapR Technologies) appear to be enjoying healthy growth and have
secured the backing of industry giants and the public markets.4

In the context of these mixed signals, determining where Hadoop sits on the hockey stick of adoption
can be a tricky exercise. There is no doubt that the interest in the possibilities presented by Big Data
among organizations is real. How many organizations are actively realizing these possibilities through
Hadoop is another question.

1 Gualtieri.
2 Adrian.
3 Morgan.
4 Intel made a whopping $740 million investment in Cloudera in March, 2014 and Hortonworks received a $50 million strategic investment from HP in
November, 2014 followed by an IPO in December, 2014. MapR closed a $110 million round of financing in June, 2014 led by Google Capital. These are
certainly bullish endorsements for the future of Hadoop despite sluggishness in uptake.


Hadoop Market Size and Growth Potential

Despite the split opinions among Hadoop watchers in the industry, most agree on these measures of
market size and growth rates:

1. Number and size of implementations (proof-of-concept pilots as well as production level)
2. Commercial subscription revenues

Number and Size of Implementations

Aggregate data from multiple sources on Hadoop pilots and proof-of-concept experimentations
(Hadoop in a sandbox, limited clusters/nodes) suggest that Hadoop is seeing some positive tailwinds,
which should be reflected in growing adoption numbers in coming years.

But recent commentary from Gartner notes that 70 percent of companies who have invested in big
data have mostly done so for pilots, with only 12 percent using big data in full production environments.5

Ovum analyst Tony Baer recently estimated an installed base of 1,500 to 2,000 clusters globally by
the end of 2015. Baer notes that clusters with several thousand nodes remain the exception,6 with a large
number of organizations typically starting out their Hadoop implementation at a far more modest
scale. Most enterprises start out with dozens of server nodes and certainly well under 100 nodes
for proof of concept projects. Then, as they move into production, those Hadoop clusters grow to
hundreds of nodes as the datasets expand.7

Commercial Revenues
Subscription revenue generated by the three primary commercial Hadoop vendors (Cloudera,
Hortonworks, and MapR Technologies) is yet another metric used to measure current health and
future opportunity for Hadoop. 451 Research analyst Matt Aslett recently estimated about $374
million in Hadoop vendor subscription revenue in 2014, growing at a compound annual growth rate
of 49 percent through 2018. This would imply revenues for support and software reaching $2.7 billion
at the end of 2018, according to 451 Research's growth model.8
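
The arithmetic behind this projection can be checked with a short sketch. The function name and the number of compounding periods are illustrative assumptions, not details of 451 Research's growth model. Compounding $374 million at 49 percent for five periods gives roughly $2.7 billion, whereas four periods (2014 to 2018) would give roughly $1.8 billion, so the quoted figure appears to assume five periods of growth.

```python
def project_revenue(base_musd: float, cagr: float, periods: int) -> float:
    """Compound a base revenue (in $ millions) at a fixed annual growth rate."""
    return base_musd * (1.0 + cagr) ** periods


# Figures cited in the text; the period counts are assumptions for comparison.
BASE_2014_MUSD = 374.0  # estimated 2014 subscription revenue, $ millions
CAGR = 0.49             # compound annual growth rate

four_periods = project_revenue(BASE_2014_MUSD, CAGR, 4)
five_periods = project_revenue(BASE_2014_MUSD, CAGR, 5)

print(f"4 periods: ${four_periods / 1000:.1f} billion")
print(f"5 periods: ${five_periods / 1000:.1f} billion")
```

The gap between the two results shows how sensitive a multi-year projection is to where the compounding window starts and ends.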

5 Savvas.
6 The Internet market giants such as Google, Facebook, Yahoo, Twitter, Amazon and Netflix, whose businesses have data at their very core, are commonly cited
as the earliest adopters of Hadoop in production scenarios going back several years. These cluster sizes are exceptional and not representative of mainstream
adoption patterns.
7 Morgan.
8 Morgan.


Geographic Differences
While interest is growing across the globe, at present North America leads in Hadoop adoption. A
2013 Sandhill Group survey9 showed that EMEA, Asia/Pacific and India lag significantly behind North
America in terms of Hadoop adoption.

In sum, while the base of Hadoop users is growing, only a small minority appear to be running
Hadoop in production at a reasonable scale. Many more appear to have downloaded a free version
without moving further. The caution and sluggishness in adopting Hadoop for production should
be interpreted less as a sign of lack of interest and more as an indicator of media hype and vendor
innovation having outpaced the readiness of organizations.

2015 will see that dynamic evolve with many declaring it to be the year that interest in Hadoop grows,
bringing Hadooponomics a bit closer to reality for businesses eager to move from exploration and
early pilots to production-level projects.

Drivers of Hadoop Adoption

The growing appetite to maximize insight and business value out of untapped data stores is the most
significant strategic driver behind interest in Hadoop. More tactically, it is scalable, flexible, low-cost
data storage that is cited as the most immediate benefit of Hadoop to the organization.

Low-cost data storage
According to Statistic Brain, the average cost per gigabyte of storage has dropped from $437,500 in
1980, to $11 in 2000, to just $.05 (five cents) in 2013.10 While cheaper storage is here to stay, the growth
in data has offset that advantage.

Two recent end-user testimonials stand out:

TrueCar collects vast volumes of car price data to power its online car-buying
business. The move to Hadoop slashed its monthly storage costs from $19/GB
to $0.23/GB.11

Dell SecureWorks, an internet security software company, processes up to 20
billion events per day in real time. It was able to slash its monthly storage costs
from $17/GB to $0.21/GB.12
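
To put the two testimonials on a common footing, the savings can be expressed as a percentage reduction and as a cost ratio. This is a minimal sketch: the helper name is hypothetical, and it assumes the lower figures are $0.23 and $0.21 per gigabyte (that is, 23 and 21 cents), which is the most plausible reading of the vendor-supplied numbers.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    return (before - after) / before * 100.0


# Monthly storage cost per GB in dollars: (before Hadoop, after Hadoop).
# The "after" values assume cents were meant, e.g. $0.23 rather than 0.23 cents.
cases = {
    "TrueCar": (19.00, 0.23),
    "Dell SecureWorks": (17.00, 0.21),
}

for name, (before, after) in cases.items():
    drop = percent_reduction(before, after)
    ratio = before / after
    print(f"{name}: {drop:.1f}% lower, about {ratio:.0f}x cheaper per GB")
```

Both figures come from commercial Hadoop vendors, so they are best read as favorable cases rather than typical results.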

9 Graham; Rangaswami.
10 Statistic Brain.
11 John Williams, head of Platform Operations at TrueCar, was quoted as saying, "We're looking at data that is just a mess, and in the past we would have our
staff spending a long time just cleaning that up. But with Hadoop, you can imagine just keeping every piece of data forever, because that way you can always
go back later and take a look and see what you come up with." Source: Hortonworks.
12 Source: Cloudera.


Scalable Data Storage

The meteoric growth (volume) and speed (velocity) of
unstructured data being generated from the social and mobile
web has overwhelmed IT and business decision makers alike.
"Your company's biggest database isn't your transaction, CRM,
ERP or other internal database. Rather it's the Web itself and the
world of exogenous data now available from syndicated and
open data sources."13 By some estimates, 90 percent of all data
in our digital universe today is unstructured or semi-structured.14
Many organizations in the midst of formulating or broadening
their big data strategies and architectures are attracted to
Hadoop as a cost-effective holding container for massive stores
of unstructured data in an effort to leave no data behind.

Limits to Hadoop Adoption

Hadoop may not be a panacea for every big data initiative, and
even its most enthusiastic champions highlight some challenges
that are slowing down its broader adoption.

A survey of over 100 data scientists last year revealed that 76 percent of those who used Hadoop
found that it takes too much effort to program or has other limitations and is too slow for
real-time analytics.15

There is also a growing sentiment that Hadoop's MapReduce engine, which is optimized for batch
processing, isn't designed to handle ad-hoc, interactive real-time data discovery and analytics,
a popular use scenario for big data analytics today.

Finally, organizations point to a Hadoop skills gap as an inhibitor to adoption.16 In response, the
industry is seeing acquisitions and alliances that address the pain point around Hadoop and analytics
talent with companies such as Teradata acquiring Hadoop consultancy Think Big Analytics17 and more
recently, a strategic alliance announced between Cloudera and Deloitte.18

The slow ramp up of Hadoop in the enterprise may be a blessing in disguise, enabling vendors of
commercial Hadoop distributions, as well as the open source community and app developers building
on top of Hadoop, to solve for gaps in the technology that could hasten its adoption in the enterprise.
13 Laney.
14 Gantz; Reinsel.
15 Russom.
16 This is true in both North America and Europe. A recent big data skills workshop hosted by the European Commission noted: "Evidence already shows an
emerging shortage of analytical and managerial skills necessary to make the most of Big Data."
17 See "Teradata Acquires Think Big Analytics to Accelerate Growth of its Hadoop and Big Data Consulting Capability," September 3, 2014.
18 See "Cloudera and Deloitte Announce Strategic Alliance to Advance Analytic Performance of Customers," February 19, 2015.


Hadoop vs. the Enterprise Data Warehouse - Friends or Frenemies?

The buildup of excitement and interest over the past few years regarding Hadoop has triggered some
headlines positioning it as a challenger to the incumbent enterprise data warehouse. And yet as
more organizations have experimented with and used Hadoop, they're beginning to clarify the role of
Hadoop as an element of their broader data infrastructure.

TDWI recently arrived at the following conclusion after conducting a series of qualitative interviews
with data professionals earlier this year:

Few users are even contemplating a warehouse replacement. Instead, many are
actively migrating some of their warehouse (defined as data) to other platforms,
including Hadoop, as well as data warehouse appliances, columnar databases,
NoSQL databases, clouds, and event-processing tools. They do this to get platforms
better suited to advanced analytics with the migrated data (and other specialized
workloads). In fact, this movement toward multi-platform data warehouse
environments is one of the strongest trends in data architecture today.19

The future data processing and data management landscape will be a hybrid of EDWs and Hadoop,
with each used where appropriate for the individual downstream analytic and BI use cases. The EDW
is the best choice for structured and curated data. A Hadoop-based sandbox is the best choice when
experimenting with a use case involving new types of data (web logs, text, email and machine data),
which may not be well-qualified.

Depending on the use case, some organizations will find themselves combining data from both
environments. Online product recommendations is a common example best met with this hybrid
approach as it combines consumer sentiment data (free text reviews) with structured data (pricing,
SKU numbers, product descriptions). In the future, data warehouses will evolve to accommodate
storage economics, new business use cases, data governance, latency, scalability, and diverse data
structure requirements.

19 Russom.


2 Hadoop Realities - Experiences from the Field

With the overview of the current market for Hadoop as a backdrop, this section presents actual
experiences of end-user organizations at various stages of Hadoop deployment, drawn from a set of
extensive qualitative interviews. Some of these realities confirm the market perceptions described in
the first section, but a few key findings contradict the market sentiment.

Reality 1: Hadoop is viewed as a key component of data-driven

Respondent organizations' primary reasons for using Hadoop are authentically strategic in nature,
born of a specific need to improve their company's ability to solve complex problems through
advanced analytics applied to large untapped data sets. Making the decision to implement a Hadoop-
based architecture is therefore a well-thought-out component of a larger analytics strategy that is
aimed at competing more effectively through greater share and/or revenue, or lower operating costs.

Organizations that rely upon qualified data to support their business decisions and have embraced
advanced analytics are therefore best suited to utilizing Hadoop. With an ability to perform more
nimble analyses across a large volume of data, Hadoop can help analytics teams bring to light new
insights based on data relationships, trends, and new types of information not previously understood.

[The primary goal for our Hadoop deployment is] to create common centers
of excellence around analytics, plus reduce duplication, reduce errors, heighten
data quality, and improve the quality of the insights we obtain.
- Karl Moad, University of Pittsburgh Medical Center

Marketing analytics represents a common use case for Hadoop. Organizations seeking to improve
customer service/customer retention and to attract and acquire new customers are finding Hadoop to
be especially beneficial. By allowing their analysts to access all customer touch points throughout an
organization - and make use of data points that haven't previously been accessible - these companies
look forward to gaining a competitive advantage. At a minimum, respondents view Hadoop as a tool
that keeps them on par with competitors' capabilities. Not deploying Hadoop carries the risk
of falling behind.

We have a tremendous amount of data and we're trying to glean more customer-related
information in order to improve sales and marketing. That has
really been the key driver outside of IT.
- Anonymous participant, health insurance company


Other industry-specific goals for Hadoop implementations raised in the interviews relate to areas such as:

Simplified supply chain management across multiple manufacturing/assembly

Improved ability to track and assess performance management (e.g., human/talent in consulting
organizations; machine data/performance and human performance in manufacturing)

Enhanced ability to perform risk analysis for insurance underwriting

While these goals pertain to specific industry sectors, the common thread is that these organizations
see Hadoop as offering a unique solution to help them improve a process that is central to their
organization's health. For example, a consulting services organization struggling to manage
turnover among its professional workforce seeks to implement predictive analyses atop the Hadoop
infrastructure to identify key areas of burnout. When employees choose to leave, the organization
wants to look at the patterns of behavior that could become indicators of flight risk and reshape their
human resources strategy accordingly.

Reality 2: Hadoop requires rethinking the organizational data

As previously discussed, companies have started making adjustments to their data management
processes by offloading existing data storage and processing from operational systems or data
warehouses to Hadoop, and using Hadoop as a reservoir for storing and processing new data,
particularly unstructured or semi-structured data.

Our study participants confirmed that although Hadoop will be an integral part of the enterprise data
architecture, it is typically not seen as a replacement for the enterprise data warehouse. It is a
complementary system that is expected to co-exist for a specific purpose, at least for the near term.
Participants articulated intentions to use Hadoop as a centralized data hub for downstream BI and
analytics usage, but this hub is fed through existing operational systems and data warehouses.
Longer-term, however, customers may eventually replace their relational database systems.

Use of data warehousing will continue as needed for BI and analytics, since it contains relevant and
curated sets of structured data. However, as organizations face increasing requirements from business
units to also analyze different types of data, they will invest in using Hadoop in a sandbox environment
and pursue steps to prepare this data and apply analytical techniques.

Through the years, the different business units have been quite siloed in running
the business; now we are trying to operate strategically, more as an
ecosystem. There is a need for us to be able to have
full visibility across the offerings.
- Christina Foo, Intuit

It can be unwieldy for large organizations to get all business units to snap to a particular type of data
architecture that will allow the corporation to sift through, utilize, and apply learnings across business
units. With a strong corporate priority placed on data management and analytics, these organizations
are looking for ways to effectively access and manage all of these disparate data inputs accordingly.

Like any large enterprise, we have a diverse set of upstream systems that
generate data and we continue to work towards integrating them quickly. In
the meantime, however, healthcare reforms like the Affordable Care Act necessitate
taking a longitudinal and cross-sectional view of our
members and that is an area where Hadoop can help.
- Ravi Shanbhag, UnitedHealthcare

Reality 3: Hadoop value primarily seen in new analyses on unstructured data

Not only does Hadoop allow our respondents' organizations to manage and analyze data across a
variety of non-congruous inputs, Hadoop is welcomed as the vehicle that will allow companies to
analyze unstructured data to garner incremental benefits and further support the strategic efforts
around analytics. The amount of data currently managed within the Hadoop cluster varies drastically
depending on where in the adoption cycle a particular organization is. But with the exploding growth
of data coming from all sources, these organizations are expecting Hadoop will hold a significant
portion of their longitudinal data as well as all unstructured data.

Healthcare organizations in particular are looking for ways to incorporate a mix of data that typically
sits outside of the traditional databases: physician notes, lab notes, procedural documentation,
images, and more, into one inclusive system to feed their analytics strategy.


80% of our data is unstructured, and only 20% is really structured [we have]
20 plus years of collecting all of our clinical data, all of our physician notes, all
of our physician procedural documentation, all of that as well
as lab notes and everything else.
- Karl Moad, University of Pittsburgh Medical Center

While Hadoop is certainly being utilized to capture and store unstructured data, companies must next
apply text analytics techniques in order to parse, categorize, and examine sentiments. For example,
a major healthcare system can now effectively store a patient's medication history, treatments, and
doctors' notes (stating qualitative aspects of a patient's health), and then apply text analytics to
the patient notes which can then be folded into advanced models to perform predictive alerts for

Of course, healthcare isn't the only industry that benefits from systems that allow access to
unstructured data. Any organization that interacts with and markets directly to consumers can
now store and manage textual data referencing their company or brand from social media, online
communities and call centers. Companies that develop systems to capture and then effectively
analyze what is being said about their brand can get a continuous pulse on consumer sentiment, and
develop predictive techniques to help guide their PR and advertising strategy, as well as provide alerts
when their social media presence needs to be managed.

Another very common, nearly ubiquitous use case is the incorporation of call center notes into
corporate data storage systems, which can be combined with traditional CRM systems to help
companies manage customer complaints and the hidden challenges customers may be experiencing
with sales, service, or use. A really good analytics model built upon this system could make use of this
information to provide additional context and insight into the relationships the company has with its
customers. Ultimately, companies may be better equipped to identify customers who are at risk well
before they decide to leave or discontinue using a product or service.

A final example of a company that is launching a Hadoop initiative in order to improve their ability
to store and analyze a wide variety of inputs is a U.K.-based security software company that IIA
interviewed for this study. The company provides a service to its clients to monitor and provide data
on all security endpoints within an organization; yet they can't effectively store and analyze these data
using existing relational technologies in a timely manner. Their customers have thousands of systems
collecting data points on employee communications and transmissions to prevent data leakage
or other security failures. Security breaches are always a concern, but with highly publicized hacks
such as those that took place recently with Target and Sony, their clients need to more effectively
recognize behavioral patterns that foreshadow data leakage or other security breaches across an
increasingly vast and diverse set of data points.


At the moment we're running into bottlenecks on our bigger customers who
want to log everything forever with detail where the relational database just
isn't responsive. So the initial goal is to improve that, to make it more useful for
the customer and allow them to collect more data, more endpoints.
- IT executive, endpoint security software company

A possible use scenario for them would be to feed the data collected into a datastore for detailed
inquiries, as well as providing a means for blending together unstructured and structured data for
overall utility reports. In the event of a malware infection, they could imagine overnight jobs running
against it, digging into that with more depth, allowing them to include far more data points (and
types of) than they can handle

What Hadoop is NOT used for

While Hadoop can offload serious data-intensive crunching for large-scale models, cutting the
processing time from days to hours, it is not well-suited to real-time data processing with a relatively
small number of records. Traditional relational databases are still viewed as the primary and best
means for these types of queries.

If you just want to return a result very quickly, Hadoop is actually not
optimized for that. So for instance, if a prospective customer is visiting our
website and we want to know what products they are likely to purchase, that
is what we consider to be real-time processing. Where one would use Hadoop-based
analysis is, for example, when building that model to predict what
products the customer would purchase. There we need to go back and forth
and look at all of the values from previous site visits over a period
of several years, across 10,000 customers.
- Adam McElhinney


Reality 4: Satisfactory business user access and experience is still in process
To date, key Hadoop benefits are focused around the management and processing of data, i.e.,
providing greater flexibility and cost savings in data storage; faster processing of increasingly larger
volumes of data; the ability to extend traditional data warehouses; and providing methods for
archiving and querying data for longer periods of time to allow greater longitudinal analyses.

Accordingly, within the organization, the starting point for interest in Hadoop is the corporate
IT department, which holds primary responsibility for provisioning and maintaining the Hadoop
environment for data storage prior to analysis (whether using a commercial solution or a free

Hadoop is thus delivering on the IT goal to provide a cost effective and scalable solution for storing
older data that is less frequently accessed.

Let's say we have ten years' worth of claim data, clearly we do not need all the
ten years of data all the time. Not all the data is as heavily utilized. The goal
is to make sure that the highly utilized data stays in the costliest applications
and gets used the most because you want to get the most bang for your buck.
These traditional data warehouse appliances can be effective but expensive.
You do not want to store stale data in there which does not get used.
- Ravi Shanbhag, UnitedHealthcare

While IT and Hadoop power users are relatively satisfied with how their Hadoop systems are
delivering on these goals, our study respondents said satisfaction is likely to be lower among business
analysts and data analysts, most of whom have yet to leverage self-service data management and
data exploration tools that can help them to move quickly and aggressively interact with the data
stored in Hadoop. When business decision-makers can easily pull out insights that will guide their
decisions, companies will start to realize the potential and achieve the goals they set out to obtain by
investing in Hadoop.

Currently, data management and analytics projects involving Hadoop are limited to those that only
a highly skilled team of engineers/scientists/developers can perform. This creates a bottleneck in
getting projects completed. Hadoop also presents an HR challenge to find and employ the right
people. In fact, some respondents indicate that even recent college grads aren't necessarily trained to
work in the Hadoop world yet. Moving away from the structured world of columns and rows is a difficult
transition for many analysts, who are deeply rooted in the relational database world.


Further delaying the value from Hadoop is the fact that tools that can provide common high-level
language support for data management and analytics professionals and/or self-service tools for non-
technical users are slowly evolving and being made available within the marketplace. Most study
participants said that Hadoop will reach greater adoption throughout their organizations when there
are enterprise-class tools and simple end-user interfaces.

There's just a whole set of work that people have to do that's in the exploration
side of the house and the vendors really haven't provided us a good method for
doing that for the masses. It's been more of an engineer type of a mentality. So
that has to evolve in order to get more use out of it.
- Analytics director, entertainment industry

In general, satisfaction with Hadoop implementations is reasonably high, but there is room for
improvement. While the initial set-up is viewed as being relatively easy, people find that the open
source structure contains some gaps that require workarounds.

I would say I am fairly satisfied. I have found some things to be more difficult than I thought were necessary.
I cannot responsibly go to my management and say, 'Hey, let's just stand up
the open source variation of Hadoop and go wild with it' without some type of
support and management structure behind it.
- Karl Moad, University of Pittsburgh Medical Center

3 Recommendations
Taken together, the actual experiences of Hadoop users today temper the fervor of the various
Hadoop-related market segments. Hadoop will undoubtedly play a central role in the data and
analytics architectures of the future, but it can also bring expense, rapid change and frustration
in the near term. As the Hadoop ecosystem continues to develop, reality will come into line with the
promise. Until then, we conclude with a set of end-user recommendations:

Identify and define use cases that deliver competitive advantage and are strategic in nature.
The majority of end users interviewed confirmed that although Hadoop is a new technology, its
strategic value is already understood among senior leaders. Applying Hadoop to high-profile, valuable
use cases that rely on leveraging new data types can quickly justify the costs of deployment.

Evaluate whether and how Hadoop fits into your existing data and analytics architecture. As
has been noted, the data storage cost advantage of Hadoop can cause some to mistake it for a data
warehouse replacement. To be successful, end-user organizations should carefully plan the role
Hadoop will play within the existing data architecture. For some less analytically mature
organizations, it may be too early for Hadoop to be useful.

Augment Hadoop with data management, data discovery and analytics to deliver value. For a
Hadoop deployment to be worth the effort, business analysts will eventually need to access Hadoop
to do their own data analyses. While the deployment itself is critical, remember that success will be
judged by the ultimate consumers of the insights derived from Hadoop data.

Reevaluate your data integration and data governance needs. Use of Hadoop as a data reservoir
or as a data hub does not eliminate the need for data integration and governance as part of your
modern data architecture. It is important to evaluate whether your current and future data
integration requirements (e.g., acquire, clean, refine, aggregate, federate) address the variety of
business problems you face, and how your use of Hadoop will comply with data governance requirements.

Assess skills/talent gaps early and develop a plan to mitigate those gaps before deployment.
Among the hurdles experienced by end-user organizations, most pointed to being surprised by
the level of skill needed to fully run Hadoop in production. High performers assess the skills
necessary before embarking and develop a plan to fill those gaps before setting overly high
expectations with their organizations for project delivery.

About IIA
The International Institute for Analytics (IIA) is the authority on analytics maturity and best practices
and provides the advisory and support for organizations to leverage the power of analytics to drive
business results. IIA encompasses a network of analytics experts committed to knowing and sharing
the keys to success in an economy increasingly driven by data.

IIA guides mission-driven organizations as they build and grow their analytics programs. With an
in-depth research library, phone-based and in-person events, and custom training and advisory services,
IIA acts as an extension of business leaders and implementers, providing the strategic guidance
required to be an analytical competitor.

About SAS
SAS is the leader in business analytics software and services, and the largest independent vendor
in the business intelligence market. Customer analytics solutions from SAS offer the processes and
technologies that allow marketers to plan, coordinate and evaluate the success of their marketing
initiatives. By putting data in the hands of business users, marketing programs become more effective
and the organization becomes more efficient in execution. Deeper insights from data gathered help
organizations to become customer-centric by understanding their customers better and improving
customer loyalty. Since 1976 SAS has been giving customers around the world THE POWER TO

Additional Information
To learn more about this topic, please visit SAS Solutions for Hadoop at

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
