
NEC Reference Architecture for SAP HANA & Hadoop

Using NEC High-Performance Appliance for SAP HANA and NEC Data Platform for Hadoop

White paper on validating the integration of SAP HANA and big data: integrating Hadoop and SAP Vora

NEC's know-how in delivering platform systems and Vupico's know-how in system integration and business insight are the best combination

NEC Reference Architecture for SAP HANA & Hadoop 1


Table of Contents

Executive Summary

Section 1: Introduction

Section 2: Solution Overview
2.1. NEC Appliance for SAP HANA
2.1.1. Turnkey appliance with NEC's SAP HANA certified Express5800/A2040d server
2.1.2. Equipped with Intel® Xeon Processor E7
2.1.3. Fault-management functions using EXPRESSSCOPE® Engine SP3
2.1.4. Enhanced reliability, availability, and serviceability (RAS) for SAP HANA delivered through NEC-Red Hat Enterprise System Collaboration
2.2. NEC Data Platform for Hadoop (DPH)
2.2.1. High Performance, Scalable and Hortonworks Certified Platform
2.2.2. Reduce Total Cost of Ownership (TCO)
2.2.3. Platform & Data Management Services
2.3. Vupico's Data Analysis Service

Section 3: Benefits of Integrated Platform

Section 4: SAP HANA & Hadoop Integrated Solution use case
4.1. Use Case: Data warehouse Optimization
4.2. Use Case: Business Intelligence and Analytics

Section 5: Platform Integration for Analytics
• Advantage of NEC Data Platform for Hadoop integration with SAP HANA
• Unprecedented Scalability
• Common Data Lake Platform
• Lower TCO
5.1. Proof of Concept: Intelligent Analytics across SAP HANA and Hadoop using SAP Vora and Spark by NEC and Vupico
5.1.1. Business Use Case: Reduce lost opportunities by rapid and accurate evaluation of credit score
5.2. POC Platform Configuration - Hardware & Software
5.2.1. POC System Solution Component
5.3. Analytical Model (Use case Implementation) and Analytical Model Results
5.3.1. Use Case Background
5.3.2. Use Case Implementation
5.3.3. SAP Vora Interfaces and Control

Section 6: Product Information
6.1. HDP
6.2. SAP HANA
6.3. SAP Vora
6.4. Tableau



Executive Summary
One of the biggest challenges organizations face today is managing the large volume of data
generated by their enterprise operations. They need a system that is more agile and capable of
faster, scalable analytics, collecting data from multiple sources and of multiple types, whether
structured or unstructured.

Every organization has its own operational challenges, but most share common business drivers
such as improving operational efficiency, customer retention and satisfaction, and product
quality to gain competitive advantage. Additional challenges include simplifying complex data
management processes, reducing cost, consolidating platforms and placing data intelligently for
better analytics.

Organizations need platforms and tools that can bridge the gap between business-critical data
and the huge volume of data coming from new sources. SAP® Vora™ has emerged as one such
technology: a distributed computing solution for business that leverages the Apache Spark
framework to provide enriched interactive analytics on the Hadoop platform. SAP Vora is an
in-memory query engine that lets organizations use SQL to analyze large volumes of data from
enterprise applications, data warehouses, Hadoop data lakes and real-time streaming data from
IoT devices.

This whitepaper describes the integration of NEC's Big Data platform, "Data Platform for
Hadoop" (hereinafter DPH), with the NEC SAP HANA® appliance and analytics from Vupico to
solve the challenge of scoring customer credit loans in real time. For this use case, Vupico
designed and developed an end-to-end solution implementing a data pipeline that shortens the
time between loan request submission and validation from days to minutes, which helps financial
institutions decide the creditworthiness of a customer in real time. Some of the important topics
covered in this whitepaper are:

• Benefit of integration between SAP HANA and the Hadoop platform
• Key use cases to solve using NEC's integrated platform
• Intelligent Analytics across SAP HANA and Hadoop using SAP Vora and Spark
by NEC and Vupico



Section 1: Introduction
Due to the emergence and rapid increase of new types of data in recent years, companies have
been forced to re-evaluate their data strategy and embark on a profound digital transformation
journey using Big Data platforms and technology, IoT and artificial intelligence.

Today, almost 80% of the data generated and stored by enterprises is unstructured, and it
remains unanalyzed for lack of the right platform, tools and people who can quickly identify
the potential value in such data. To be competitive, an organization must link business data
derived from traditional systems with the huge volume of data from new sources and obtain
real-time insights that lead to better business outcomes. This requires a new approach that
combines and correlates structured data with unstructured data obtained from new devices, social
media or sensors in a cost-effective and timely manner.

Enterprises using SAP products such as ERP and CRM have been trying to identify ways to lower
total cost of ownership, and this pursuit can partly be addressed by deploying SAP HANA as a
transactional and analytical system to store and process data. However, the growth and importance
of unstructured data for in-depth business intelligence has limited the relevance of SAP HANA,
because the high cost of data storage and data management makes an SAP HANA system very
expensive when the volume of data increases significantly.

With the evolution of modern data architectures and frameworks, organizations have been looking
for open systems that run on commodity hardware and scale flexibly as demand grows. This
created the need for flexible, modular infrastructure that gives clients a cost-effective
platform with easy expansion capabilities.

As a result, the industry has witnessed growing demand for Hadoop/Spark-based platforms that
allow distributed processing of large data sets across clusters of computers in real time, and
many enterprises have adopted such platforms for big data analytics. Hadoop is designed to
scale from a single server to thousands of machines, each offering local computation and
storage. It can store and process petabytes of data in any format and helps organizations
ingest, store, process and visualize data on a common platform.

To address the above challenges of storing and analyzing data in a cost-effective manner, NEC
and its partner Vupico have introduced a use-case-based Reference Architecture that combines
SAP HANA and Hadoop with SAP Vora and integrates seamlessly into the existing enterprise data
and big data environment. NEC's large-scale distributed processing platform, "Data Platform for
Hadoop" (DPH), is integrated with Hortonworks Data Platform (HDP®) and the NEC SAP HANA
appliance, and it supplements SAP HANA's capabilities through SAP Vora integration.

NEC and Vupico have joined hands to create an end-to-end integrated solution that combines the
power of the Hadoop platform and SAP HANA, enhancing the capabilities of SAP HANA while
lowering the overall storage and processing cost.

In the integrated stack, DPH is used to lower the cost of the data storage system and to offload
expensive ETL processes from SAP HANA. This increases profitability because it frees up
capacity on the SAP HANA system, which can instead be used for higher-value analytical
workloads. Vupico, with its experience in building and implementing business intelligence, data
processing pipelines and advanced analytics using the Hadoop and SAP ecosystems, has developed
an interactive analytics and data tiering process using SAP Vora for effective scoring in loan
processing.

Section 2: Solution Overview


The NEC High-Performance Appliance for SAP HANA and NEC's large-scale distributed processing
platform, DPH, combined with Vupico's analytical services, help customers leverage Hadoop
alongside SAP solutions to analyze and process very large volumes of data from many varied
structured and unstructured sources. This solution overview describes the SAP HANA system and
its integration with Hadoop and related technologies such as Spark and SAP Vora to find the
business insights locked inside unstructured and underused information.

2.1 NEC Appliance for SAP HANA:


NEC offers an end-to-end turnkey appliance built on NEC's SAP HANA certified Express5800/A2040d
server for quick and easy deployment. The server features fast processing performance designed
especially for real-time analysis of big data and other applications. The NEC appliance for SAP
HANA combines SAP's in-memory computing technology with NEC's dependable hardware platform and
a host of other features to offer high performance, availability and ease of management. Some
of the important features of the NEC SAP HANA appliance are:

2.1.1. Turnkey appliance with NEC's SAP HANA certified Express5800/A2040d server:
The appliance is built on the NEC Express5800/A2040d scalable enterprise server. The NEC
Express5800/A2040d is a scale-up server designed with a massive resource pool to support
compute-intensive and memory-hungry applications in mission-critical and virtualized
environments, supporting up to 4 processors with 96 cores (192 threads), 4TB of memory and
16 PCIe 3.1 slots.



2.1.2. Equipped with Intel® Xeon Processor E7:
The NEC SAP HANA appliance with the Intel Xeon Processor E7 v4 family offers the highest level
of performance, availability and scalability, making it an ideal platform for mission-critical
and database applications.

2.1.3. Fault-management functions using EXPRESSSCOPE® Engine SP3:
The NEC Express5800/A2040d is equipped with the NEC EXPRESSSCOPE Engine SP3, a specially
designed baseboard management controller that provides extensive remote management
capabilities. Its Built-In Diagnostics (BID) can identify the location of a failure at core
granularity, allowing more in-depth failure analysis than on regular IA servers. In addition to
checking the health of the CPU and memory at start-up, it also checks the input/output path,
which reduces the risk of failure after operation begins.

2.1.4. Enhanced reliability, availability, and serviceability (RAS) for SAP HANA delivered
through NEC-Red Hat Enterprise System Collaboration:
Before the advent of in-memory systems, NEC worked collaboratively with Red Hat on the
development of enterprise systems that delivered dynamic processing and memory functionality.
This collaboration resulted in the ability to remove faulty components from operation and
reallocate system resources without a system outage, through standardized system calls to Red
Hat Enterprise Linux. The NEC Express5800/A2040d offers the RAS features required to support
business-critical enterprise workloads and avert SAP HANA downtime.

NEC's SAP HANA offering is available not only through SAP-certified appliances but also through
Tailored Datacenter Integration (TDI), which gives SAP HANA customers wider choice by letting
them reuse existing hardware components in their SAP HANA environment, provided those
components are SAP HANA certified.

For a list of certified appliances from NEC for SAP HANA, refer to online documentation at:

http://www.nec.com/en/global/prod/hana/model/appliance.html?

2.2. NEC Data Platform for Hadoop (DPH):


NEC DPH is a large-scale distributed processing platform that combines structured and
unstructured data to deliver batch and real-time processing on one common platform. DPH is a
pre-designed and pre-validated platform consisting of NEC hardware optimized for big data
workloads, the RHEL OS and Hortonworks Hadoop, supported by a range of services such as
platform integration, data management and analytics.



The solution is designed to analyze various forms of unstructured data such as text, images, audio
and video along with traditional, structured data sources through high-speed parallel processing
and extreme density, thus providing a complete solution for big data utilization.

Figure: NEC Data Platform for Hadoop

2.2.1. High Performance, Scalable and Hortonworks Certified Platform


NEC DPH is a modular infrastructure platform that helps organizations accelerate business
insights through rapid deployment and gain unprecedented scalability to manage growth in the
volume, variety or velocity of data and associated processes. The foundation of the platform is
the NEC Express5800 series server: master nodes are 1U rack servers and worker nodes are 2U
storage-rich nodes, which allows compute and storage to scale together. DPH is Hortonworks
certified and optimized for Hadoop workloads, with additional features such as power efficiency
and cooling with intelligent fan control that supports operation at temperatures as high as
45 to 48 degrees Celsius.

2.2.2. Reduce Total Cost of Ownership (TCO)


NEC DPH is certified with HDP as a Big Data appliance, built and optimized for Big Data
workloads. It is a pre-designed and pre-validated Hadoop platform that integrates hardware and
HDP to reduce the deployment period and TCO. DPH enables storage and analysis of both
structured and unstructured data, such as sensor data and SNS data, through batch and real-time
processing on a single platform. It also reduces the additional expenditure needed to derive new
business insights, and it enables organizations to take appropriate action in real time and
improve business performance. It provides pre-validated and certified upgrade paths so that
customers can always use the latest Hadoop version with updated features.



2.2.3. Platform & Data Management Services
NEC offers a range of data management services that cover the entire life cycle of Big Data and
analytics. These services help organizations plan, design and implement optimized infrastructure
and support them through data ingestion, integration, security, data classification, tiered
storage and delivery across each phase of the data lifecycle.
NEC offers single-vendor support for platform design and deployment, upgrade, expansion, and
product and operation support, all in one place.

2.3. Vupico's Data Analysis Service


Vupico is an analytics consulting company that aims to provide modern solutions in business
intelligence, technology innovation, Big Data, machine learning and predictive analytics.
Vupico specializes in helping clients convert data into business value through actionable
analytics and insights.

Vupico's services center on bringing modern architecture and the latest technology together
while integrating Big Data, IoT, SAP HANA, Hadoop and predictive analytics into one information
platform. It provides consulting services that help customers solve their business problems
through data analytics. Based on its extensive experience in providing data-driven solutions to
various industries and verticals, Vupico has expertise in implementing an end-to-end data flow
that ingests data from multiple data sources and combines the best of Hadoop and SAP solutions.

NEC and Vupico have together designed a proof of concept that integrates NEC DPH and the SAP
HANA appliance with Vupico's credit scoring analytics use case. Vupico developed the business
use case and implemented a data pipeline that shortens the time between loan request submission
and validation from days to minutes. To support better and more effective decisions, Vupico
also:

• Identified the features that determine the creditworthiness of a loan applicant
• Defined an optimal prediction model using machine learning in order to handle
patterns that don't fit traditional linear regression models
• Proposed a credit score approach that leaves borderline cases to human decision
• Implemented the flow from data ingestion through processing and storage up to
restitution, optimizing data throughput by leveraging in-memory processing in
Apache Spark and storage in SAP Vora and SAP HANA
• Built a comprehensive set of dashboards to support quick decision making



Section 3: Benefits of Integrated Platform
SAP HANA and DPH are two disparate solutions that each have their own strengths and show
enormous potential when deployed as a combined solution. The SAP HANA in-memory platform
enables businesses to analyze mass data in near real time, while DPH helps overcome cost and
storage limitations with unprecedented scalability. Integrating DPH and SAP HANA therefore
combines the advantages of both solutions and results in a platform that can process huge
amounts of structured and unstructured data and run complex analytic processing at high speed.

As the volume of data to be processed grows, and as its variety expands from conventional
structured data to unstructured data, which has only recently been recognized as a potential
data mine, the business case for integrating SAP HANA with a Hadoop-based platform has
attracted strong interest. Frameworks such as Spark process the unstructured data in Hadoop and
store it as structured data in SAP HANA using Hive adapters.

By using commodity hardware, DPH helps reduce data storage cost. This lowers the overall
solution cost, because cold data sets from SAP HANA can be archived on DPH, providing the
required scalability at a lower cost.

Some of the key benefits organizations derive from implementing this integrated solution are:

• By combining social media data and logs with the CRM data available in SAP
HANA, companies can generate customized promotional offers for customers
based on analysis of the combined CRM and clickstream data
• Preventive maintenance for equipment placed at remote locations, by combining
the sensor data (unstructured) received from the equipment with the
procurement date and maintenance schedule data (structured)
• Offloading data and expensive processes from SAP HANA to the integrated
platform to overcome processing bottlenecks and offer increased capacity,
speed and flexibility



Section 4: SAP HANA & Hadoop Integrated Solution use case:
Integrating SAP HANA with Hadoop can help customers save substantial cost and unlock the
potential value of their data. The table below lists the use cases that can be offered to
customers on the integrated platform:

Data warehouse (DWH) Optimization
● DPH as data staging and landing
● Operational data store migration
● DPH as active archive
● Batch processing

DWH & Data Lake
● Common storage of different types/sources of data
● ETL
● Batch, real-time and interactive processing

Business Intelligence & Analytics
● Data Exploration & Visualization
● Interactive processing and Visualization

4.1. Use Case: Data warehouse (DWH) Optimization


Transforming a legacy DWH architecture to support real-time data processing is a massive
project. The modernization initiative may involve multiple aspects such as hardware upgrades,
changes to data models or the addition of new platforms to the environment as an extended arm
of the existing DWH. A DWH optimization solution features the following key ingredients:

• A data pipeline consisting of structured, semi-structured and unstructured forms
of data, capable of ingesting and storing voluminous data from a variety of
disparate sources
• Leverage horizontal scalability/elasticity with Open Source technologies to
reduce costs
• Augment enterprise data warehouse storage with Hadoop and Hive
• Use flexible data organization to enable schema-on-read
• Support for advanced analytics scenarios without the requirement to copy or
migrate data to multiple systems
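The schema-on-read idea in the list above can be illustrated with a minimal sketch. It is not taken from the POC: the sample records, the `READ_SCHEMA` mapping and the `read_with_schema` helper are hypothetical, and plain Python stands in for Hive or Spark. The point is only that raw data is stored as it arrives and a schema is applied when the data is read.

```python
import json

# Raw records are stored exactly as they arrive; no schema is enforced on write.
# (Sample data invented for illustration.)
RAW_EVENTS = [
    '{"customer_id": "C001", "amount": "1200.50", "channel": "web"}',
    '{"customer_id": "C002", "amount": "300"}',
    '{"customer_id": "C003", "amount": "n/a", "channel": "branch"}',
]

# Hypothetical schema applied only at query time: field -> (converter, default).
READ_SCHEMA = {
    "customer_id": (str, None),
    "amount": (float, None),
    "channel": (str, "unknown"),
}


def read_with_schema(raw_line, schema):
    """Parse one raw JSON line and coerce its fields to the schema at read time."""
    record = json.loads(raw_line)
    row = {}
    for field, (convert, default) in schema.items():
        value = record.get(field, default)
        try:
            row[field] = convert(value) if value is not None else None
        except (TypeError, ValueError):
            # Unparseable values become nulls instead of failing the whole read.
            row[field] = None
    return row


rows = [read_with_schema(line, READ_SCHEMA) for line in RAW_EVENTS]
for row in rows:
    print(row)
```

Because the schema lives with the reader rather than the storage layer, a new field or a changed type only requires a new read schema, not a reload of the stored data.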



Figure: High-level architecture of DPH and Data warehouse (SAP HANA) integrated solution

Extending SAP HANA with DPH lets end users and data scientists consume the information they
need transparently from the same user interface, whether it resides in SAP HANA or in DPH,
without compromising on performance.

While the combined solution offers a plethora of features, many of its uses are simple and
deliver compelling results. DWH optimization is one of the many benefits of the combined
solution, with an easily quantifiable and immediate return on investment.

4.2. Use Case: Business Intelligence and Analytics


Enterprise operational data and unstructured data each have unique advantages, and both are
critical to business decisions. Spark with Hadoop offers real-time processing together with
cost-effective storage and management of large volumes of unstructured data, but combining
data in one place, with access to both unstructured data and data from the operational and
business systems held in data warehouses, has always been a challenge.

SAP HANA combined with Hadoop brings both forms of data together. Data tiering allows data to
be stored in either SAP HANA or Hadoop, and the combination makes it possible to analyze data
interactively through a single logical view that ties business and operational data in SAP
HANA to big data in Hadoop. Data scientists have access to both datasets without having to
move data between the two. With this approach they can build structured data hierarchies over
the unstructured data in Hadoop and integrate them with data from HANA. They can then use SAP
Vora through the Spark SQL interface to run OLAP-style in-memory analysis on the combined data
for better visualization.



Section 5: Platform Integration for Analytics
Organizations are increasingly looking for ways to use SAP HANA as a strategic platform and
integrate it with a Big Data platform such as Hadoop to enable newer analytics capabilities and
lower the total cost of ownership. NEC and Vupico help customers design and integrate an
end-to-end solution using NEC DPH (Hadoop big data appliance), the NEC SAP HANA appliance and
SAP Vora. This integrated solution provides the flexibility to store and process data according
to its value (hot, warm and cold) and to run analytics from a common platform.
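The hot, warm and cold placement described above can be sketched as a simple age-based rule. This is an illustrative assumption rather than the POC's actual policy: the thresholds, the `choose_tier` function and the mapping of the warm tier to SAP Vora on Hadoop are hypothetical. The paper itself only states that recent data stays in SAP HANA and that cold data can be archived on DPH.

```python
from datetime import date

# Hypothetical age thresholds; real tiering policies depend on the workload.
HOT_MAX_AGE_DAYS = 90
WARM_MAX_AGE_DAYS = 730


def choose_tier(last_accessed, today):
    """Return the storage tier for a record based on days since last access."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"    # recent data, kept in SAP HANA memory
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"   # assumption: queried on Hadoop through SAP Vora
    return "cold"       # archived on DPH (HDFS)


today = date(2018, 1, 1)
print(choose_tier(date(2017, 12, 1), today))  # hot
print(choose_tier(date(2017, 1, 1), today))   # warm
print(choose_tier(date(2015, 1, 1), today))   # cold
```

A rule keyed on access recency is only one option. Business value or regulatory retention periods could drive the same decision with a different predicate.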

• Advantage of NEC DPH integration with SAP HANA


NEC DPH helps customers lower the total cost of ownership by enabling data and workload
consolidation, and it lets them scale at minimal cost when demand for processing or storage
grows.

Figure: High level design for integrated platform



Some of the key advantages are:

• Unprecedented Scalability:
NEC DPH allows customers to start small and scale the analytics platform as demand grows,
adding one node at a time.

• Common Data Lake Platform:


Customers can consolidate multiple analytics platforms onto a common data lake platform from
NEC, giving them a single data access platform. This helps organizations consolidate workloads
running on multiple clusters onto a single platform and eliminates the need to keep duplicate
copies of data, which directly results in significant cost savings.

• Lower TCO:
Consolidating data from multiple clusters and costly data warehouse systems onto a cost effective
data platform enables organizations to distribute the workload effectively and reduce total cost of
ownership.

5.1. Proof of Concept: Intelligent Analytics across SAP HANA and Hadoop using SAP Vora and Spark by NEC and Vupico
NEC and its solution partner Vupico have jointly designed and developed an end-to-end solution
implementing a data pipeline that shortens the time between loan request submission and
validation from days to minutes. It leverages in-memory processing and in-memory data storage,
combining Apache Spark on the NEC Hadoop platform with the SAP HANA appliance through SAP Vora.

5.1.1. Business Use Case: Reduce lost opportunities by faster and accurate evaluation of credit score
Business opportunities were lost because of a paper-based credit examination process that could
take up to several days at a financial organization offering loan services. NEC, in
collaboration with Vupico, streamlined this process by integrating the NEC SAP HANA appliance
with the Hadoop platform and SAP Vora, and implemented a flow based on a machine learning model
that assesses the risk of an applicant's capability to repay, giving a 98% accurate score
within minutes. Based on multiple dimensions such as the applicant's past and present financial
situation, employment status, assets owned, the amount requested and its purpose, the model
calculates a credit score on a scale from 225 to 900, where a lower score indicates a high-risk
borrower and a higher score a low-risk borrower.
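One way a model output could be placed on the 225 to 900 scale is a linear mapping from the predicted probability of repayment. The paper does not describe how the score is actually derived, so the `credit_score` function below and its linear scaling are assumptions made purely for illustration.

```python
# Scale bounds taken from the text; the linear mapping itself is an assumption.
SCORE_MIN = 225
SCORE_MAX = 900


def credit_score(p_repay):
    """Map a predicted repayment probability in [0, 1] onto the 225-900 scale.

    A low probability of repayment yields a low score (high-risk borrower);
    a high probability yields a high score (low-risk borrower).
    """
    if not 0.0 <= p_repay <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(SCORE_MIN + p_repay * (SCORE_MAX - SCORE_MIN))


print(credit_score(0.0))  # 225
print(credit_score(0.8))  # 765
print(credit_score(1.0))  # 900
```

A production scorecard would more likely calibrate the mapping against observed default rates, but the linear form is enough to show the direction of the scale.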



Figure: End-to-End solution designed by NEC & Vupico

The high-level solution implemented in this POC addresses the analytics challenge by ingesting
data from multiple sources into Hadoop, while SAP Vora bridges the gap between operational,
high-value data in SAP HANA and all structured and unstructured data in Hadoop. Using SAP Vora
with Spark simplified data access between Hadoop and SAP HANA, and only recent data resides in
SAP HANA for in-memory processing.

Figure: A high level solution overview implemented for POC

5.2. POC Platform Configuration - Hardware & Software


The NEC High-Performance Appliance for SAP HANA and the NEC Data Platform for Hadoop provide
the building blocks for optimal deployment of both SAP HANA and SAP Vora to meet a customer's
business needs. NEC and Vupico have worked jointly to simplify the infrastructure design for
customers and provide a solution that helps drive new business models and insights.

This white paper focuses on a predictive analytics use case, customer loan scoring, implemented
by integrating SAP Vora with Spark and Hadoop on the following infrastructure:



• NEC High-Performance Appliance for SAP HANA running SAP HANA and DLM
(Data Lifecycle Management)
• NEC Big Data Appliance "Data Platform for Hadoop" running Hortonworks HDP
for Hadoop, Spark and SAP Vora

Figure: Example of SAP HANA and Hadoop integrated stack

5.2.1. POC System Solution Component


Below are the high-level system components and hardware configuration used for the SAP HANA
and Hadoop integration, along with some of the key use cases.

Figure: Hardware configuration of the platform for implemented POC



Below is the detailed server configuration and the list of components installed on each node.

System Details: 2 x Hadoop Master Controller
Server Configuration:
    Server: Express5800/R120e-1M
    CPU: 2x E5-2650v2 (2x 8C/2.60GHz)
    RAM: 128GB
    OS Disk: 2x 100GB HDD (RAID1)
    Network: 2x 10GbE (2p)
    Hadoop: Hortonworks Data Platform 2.6
    OS: Red Hat Enterprise Linux 7.2
Component Description: Ambari Server, AppTimeline, History Server, Metrics Collector, Grafana,
Name Node (HA), Resource Manager (HA), Zookeeper, Journal Node, Metrics Monitor, ZKFailover
Controller & HDP Clients

System Details: 3 x Hadoop Worker Controller
Server Configuration:
    Server: Express5800/R120f-2E
    CPU: 2x E5-2620v3 (2x 6C/2.40GHz)
    RAM: 256GB
    OS Disk: 2x 1TB HDD (OS RAID1)
    Data Disk: 12x 4TB HDD (Data JBOD)
    Network: 2x 10GbE (2p)
    Hadoop: Hortonworks Data Platform 2.6
    OS: Red Hat Enterprise Linux 7.2
Component Description: Zookeeper, Data Node, Journal Node, Metrics Monitor, Node Manager & HDP
Clients

System Details: SAP HANA Appliance
Server Configuration:
    Server: Express5800/A2040b
    CPU: 4x E7-4890v2 (60Core, 2.8GHz)
    RAM: 1TB
    OS/Data Disk: 8x 900GB HDD (OS/Data)
    Network: 2x 10GbE (2p)
    SAP: HANA2.0 SPS02
    OS: Red Hat Enterprise Linux 7.2 (for SAP HANA)
Component Description: SAP HANA 2.0 SPS02

System Details: Tableau Server
Server Configuration:
    Server: Express5800/R120e-1M
    CPU: 2x E5-2650v2 (2x 8C/2.60GHz)
    RAM: 128GB
    OS Disk: 2x 100GB HDD (RAID1)
    Network: 2x 10GbE (2p)
    OS: Red Hat Enterprise Linux 7.2
Component Description: Tableau Server, Tableau Desktop, Hive ODBC driver
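A rough storage estimate for the Hadoop worker tier can be worked out from the configuration above. The sketch below assumes the HDFS default replication factor of 3, which the paper does not state for the POC, and it ignores space reserved for the operating system, temporary data and other non-HDFS use. The function names are illustrative.

```python
# Values taken from the worker-node configuration listed above.
WORKER_NODES = 3
DATA_DISKS_PER_NODE = 12
DISK_SIZE_TB = 4

# Assumption: HDFS default replication factor; the POC setting is not stated.
REPLICATION_FACTOR = 3


def raw_capacity_tb(nodes):
    """Total raw data-disk capacity across the given number of worker nodes."""
    return nodes * DATA_DISKS_PER_NODE * DISK_SIZE_TB


def usable_capacity_tb(nodes, replication=REPLICATION_FACTOR):
    """Approximate HDFS capacity once every block is stored `replication` times."""
    return raw_capacity_tb(nodes) / replication


print(raw_capacity_tb(WORKER_NODES))         # 144
print(usable_capacity_tb(WORKER_NODES))      # 48.0
print(usable_capacity_tb(WORKER_NODES + 1))  # 64.0
```

Under these assumptions, each additional worker node of the same type adds 48 TB of raw capacity, or about 16 TB of replicated HDFS capacity.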



5.3. Analytical Model (Use case Implementation) and Analytical Model Results
5.3.1. Use Case Background
In the current competitive environment, organizations providing loan services have to innovate
their lending services in order to maximize opportunities and reach under-served customers and
verticals. Customers view a loan request as a complex, stressful and lengthy process in which
they must provide a lot of information and justification to prove that they are worthy of credit.

One of the key benefits that can be offered to customers is to make the overall process and
experience as seamless as possible for the loan applicant and to limit their stress by
shortening the time between submission of the application and receipt of a response.

With this in mind, NEC and Vupico brought their extensive experience in mass data processing,
machine learning and business process re-engineering to support a financial institution that
wanted to learn how leveraging big data to redesign its scoring system could drastically
shorten the current scoring pipeline from days to hours or minutes.

5.3.2. Use Case Implementation


To implement the credit scoring use case, Vupico applied its analytic framework to first determine
21 major features used to assess the worthiness of a loan applicant. These features define a
classification problem, which can be handled in machine learning by several algorithms, each
with its own merits and demerits.
Based on past experience and the non-linear nature of the data, Vupico quickly concluded that
traditional algorithms such as logistic regression would not perform well, and shortlisted two other
algorithms to compare: Naïve Bayes and Random Forests. These were preferred over other models
because they yield well-defined rules at the end of processing, learned iteratively from the data,
which can then be used for prediction.

Vupico created a clear and transparent view of the components and rationale of the
decision-making process adopted by the machine learning algorithm. This benefit is greatly
valued, because credit scoring systems are all too often black boxes for the customer,
who simply has to trust whatever happens inside to produce the relevant insight.

Instead of issuing a hard decision on whether an applicant is granted a loan or not, Vupico
labeled applicants as high, moderate or low risk based on the predictions and generated a
FICO-score-like indicator on a scale of 225-900. This enables manual assessment of borderline
cases and the processing of loan applicants who would have been rejected by traditional
criteria-based models.
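
The labeling and scoring step can be sketched as follows. This is a minimal illustration, not Vupico's actual scoring logic: only the 225-900 scale and the three risk labels come from the text. The linear mapping, the convention that a higher score means lower risk, the probability cut-offs and the function names are all assumptions made for the example.

```python
SCORE_MIN, SCORE_MAX = 225, 900  # scale stated in the text

# ASSUMPTION: illustrative cut-offs on the predicted probability of default
LOW_RISK_BELOW = 0.20
HIGH_RISK_FROM = 0.60


def to_score(p_default):
    """Map a default probability in [0, 1] linearly onto the 225-900 scale.

    ASSUMPTION: higher score means lower risk, as with a FICO score.
    """
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be between 0 and 1")
    return round(SCORE_MAX - p_default * (SCORE_MAX - SCORE_MIN))


def risk_label(p_default):
    """Bucket a default probability into the three labels used in the text."""
    if p_default < LOW_RISK_BELOW:
        return "low"
    if p_default < HIGH_RISK_FROM:
        return "moderate"
    return "high"


for p in (0.05, 0.40, 0.85):
    print(p, to_score(p), risk_label(p))
```

Applicants in the moderate band are the borderline cases that a loan officer would review manually instead of rejecting outright.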
After analyzing the data volume and potential throughput requirements, Vupico and NEC decided
to present an architecture with an upstream integration that ingests and processes multiple
loan applications at a time from current systems. The pipeline uses Apache Kafka to decouple
processing from data producers and consumers and to buffer messages, which makes it easy to
adopt a stream-oriented processing architecture when required. Apache NiFi played the critical
role of managing the incoming dataflow and orchestrating the execution of Apache Spark jobs.
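
The decoupling idea can be illustrated without a Kafka cluster. In the sketch below, Python's standard `queue.Queue` is used as an in-process stand-in for a Kafka topic; this is an assumption made purely for illustration, and the real pipeline would use Kafka topics and a Kafka client instead. The producer and consumer functions and the message layout are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this sketch


def producer(topic, applications):
    """Publish loan applications without knowing who will consume them."""
    for app in applications:
        topic.put(app)  # blocks only if the buffer is full
    topic.put(SENTINEL)


def consumer(topic, scored):
    """Pull applications at its own pace and attach a placeholder result."""
    while True:
        app = topic.get()
        if app is SENTINEL:
            break
        scored.append({**app, "scored": True})


# The bounded queue stands in for a Kafka topic that buffers messages.
topic = queue.Queue(maxsize=100)
applications = [{"id": i} for i in range(5)]
scored = []

t_cons = threading.Thread(target=consumer, args=(topic, scored))
t_prod = threading.Thread(target=producer, args=(topic, applications))
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()
print(len(scored), "applications scored")
```

Because the producer and consumer only share the buffer, either side can be slowed down, restarted or replaced by a streaming job without changing the other.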

◊◊High level workflow for platform

Downstream, data visualization and dashboarding were implemented in Tableau for
self-exploration and analysis of the data by the customer. This was possible because of the
in-memory capabilities of both SAP HANA and SAP Vora, which were fed with the scored loan
applications, giving the customer complete control over its operation.
To offload data from SAP HANA and save storage cost on the SAP HANA system, a process was
put in place to retain only the latest 3 years of data in SAP HANA and to transfer the rest of the
historical data to SAP Vora residing on the Hadoop cluster.
Dashboards were built in Tableau, and a calculation view in SAP HANA combined the data
stored locally in SAP HANA with the data in SAP Vora, letting users query
not only the last 3 years but the whole dataset within acceptable processing time.
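
The tiering pattern described above (recent data in one store, historical data in another, and a single view over both) can be sketched with two tables and a UNION ALL view. The example uses an in-memory SQLite database purely as a stand-in: `hot_loans` plays the role of SAP HANA, `cold_loans` the role of SAP Vora, and `all_loans` the role of the calculation view. The table names, columns, sample rows and reference year are hypothetical; only the 3-year retention rule comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hot_loans  (loan_id INTEGER, app_year INTEGER, score INTEGER);
    CREATE TABLE cold_loans (loan_id INTEGER, app_year INTEGER, score INTEGER);
    -- Stand-in for the calculation view that combines both tiers
    CREATE VIEW all_loans AS
        SELECT loan_id, app_year, score, 'hot'  AS tier FROM hot_loans
        UNION ALL
        SELECT loan_id, app_year, score, 'cold' AS tier FROM cold_loans;
""")

CURRENT_YEAR = 2018   # hypothetical reference year
RETENTION_YEARS = 3   # from the text: keep the latest 3 years in the hot tier

rows = [(1, 2012, 640), (2, 2015, 710), (3, 2016, 580), (4, 2017, 805), (5, 2018, 690)]
for row in rows:
    # Route each record to the hot or cold tier based on its application year.
    table = "hot_loans" if row[1] > CURRENT_YEAR - RETENTION_YEARS else "cold_loans"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)

# Consumers query the view and never need to know where a row is stored.
per_tier = conn.execute(
    "SELECT tier, COUNT(*) FROM all_loans GROUP BY tier ORDER BY tier"
).fetchall()
print(per_tier)
```

A reporting tool pointed at the view sees the whole history, while only the recent rows occupy the more expensive tier.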

5.3.3. SAP Vora Interfaces and Control


SAP Vora can be controlled in two ways: through the web interfaces provided, or
programmatically in Spark through the SapSQLContext. It is worth noting that
SapSQLContext also works with SAP HANA and enables the user, for instance, to load data from
Hadoop into SAP HANA using Spark.
SAP Vora has two main web interfaces. The first, SAP Vora Manager, is used to manage
SAP Vora services: starting and stopping individual services and deleting all the data currently in
memory. Users can also use it to configure the services and assign which node is responsible for
which services.

The second interface, SAP Vora Tools, allows users to manually execute most of
the operations found in the SQL modeler, such as creating or dropping tables and views, executing
SQL queries and manually loading data into SAP Vora from the Hadoop Distributed File System (HDFS).

Tables in SAP Vora load their data from HDFS. However, if the data was loaded from an ORC file
and that file is subsequently changed or updated, the update is not reflected in SAP Vora. In such
cases, the file must either be reloaded manually from the SAP Vora Tools interface or the data
must be appended to the SAP Vora table by a Spark process.

Section 6: Product Information

6.1. Hortonworks Data Platform


The Hortonworks Data Platform (HDP), powered by Apache Hadoop, is a massively scalable and
100% open source platform for storing, processing and analyzing large volumes of data. It is designed
to deal with data from many sources and formats in a quick, easy and cost-effective manner.
The Hortonworks Data Platform consists of the essential set of Apache Hadoop projects including
MapReduce, HDFS, HCatalog, Pig, Hive, HBase, ZooKeeper and Ambari.

While HDFS provides the scalable, fault-tolerant, cost-efficient storage for your big data lake,
YARN provides the centralized architecture that enables you to process multiple workloads
simultaneously. YARN provides the resource management and pluggable architecture for enabling
a wide variety of data access methods.

Hortonworks contributes to the Apache Hadoop project by committing code or proposing
solutions to issues and strives to deliver the most advanced Hadoop at the right time.

◊◊Hortonworks Data Platform overview (Source: Hortonworks Inc.)

6.2. SAP HANA
SAP HANA is an in-memory, column-oriented, relational database management system which
provides a single platform with application building blocks for database, processing, integration
and application services. SAP HANA offers significant performance benefits over conventional
database platforms for both Online Analytical Processing (OLAP) and Online Transaction
Processing (OLTP). It also provides application server and ETL capabilities and can perform
advanced analytics. The system can scale up or scale out to handle in-memory processing of
terabytes of data.

Additionally, SAP HANA supports data tiering to manage data storage
cost and processing at the database storage layer. This extends the platform to intelligently
distribute data and its processing to low-cost, scalable platforms by moving warm and cold data out
of memory to alternative disk-based solutions such as Hadoop.

◊◊SAP HANA Overview (Source: SAP)

6.3. SAP Vora


SAP Vora provides contextually aware analytics capability by integrating the SAP HANA
platform seamlessly with Hadoop. SAP Vora is an in-memory query engine that brings powerful
contextual analytics across all data stored in Hadoop, enterprise systems and other distributed
data sources, and drives lower TCO by achieving low-cost and faster analytics on huge data sets.
SAP Vora extends the capabilities of Spark with richer SQL capabilities.

SAP Vora is an extended Spark execution framework which provides SQL-like capabilities and
produces accelerated results by loading and processing Hadoop data and tables in memory. SAP
Vora provides a simple graphical interface to model data and build star schemas, which helps
boost SQL performance. Additionally, it can help build hierarchies and drill down
on Hadoop data, which is otherwise very difficult to achieve.

◊◊SAP Vora overview (Source: SAP)

SAP Vora bridges the gap between SAP HANA and Hadoop and enables customers to run several
key business use cases on an integrated platform at lower cost.

6.4. Tableau
Tableau is an interactive data visualization tool that enables users to create interactive and apt
visualizations in the form of dashboards and worksheets to gain business insights.
It allows users to easily create customized dashboards that provide insight into a
broad spectrum of information.
The characteristics of Tableau are as follows:

•• Using patented technology VizQL™ for visualization, it is much easier to
understand data
•• With its intuitive interface and exceptional ease of use, it is faster and simpler to
get new insights
•• With Tableau's server function, users can publish and share their visualizations
so that anyone can use them

Contact Us.

NEC Corporation
www.nec.com

VUPICO LLC
JAPAN
DEUX TOURS EAST 45F E4502, 3-13-1 Harumi, Chuo-ku, Tokyo 104-0053

SINGAPORE
31, St Thomas Walk 0403, St Thomas Suites Singapore 238141

AUSTRALIA
607/17 Grattan Close Glebe NSW, 2037

INDIA
305 Adiya Trade Centre Ameerpet Hyderabad 500081

http://www.vupico.com/ info@vupico.com

NEC IS A HORTONWORKS CERTIFIED TECHNOLOGY PARTNER


Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries.
Apache, Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper,
Oozie, Phoenix, NiFi, Zeppelin, Slider, MapReduce, HDFS, YARN, Hadoop elephant, and Apache project logos are either registered trademarks or
trademarks of the Apache Software Foundation in the United States and the other countries.
SAP, SAP logo, SAP HANA, SAP Vora, and other SAP products are trademarks or registered trademarks of SAP AG in Germany and in several
other countries.
Tableau and all the Tableau products mentioned in this document are trademarks or registered trademarks of Tableau Software Inc.
Red Hat and Red Hat Enterprise Linux are trademarks of Red Hat, Inc., registered in the U.S. and other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, and other Intel products are trademarks or registered trademarks of Intel Corporation in the United
States and other countries.
All other product and service names mentioned are the trademarks of their respective companies.