International Journal of Innovative and Emerging Research in Engineering
Volume 3, Issue 2, 2016
Available online at www.ijiere.com
e-ISSN: 2394-3343, p-ISSN: 2394-5494

A Survey of Migration from Traditional Relational Databases towards New Big Data Technology

Miss Dipika Thakare
Department of Computer Science & Engineering, North Maharashtra University, India
Shri Sant Gadge Baba College of Engineering & Technology, Bhusawal

ABSTRACT
This paper primarily focuses on how data is managed in conventional relational database systems and in the new big data technologies. Huge amounts of data are created on social media sites, and this data is collected and stored at unprecedented rates. The challenge is not only to store and handle the large volume of data, but also to extract meaningful information from it. There are several approaches to collecting, storing, processing, and analyzing big data, and traditional relational database management systems have difficulty handling data at this scale. This paper discusses why a transition from relational database management systems to big data technology is necessary.
Keywords - Big data, NoSQL, Relational Databases, Decision making using Big Data, Hadoop

I. INTRODUCTION
“90% of the world’s data was generated in the last few years.” [2]
Owing to the latest technologies, devices, and communication channels such as social networking sites, the amount of data produced by mankind is increasing rapidly every year, every month, and every day. The amount of data produced from the beginning of time until 2003 was 5 billion gigabytes. The same amount was created every two days in 2011, and every ten minutes in 2013, and this rate is still growing. [2] All of this information is useful and becomes meaningful when processed; otherwise it goes to waste. International Data Corporation (IDC) believes that the organizations best able to make real-time business decisions using Big Data solutions will thrive, while those unable to adopt and make use of them will increasingly find themselves at a competitive disadvantage and face potential failure in the market. [1]

II. RELATIONAL DATABASES


A database management system is a set of interrelated data together with a set of programs to operate on it. The collection of interrelated data is called a database. In short, a relational database manages information in row and column form: a database that contains information in one or more related tables is called a relational database. A row in a table is called a record, and a column is called an attribute or field. There are two approaches to storing data in a warehouse (a small sketch of the normalized approach follows the list):
• Dimensional - In the dimensional approach, transaction data is partitioned into "facts", which are generally numeric transaction data, and "dimensions", which are the reference information that gives context to the facts. [1]
• Normalized - Database normalization is a design technique by which relational database tables are structured so as to make them friendly to both users and systems. The tables are grouped together by subject areas that reflect data categories, such as data on customers, products, and so on. The normalized structure divides data into entities, which create several tables in a relational database. [1]
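As a hedged illustration of the normalized approach (not code from the paper), the following minimal Python sketch uses the standard-library sqlite3 module; the customer/order schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: customers and orders are separate entities linked
# by a foreign key, so each piece of data is stored exactly once.
cur.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")
cur.execute("INSERT INTO customers VALUES (1, 'Alice')")
cur.execute("INSERT INTO orders VALUES (10, 1, 99.50)")

# A join reassembles the related rows into a single record.
for row in cur.execute("""
    SELECT c.name, o.amount
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
"""):
    print(row)  # ('Alice', 99.5)
```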
a) RDBMS to Big Data Migration Testing
Big data migration generally involves multiple source systems and very large volumes of data. Most organizations prefer open source tools, which can be set up quickly and offer multiple customization options. In migration testing, a set of entities is selected for testing, and all application data is migrated in this cycle. A simple early solution can reduce the effort of the consecutive testing cycles. Another challenge is defining effective test scenarios for each entity; solid data transformation rules must be considered in testing, together with a proper sampling method. [3]


b) Big Data Migration Process

Hadoop as a service is offered by Amazon Web Services (AWS). This cloud computing solution removes the operational challenge of running Hadoop and makes large- and medium-scale data processing accessible, easy, and fast. The main services available are Simple Storage Service (S3) and Elastic MapReduce (EMR), and the results are typically delivered into Amazon Redshift, a fast, fully managed data warehouse service. [3] There are three steps in the process of migration to the Amazon Web Services Hadoop environment (a sketch of the pipeline follows the list):
o Cloud Service - Virtual machines or physical machines are used to connect to the source databases and extract the tables using Sqoop, which pushes them to Simple Storage Service.
o Cloud Storage - The Simple Storage Service cloud storage center is used for all the data that is being sent by the virtual machines. It stores data in flat file format.
o Data Processing - Amazon EMR processes and distributes vast amounts of data using Hadoop. [3]
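As a rough sketch of this three-step pipeline (an assumption-laden illustration, not code from the paper), the following Python snippet uses the boto3 AWS SDK and a local Sqoop installation; the JDBC connection string, bucket name, and cluster parameters are placeholders.

```python
import subprocess
import boto3

BUCKET = "example-migration-bucket"  # hypothetical bucket name

# Step 1 (Cloud Service): extract a table from the source RDBMS with
# Sqoop and push it to Simple Storage Service (S3) as flat files.
subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://source-db.example.com/sales",
    "--username", "etl_user", "--password-file", "/secure/pw.txt",
    "--table", "orders",
    "--target-dir", f"s3://{BUCKET}/orders/",
], check=True)

# Step 2 (Cloud Storage): the flat files now sit in S3; confirm arrival.
s3 = boto3.client("s3")
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="orders/")
print(listing.get("KeyCount", 0), "objects staged in S3")

# Step 3 (Data Processing): launch an EMR cluster that runs a Hadoop
# streaming step over the staged data.
emr = boto3.client("emr", region_name="us-east-1")
response = emr.run_job_flow(
    Name="rdbms-migration-demo",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Hadoop"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "process-orders",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hadoop-streaming",
                     "-input", f"s3://{BUCKET}/orders/",
                     "-output", f"s3://{BUCKET}/orders-out/",
                     "-mapper", "cat", "-reducer", "wc -l"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started EMR cluster:", response["JobFlowId"])
```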

III. WHAT IS BIG DATA


Big data is a collection of data sets so large that they cannot be processed using traditional computing techniques. Big data is not a single technique or tool; it involves many areas of business. [2] Big data techniques are becoming applicable to more and more organizations. Big data refers to data sets whose volume or variety exceeds the ability of commonly used software tools to capture, manage, and process them within a time tolerable to the business. [1] The difficulties relate to data capture, storage, search, sharing, visualization, and analytics; in short, the data is very large, very fast, and very hard to handle. [1] "Very large" means petabyte-scale collections of data, such as those arising from transaction histories, and "very fast" means the data arrives at very high transmission speeds. [1]
What comes under big data?
• Search Engine Data - Search engines retrieve lots of data from different databases.
• Black Box Data - Black boxes are components of airplanes, jets, helicopters, etc. They record audio from microphones and earphones, capturing the voices of the flight crew along with performance information of the aircraft.
• Transport Data - Transport data includes the model, capacity, distance, and availability of a vehicle.
• Power Grid Data - Power grid data holds the information consumed by a particular node with respect to a base station.
• Social Media Data - Social media sites such as Twitter, Facebook, Hike, and WhatsApp hold the information and views posted by many people.
• Stock Exchange Data - Stock exchange data holds information about the shares of different companies and the 'buy' and 'sell' decisions made by customers. [3]

Big Data encompasses huge volume, high velocity, and an extensible variety of data. There are three types of data:
• Semi-structured data: XML data.
• Structured data: relational data.
• Unstructured data: Word, PDF, text, media logs. [3]
Big Data is characterized by the following: [1]
• Variety - the increasingly diverse types of data that no longer fit into neat, easy-to-consume structures.
• Velocity - the frequency at which new data is generated, captured, and shared.
• Veracity - the uncertainty of messy, disarrayed data.
• Volume - the vast amounts of data generated every second, more than conventional relational database infrastructures can deal with.
Big Data gives an organization the capability to: [1]
• transform the way business is done;
• solve today's data problems;
• build a competitive advantage in the marketplace.

IV. BIG DATA OPERATIONAL AND ANALYTICAL TECHNOLOGIES


There are many technologies that can be applied to handle big data, including (i) schema-less databases and (ii) NoSQL (Not Only SQL). NoSQL technology uses many approaches for storing and managing unstructured data. NoSQL databases separate data storage from data management, whereas relational databases combine the two. One key concept in NoSQL databases is a focus on high-performance, scalable data storage with low-level access to a data management layer, which allows data management tasks to be written in the application layer. NoSQL database systems can also be called schema-free databases. The key advantage of schema-free design is that it enables applications to quickly modify the structure of the data without table rewrites; data validity and integrity must then be enforced at the data management layer. [1] NoSQL databases also relax the atomicity, consistency, isolation, and durability (ACID) properties. [1] They generally do not maintain complete consistency across distributed servers because of the load this places on the databases, especially in distributed systems. Traditional relational databases implement strict transactional semantics to protect consistency, but many NoSQL databases have more scalable architectures that relax the consistency requirement. [1]
Operational big data includes systems like MongoDB, which provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored. [2] NoSQL big data systems are designed to take advantage of new cloud computing architectures that allow very large computations to be run economically and efficiently. [2] This makes operational big data workloads much easier to manage, and cheaper and faster to implement. [2] Some NoSQL systems can even provide insight into patterns and trends based on real-time data, with minimal coding and without the need for data scientists. [2]
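To make this concrete, here is a minimal hedged sketch (not from the paper) of such an operational workload using the pymongo driver against a local MongoDB instance; the database, collection, and fields are hypothetical.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["demo"]["events"]  # hypothetical database/collection

# Schema-free capture: documents need not share a fixed structure,
# so new fields can be added without any table rewrite.
events.insert_one({
    "user": "alice",
    "action": "login",
    "ts": datetime.now(timezone.utc),
})

# Interactive, real-time query: the five most recent actions of a user.
for doc in events.find({"user": "alice"}).sort("ts", -1).limit(5):
    print(doc["action"], doc["ts"])
```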
Analytical big data includes systems like MapReduce and Massively Parallel Processing (MPP) database systems. [2] These systems provide analytical capabilities for retrospective and complex analysis that may touch most of the data. MapReduce provides a method for analyzing data that complements the capabilities of SQL (Structured Query Language), and a system based on MapReduce can be scaled up from a single server to thousands of high- and low-end machines. [2]
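To illustrate the MapReduce programming model just described, here is a minimal in-process sketch in plain Python (a toy word count, with no Hadoop cluster involved); it mirrors the map, shuffle, and reduce phases that the framework would distribute across machines.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit (key, value) pairs, here (word, 1) for every word.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Reduce: aggregate all values that share a key.
    return key, sum(values)

records = ["Big data needs new tools", "relational tools meet big data"]

# Shuffle: group intermediate values by key, as the framework would.
groups = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        groups[key].append(value)

results = [reduce_phase(k, v) for k, v in sorted(groups.items())]
print(results)  # e.g. [('big', 2), ('data', 2), ('meet', 1), ...]
```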
Data-intensive computing is a class of parallel computing applications that use a data-parallel approach to process big data. This approach works on the principle of collocating the data with the programs or algorithms used to perform the computation. A distributed and parallel approach, in which inter-connected individual computers work together as a single integrated computing resource, is used to process and analyze big data. [1]
A distributed file system, or network file system, allows client nodes to access files through a computer network. In this way a number of users working on multiple machines can share files and storage resources. The client nodes do not access the underlying block storage directly but interact with it through a network protocol. This enables restricted access to the file system, depending on access lists or capabilities on both the servers and the clients, which in turn depend on the protocol. [1]
Apache Hadoop supports both analytics and stream computing, and it is the key technology used to handle big data. Apache Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It can be scaled up from a single server to thousands of machines, with a very high degree of fault tolerance. [1]
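As a small hedged illustration (assuming a working Hadoop installation on the PATH, with placeholder file paths), data can be placed into the Hadoop Distributed File System with the standard hdfs dfs commands, here invoked from Python; HDFS replicates each block across the cluster, which is what provides the fault tolerance mentioned above.

```python
import subprocess

# Create a directory in HDFS and copy a local file into it. HDFS splits
# the file into blocks and replicates each block (three copies by
# default) across the cluster's machines.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/user/demo"], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", "local_data.csv", "/user/demo/"], check=True)
subprocess.run(["hdfs", "dfs", "-ls", "/user/demo"], check=True)
```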

V. BIG DATA FRAMEWORK


This paper presents a simple framework for examining the key components of a big data system, in order to work through the many architectural decisions that arise when exploring the world of big data. [1] Big data frequently brings four new and very different considerations into enterprise architecture:
• Multiple analytics paradigms and computing methods must be supported:
o Storage models are changing - solutions such as HDFS (Hadoop Distributed File System) and unstructured data stores are emerging. [1]
o Data sources have a different scale - many companies work in the multi-terabyte range, and some in the petabyte range. [1]
o Speed is critical - ETL (extract-transform-load) batches are insufficient where real-time streaming solutions such as Storm are required. [1]
• Batch processing: Hadoop serves as a distributed processing engine that can examine very large amounts of data, applying algorithms that range from the simple to the complex. [1]
• Interactive analytics: This includes distributed MPP data warehouses with embedded analytics, which allow business users to perform interactive query and visualization of big data. [1]
• Real-time databases and analytics: These are generally in-memory; they enable distributed processing and event-generation capabilities, cross-data-center access to data, and scale-out engines that provide low latency. [1]

VI. WHY TRANSITION FROM RELATIONAL DATABASES TO BIG DATA


The following table shows the differences between traditional relational database systems and big data systems (Hadoop). Owing to the huge amounts of data being generated and analyzed in real time to provide intelligence to decision support systems, there is a clear need to transition to Big Data. [1]


Table 1: Difference between RDBMS and Hadoop

Data layout
RDBMS: Row and column oriented.
Hadoop: Column-family oriented.

Description
RDBMS: In a traditional relational database management system, data, information, and records are stored in tabular (row and column) form.
Hadoop: A distributed file system stores huge amounts of file data on a cloud of machines and handles data redundancy. On top of that distributed file system, Hadoop provides an API, MapReduce, for processing all the stored data. On top of this basic layer, column databases such as HBase can be built.

Type of data supported
RDBMS: Supports and works with structured data only.
Hadoop: Supports structured, semi-structured, and unstructured data.

Read/write throughput limits
RDBMS: Thousands of queries per second.
Hadoop: Millions of queries per second.

Streaming data
RDBMS: Limited ability to handle streaming data.
Hadoop: Works well with streaming data.

Maximum data size
RDBMS: Terabytes.
Hadoop: Hundreds of petabytes.

VII. SUGGESTED BIG DATA ADOPTION ROADMAP FOR AN ENTERPRISE


The following is a recommended outline of a roadmap for the adoption of Big Data in an enterprise:

Constant focus on Business Value and Innovation
• Concentrate on innovative technology solutions
• Pursue continuous improvement and future readiness
• Continue to build and mature target-state enterprise capabilities
Architecture and Planning for Enterprise Needs
• Create a governance framework
• Define business and technology blueprints
• Define access points to Big Data
Adopt Big Data through Execution and Integration
• Embed target-state enterprise capabilities in the business
• Integrate Big Data into the existing IMF (Information Management Framework)
• Center on business value
• Operationalize proofs of concept

VIII. CONCLUSION
Over the last 35 years, data management principles such as physical and logical data independence, declarative querying, and cost-based optimization have grown into a multi-billion dollar industry. According to IBM, 80 percent of the planet's data is unstructured, and most businesses do not even attempt to use this data to their advantage. Once the technologies for examining big data reach their peak, it will become easier for companies to analyze enormous datasets, recognize patterns, and advantageously plan their moves based on the consumer requirements revealed by that data.
ACKNOWLEDGEMENT
I would like to thank our honorable Principal, Dr. R. P. Singh, and our Head of Department, Prof. D. D. Patil; my special thanks to my guide, Miss Lavina Panjwani, and sincere thanks to all the respected teaching faculty of the Department of Computer Science & Engineering of Hindi Seva Mandal's Shri Sant Gadge Baba College of Engineering & Technology, Bhusawal. My special thanks also to all the authors of the reference papers referred to in this work.

REFERENCES
[1] Sangeeta Bansal, Dr. Ajay Rana, “Transitioning from Relational Databases to Big Data”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 1, January 2014.
[2] www.tutorialspoint.com, Hadoop Tutorial.
[3] “From Relational Database Management to Big Data: Solutions for Data Migration Testing”.
[4] Nevins Partners, “Why is BIG Data Important?”, White Paper, May 2012.
[5] hadoop.apache.org
