ABSTRACT
This paper focuses principally on how data is managed in conventional relational database systems and in the new big data technologies. Huge amounts of data are created on social media sites, and this data is collected and stored at unprecedented rates. The enormous challenge is not only to accumulate and handle these large volumes of data, but also to extract important information from them. There are several approaches to collecting, storing, processing, and analyzing big data. Traditional relational database management systems find it difficult to handle big data. This paper discusses why a transition from relational database management systems to big data technology is necessary.
Keywords-Big data, NoSQL, Relational Databases, Decision making using Big Data, Hadoop
I. INTRODUCTION
“90% of the world’s data was generated in the last few years.”[2]
Due to the latest communication technologies and devices, such as social networking sites, the amount of data produced by mankind is increasing rapidly every year, every month, and every day. The amount of data produced from the beginning of time until 2003 was 5 billion gigabytes. The same amount was created every two days in 2011, and every ten minutes in 2013, and this rate is still growing a great deal. [2] All of this information is useful and can be meaningful when processed, but it is largely neglected. International Data Corporation (IDC) believes that organizations best able to make real-time business decisions using Big Data solutions will thrive, while those unable to adopt and make use of them will increasingly find themselves at a competitive disadvantage in the market and face potential failure. [1]
International Journal of Innovative and Emerging Research in Engineering
Volume 3, Issue 2, 2016
Big Data is characterized by huge volume, high velocity, and an ever-extending variety of data. There are three
types of data:
Structured data: relational data.
Semi-structured data: XML data.
Unstructured data: Word documents, PDFs, text, media logs. [3]
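The three categories above can be illustrated with a short Python sketch; the table, XML document, and log line below are invented examples for illustration only, not data from this paper.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured data: a relational row with a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice')")
row = conn.execute("SELECT id, name FROM users").fetchone()

# Semi-structured data: XML carries its own, flexible structure.
doc = ET.fromstring("<user><name>Alice</name><city>Pune</city></user>")
name = doc.find("name").text

# Unstructured data: free text with no predefined schema at all.
log_line = "2016-03-01 ERROR disk almost full on node-7"
words = log_line.split()
```

The relational row can only be read back against its declared schema, whereas the log line must be parsed by application code that decides for itself what the tokens mean.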
Big Data is characterized by the following: [1]
Variety - the increasingly diverse types of data that no longer fit into neat, easy-to-consume structures.
Velocity - the frequency at which new data is generated, captured, and shared.
Veracity - the uncertainty and messiness of the data.
Volume - the vast amounts of data generated every second, larger than what conventional relational database infrastructures can deal with.
The capabilities of Big Data: [1]
It transforms the way business is done.
It solves today's data problems.
It builds competitive advantage in the marketplace.
to be written in the application layer. NoSQL database systems can also be called schema-free databases. The key advantage of a schema-free design is that it enables applications to quickly change the structure of the data without table rewrites. Data validity and integrity must then be enforced at the application's data management layer. [1]
NoSQL systems have also reconsidered the atomicity, consistency, isolation, and durability (ACID) properties. [1]
They generally do not maintain complete consistency across distributed servers because of the load this places on databases, especially in distributed systems. Traditional relational databases implement strict transactional semantics to protect consistency, but many NoSQL (Not Only SQL) databases adopt more scalable architectures that relax the consistency requirement. [1]
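The schema-free design described above can be sketched with plain Python dictionaries standing in for documents; real document stores such as MongoDB behave analogously, but the names here are illustrative and not tied to any specific database API.

```python
# A minimal sketch of schema-free ("document") storage. A plain list
# stands in for a NoSQL collection.
collection = []

# Two records with different fields can live side by side -- no table
# rewrite (ALTER TABLE) is needed when the structure of the data evolves.
collection.append({"_id": 1, "name": "Alice", "email": "alice@example.com"})
collection.append({"_id": 2, "name": "Bob", "followers": 1500, "tags": ["ml"]})

# Because the database no longer enforces a schema, validity checks move
# into the application layer, as the text above notes.
def validate(doc):
    return "_id" in doc and "name" in doc

assert all(validate(d) for d in collection)
```

Note that the second document carries fields the first lacks; in a relational table this would have required altering the schema for every existing row.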
Operational big data includes systems like MongoDB, which provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored. [2] NoSQL big data systems are designed to take advantage of new cloud computing architectures that allow very large computations to be run economically and efficiently. [2] This makes big data workloads much easier to manage, and cheaper and faster to implement. [2] Some NoSQL systems can even provide insight into patterns and trends based on real-time data with minimal coding and without the need for data scientists. [2]
Analytical big data technology includes systems like MapReduce and Massively Parallel Processing (MPP) database systems. [2] These systems provide analytical capabilities for retrospective and complex analysis that may touch most of the data. MapReduce provides a new method of analyzing data that is complementary to the capabilities provided by SQL (Structured Query Language). A system based on MapReduce can be scaled up from a single server to thousands of high- and low-end machines.
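The MapReduce model mentioned above can be sketched as a toy, single-process word count in Python; Hadoop runs these same two phases distributed across thousands of machines.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts per key -- roughly what SQL's
    GROUP BY word with COUNT(*) would compute."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big systems", "data at scale"]
result = reduce_phase(map_phase(lines))
```

In a real Hadoop job the map tasks run in parallel on the nodes holding the data blocks, and the framework shuffles the intermediate pairs to the reducers by key; the logic per phase, however, is the same as in this sketch.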
Data-intensive computing is a class of parallel computing applications that use a data-parallel approach to process big data. This approach works on the principle of collocating the data with the programs or algorithms used to perform the computation. Distributed and parallel systems, in which inter-connected individual computers work together as a single integrated computing resource, are used to process and analyze big data. [1]
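The data-parallel principle above can be sketched on a single machine: the same function is applied to disjoint chunks of the data by separate workers, and the partial results are then combined. This sketch uses Python's standard thread pool purely for illustration; a real data-intensive system would ship the computation to the nodes holding each chunk.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # The per-worker computation: each worker sees only its own chunk.
    return sum(chunk)

data = list(range(1, 1001))
# Split the data into four disjoint chunks of 250 elements each.
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, chunks))

# Combine step: merge the partial results into the final answer.
total = sum(partials)
```

The design choice that matters here is that `partial_sum` never needs the whole dataset, which is what lets the work be spread across machines that each hold only part of the data.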
A distributed file system or network file system allows client nodes to access files over a computer network. In this way, users working on multiple machines can share files and storage resources. The client nodes do not access the block storage directly but interact with it through a network protocol. This enables restricted access to the file system, depending on access lists and capabilities on both servers and clients, which in turn depend on the protocol. [1]
Apache Hadoop, used for analytics and stream computing, is a key technology for handling big data. Apache Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers. It can scale up from a single server to thousands of machines with a very high degree of fault tolerance. [1]
o Storage models are changing - solutions like HDFS (Hadoop Distributed File System) and unstructured data stores are provided. [1]
o Data sources have a different scale - many companies work in the multi-terabyte range, and some in the petabyte range. [1]
o Speed is critical - batch ETL (extract-transform-load) is insufficient; real-time streaming solutions like Storm are required. [1]
Batch processing: Hadoop acts as a distributed processing engine that can examine very large amounts of data and apply algorithms that range from the simple to the complex. [1]
Interactive analytics: this includes distributed MPP data warehouses with embedded analytics, which enable business users to do interactive querying and visualization of big data. [1]
Real-time databases and analytics: these are generally in-memory scale-out engines that provide low latency, distributed processing, event-generation capabilities, and cross-data-center access to data. [1]
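The contrast between the batch and real-time styles above can be sketched in a few lines: instead of re-scanning the whole dataset as a batch job would, a real-time engine keeps state in memory and updates it as each event arrives. The class below is an invented, single-process illustration, not an API from any real streaming system.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events seen within the most recent `window` seconds,
    updating in-memory state per event (the real-time style)."""
    def __init__(self, window):
        self.window = window
        self.events = deque()

    def record(self, timestamp):
        self.events.append(timestamp)
        # Evict events that have fallen out of the window, so the
        # state stays small and each update is low-latency.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def count(self):
        return len(self.events)

c = SlidingWindowCounter(window=10)
for t in [1, 2, 5, 12, 13]:
    c.record(t)
```

After the event at time 13, only the events at times 5, 12, and 13 remain inside the 10-second window; a batch job would have recomputed this by scanning all five events again.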
Comparison of RDBMS and Hadoop:

Data layout: An RDBMS is row- and column-oriented; Hadoop is column-family oriented.

Description: In a traditional relational database management system, data, information, or records are stored in tabular form, i.e., in rows and columns. Hadoop is a distributed file system that stores huge amounts of file data on a cluster of machines and handles data redundancy. On top of that distributed file system, Hadoop provides an API for processing the stored data: MapReduce. On top of this basic scheme, column databases like HBase can be built.

Supported data types: A traditional relational database management system supports and works with structured data only. Hadoop supports structured, semi-structured, and unstructured data.

Read/write throughput limits: A traditional relational database management system handles thousands of queries per second; Hadoop can handle millions of queries per second.

Limitations: A traditional relational database management system has limited ability to handle streaming data; Hadoop works well with streaming data.

Maximum data size: The maximum data size for a traditional relational database management system is terabytes; for Hadoop it is hundreds of petabytes.
V. CONCLUSION
Over the last 35 years, data management principles such as physical and logical data independence, declarative querying, and cost-based optimization have grown into a multi-billion-dollar industry. According to IBM, 80 percent of the planet's data is unstructured, and most businesses do not even attempt to use this data to their advantage. Once the technologies for examining big data reach their peak, it will become easier for companies to analyze enormous
datasets, recognize patterns, and then advantageously plan their moves based on the consumer requirements recognized through this remarkable data.
ACKNOWLEDGEMENT
I would like to thank our honorable Principal, Dr. R. P. Singh; our Head of Department, Prof. D. D. Patil; my guide, Miss Lavina Panjwani; and all the respected teaching faculty of the Department of Computer Science & Engineering of Hindi Seva Mandal's Shri Sant Gadge Baba College of Engineering & Technology, Bhusawal. My special thanks also to all the authors of the reference papers that we have referred to.
REFERENCES
[1] Sangeeta Bansal, Dr. Ajay Rana, “Transitioning from Relational Databases to Big Data” International
Journal of Advanced Research in Computer Science and Software Engineering Volume 4, Issue 1, January
2014.
[2] www.tutorialPoint.com Hadoop tutorial.
[3] From Relational Database Management to Big Data: Solutions for Data Migration Testing.
[4] Nevins Partners, "Why is BIG Data Important?", White Paper, May 2012.
[5] www.hadoop.apache.org