
The Art of Big Data

Rahul Beakta
Computer Science & Engineering Department
Baddi University of Emerging Sciences & Technology,
Baddi, Himachal Pradesh, India

Abstract—In this paper we discuss various ways in which Big
Data can improve the digital world. Big data is now a cost-effective
approach to dealing with high-volume, complex data. Massive data
sets hide useful information and patterns that conventional
database systems cannot process. New strategies for managing the
data are discussed to convert big data into smart data. This paper
aims to present the advantages of big data analytics and to discuss
the big data management strategies used to manage the data.

Shalini Chauhan
Computer Science & Engineering Department
Baddi University of Emerging Sciences & Technology,
Baddi, Himachal Pradesh, India

Big Data characteristics:

Data velocity: Data is generated very quickly from
different locations.

Data variety: Data is stored in structured, semi-structured and unstructured form [6].

Data volume: Data sets are measured in terabytes or petabytes.

Data complexity: Data is stored and managed in
different locales, data centers, or cloud geo-zones.

Keywords—big data; analytics; characteristics; Hadoop; IoT.




I. INTRODUCTION
Nowadays everyone is talking about Big Data. But what
exactly is it? Big Data is similar to small data, but larger in
volume. There is no exact definition of big data. Data that has
an extra-large Volume, comes from a Variety of sources in a
Variety of formats, and arrives with great Velocity is normally
referred to as Big Data [1]. Big data can be structured,
unstructured or semi-structured, and cannot be processed by
conventional techniques and tools. Because the large volume of
data is growing rapidly from different sources, it is
very difficult to capture, process and analyze the data [2].

A. NoSQL
NoSQL is a class of databases that store and retrieve data
modeled in forms other than the rows and columns of relational
tables. NoSQL systems are sometimes called “Not Only SQL” to
emphasize that they may also support SQL-like query languages.
NoSQL databases are increasingly used in big data applications.
Companies like Facebook and Google are using NoSQL.
Why NoSQL?

Handle both unstructured and semi-structured data.

Adapt to change with updates and time.

Support large numbers of concurrent users.

Deliver better experiences to globally distributed users.

Always available.
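The first two points above can be illustrated with a minimal sketch of a schema-flexible document store. This is a toy in-memory model written for this discussion, not the API of any real NoSQL product; the class `DocumentStore`, its methods, and the sample records are all hypothetical.

```python
from collections import defaultdict


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        # collection name -> {key: document}; no schema is declared up front
        self._collections = defaultdict(dict)

    def put(self, collection, key, document):
        # Store a copy so later changes to the caller's dict do not leak in
        self._collections[collection][key] = dict(document)

    def get(self, collection, key):
        # Returns None when the key is absent
        return self._collections[collection].get(key)

    def find(self, collection, **criteria):
        # Return every document whose fields match all given criteria
        return [
            doc for doc in self._collections[collection].values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
# Documents in the same collection need not share the same fields
store.put("users", "u1", {"name": "Asha", "city": "Baddi"})
store.put("users", "u2", {"name": "Ravi", "city": "Baddi", "tags": ["admin"]})
store.put("users", "u3", {"name": "Meera"})

baddi_users = store.find("users", city="Baddi")
```

Because no schema is fixed, a new field such as `tags` can appear on one record without altering the others, which is the sense in which such a store adapts to change over time.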

Fig. 1. Characterization of Big Data: Volume, Velocity, Variety.

B. MapReduce
MapReduce is a programming model designed for
processing large volumes of data in parallel. It divides the
work into a set of independent tasks [3]. It is used to run
programs in Hadoop [4].
It has two functions:

Map: The master node takes the input, divides it into
smaller sub-problems and distributes them to worker nodes [5].

Reduce: The master node collects the answers to all the
sub-problems and combines them to form the
output [7].
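The Map and Reduce steps described above can be sketched as a word count, the customary introductory example. This is a single-process illustration in plain Python under simplifying assumptions: the function names (`split_input`, `map_task`, `shuffle`, `reduce_task`, `word_count`) are hypothetical, the tasks run one after another rather than on separate worker nodes, and nothing here uses the Hadoop API.

```python
from collections import defaultdict


def split_input(lines, num_workers):
    # Divide the input into independent chunks, one per (simulated) worker
    return [lines[i::num_workers] for i in range(num_workers)]


def map_task(chunk):
    # Map: emit a (word, 1) pair for every word in the chunk
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_outputs):
    # Group the intermediate values by key before reducing
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    # Reduce: combine all partial counts for one word
    return key, sum(values)


def word_count(lines, num_workers=2):
    chunks = split_input(lines, num_workers)
    # In a real cluster each map task would run in parallel on its own node
    mapped = [map_task(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
```

Because each map task sees only its own chunk and each reduce task sees only one key, the tasks are independent of one another, which is what allows a framework to spread them across many machines.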

C. Hadoop
Hadoop is an open-source software framework. It stores and processes huge data sets on a large cluster of commodity hardware [13]. It consists of:
1) Hadoop File System.
2) MapReduce Programming Paradigm.
Hadoop delivers distributed processing power at a remarkably low cost, making it an effective complement to a traditional enterprise data infrastructure [8].

III. STRATEGIES FOR MANAGING BIG DATA
This is where various strategies and tools are applied to manage big data. To manage the various forms and variety of data, new and old skills are applied along with data management tools [11]. To understand big data strategies and techniques we have designed a framework (Fig. 2). The first dimension is data type. Data which is structured and operational is called transactional data. Data that comes from unstructured sources (e.g. social media) is known as non-transactional data. The second dimension is the business objective, in which organizations either measure or experiment while applying big data capabilities. In experimenting, organizations look for scientific tools to apply. Finally, this results in four quadrants, as shown in the figure.

Fig. 2. Big data framework.

Big Data Strategies:
1) Performance Management: It uses predetermined queries and multidimensional analysis. Transactional and operational data is used to learn a customer’s purchase activity at a particular e-commerce website. Analysts can use queries and also filter the output. Managers in organizations can make short-term business decisions and longer-term plans [12].
2) Data Exploration: It uses statistical methods to obtain experimental results that might not have emerged previously. This model uses a predictive approach to predict user behaviour based on previous transactions in the database. New experiments are performed on the basis of previous data analysis. Organizations use data mining techniques to target customers based on their previous experience on the organization's portal. With the rise in statistical techniques, this can lead to faster and more direct results for organizations.
3) Social Analytics: It includes all the non-transactional data. Most social data comes from social media websites such as Facebook, Twitter and Instagram. This data is not in structured form. Consumers give reviews and product suggestions on online platforms. Sentiment analysis can be performed on this data, which can give organizations very useful results. These can be analyzed to make better decisions.
4) Decision Science: It involves analysis of non-transactional data. Decision scientists explore social big data to conduct “field research”, for example to determine feedback, value, validity, and the feasibility of putting an idea into action.

IV. BIG DATA AND INTERNET OF THINGS
The Internet of Things (IoT) is a network of devices connected through software and sensors. IoT is one of the most important topics in technology and engineering [9]. In many conferences and news reports it is presented as an “IoT Revolution”. The Internet of Things is becoming more popular with the evolution of Big data. Big data is all about data, and the Internet of Things is about data, devices and connectivity [10]. Due to the network of the Internet of Things, a lot more data is generated, which contains a lot of useful information hidden in it. This data is also called big data, and hidden data can be extracted from it. Big data is capable of extracting important values from large data sets.

Impact of Internet of Things on Big Data:
1) Data Storage: The Internet of Things is collecting a lot of data into the data centers of organizations. Data can be in many forms, for example structured or unstructured. This data is not processed by conventional database systems. Other techniques are used to manage this data and can reveal some useful patterns.
2) Big Data Technologies: It is very clear that huge amounts of data will be created by the Internet of Things, but to tackle this data better technologies should be applied. As a technology concern, technologies such as Hadoop and NoSQL are used. Big data technologies are applied to analyze the data. But there is a lack of qualified data scientists and statisticians in the market.
3) Data Security: There will be different devices for the data, and different data for each device. This heterogeneous data should be handled carefully. IoT security is new for security professionals, and lack of experience can increase the risk. A multilayered security system should be implemented.

4) Big Data Analytics: Basically, Big Data and IoT are two sides of a coin. Big data and IoT are linked to each other, as all devices in IoT are connected to the internet and constantly generate data which can be used to discover information and hidden patterns. As we know, not all the data is important, so a proper infrastructure should be developed for analytics. Nowadays organizations are facing a lot of problems in extracting information from IoT data.

Fig. 3. Big Data and Internet of Things.

V. CONCLUSION
In this paper we have examined big data and its opportunities to use massive data to extract hidden information and patterns. We are living in an information age where a large volume of data is produced daily, and within this data lie patterns and hidden knowledge which should be analyzed, extracted and utilized. The growth of the Internet of Things has brought a revolution with this. It contains a huge amount of data, and Big data comes along with this, so new strategies and management techniques should be implemented. Internet of Things and Big Data are both important because both are based on technology improvements and can provide a lot of data which should be properly examined and extracted, and better strategies should be applied to get more benefit from big data. We believe the Internet of Things is of great significance for big data. We must encourage ourselves in big data research if we want to get the benefit of this huge data.

References
Harshawardhan S. Bhosale, Prof. Devendra P. Gadekar, “A Review Paper on Big Data and Hadoop,” International Journal of Scientific and Research Publications, Volume 4, Issue 10, October 2014, ISSN 2250-3153.
Shilpa, Manjit Kaur, “BIG Data and Methodology-A review”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 10, October 2013, ISSN: 2277 128X.
Shital Suryawanshi, Prof. V.S. Wadne, “Big Data Mining using Map Reduce: A Survey Paper”, IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov – Dec. 2014), PP 37-40.
Ms. Gurpreet Kaur, Ms. Manpreet Kaur, “REVIEW PAPER ON BIG DATA USING HADOOP”, International Journal of Computer Engineering & Technology (IJCET), Volume 6, Issue 12, Dec 2015, Article ID: IJCET_06_12_008.
Shalini Jain, Satendra Sonare, Ashok Verma, “Big Data Analysis using HDFS, C-MEANS and MapReduce”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 4, 2015, ISSN: 2277 128X.
Rahul Beakta, “Big Data And Hadoop: A Review Paper”, International Journal of Computer Science & Information Technology, Volume 2, Spl. Issue 2 (2015), RIEECE 2015, BUEST.
Serge Blazhievsky, “Introduction to Hadoop, MapReduce and HDFS for Big Data Applications”, Storage Networking Industry Association, 2013.
Philip Russom, “MANAGING BIG DATA”, TDWI BEST PRACTICES REPORT, TDWI research, FOURTH QUARTER 2013.
Salvatore Parise, Bala Iyer, “Four strategies to capture and create value from big data”, Ivey Business journal, Issues: july/august.
Tamara Dull, “Big data and the Internet of things: Two sides of the same coin”.
K. Shvachko and A. C. Murthy, “Scaling Hadoop to 4000 Nodes at Yahoo”, Yahoo! Developer Network Blog, 2008.
“Internet of Things”, Wikipedia, https://en.wikipedia.org/wiki/Internet_of_Things.
IBM Big Data analytics HUB.