Author: Aditi Pawar, FYCO-1, CO Department, Bahusaheb Vartak Polytechnic
Abstract—Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can mean greater operational efficiency, cost reductions, and reduced risk.

Keywords—Big data, storage, sharing.

I. INTRODUCTION

Big Data is a collection of data that is huge in volume, yet grows exponentially with time. Its size and complexity are so great that no traditional data management tool can store or process it efficiently.

Big data analysis helps businesses make better decisions by improving operations, increasing efficiency, and reducing risk. By using big data analytics tools, businesses worldwide are improving their digital marketing strategies, leveraging data from social platforms while reducing risk.

CHARACTERISTICS

Big data can be described by the following characteristics:

Volume – The quantity of data that is generated. It is the size of the data that determines its value and potential, and whether it can actually be considered Big Data at all; the name 'Big Data' itself contains a term related to size, hence this characteristic.

Variety – The next aspect of Big Data is its variety: the category to which the data belongs. Knowing this is essential for the data analysts, because it helps the people who closely analyze the data to use it effectively to their advantage, thus upholding the importance of the Big Data.

Velocity – The speed at which data is generated and processed to meet the demands and challenges that lie ahead in the path of growth and development.

Variability – The inconsistency that the data can show at times. This can be a problem for those who analyse the data, as it hampers the process of handling and managing the data effectively.

Veracity – The quality of the data being captured can vary greatly, and the accuracy of any analysis depends on the veracity of the source data.

Complexity – Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected, and correlated in order to grasp the information they are supposed to convey. This situation is therefore termed the 'complexity' of Big Data.

ARCHITECTURE

In 2004, Google published a paper on a process called MapReduce that uses a parallel, distributed architecture. The MapReduce framework provides a parallel processing model and an associated implementation to process huge amounts of data. With MapReduce, queries are split and distributed across parallel nodes and processed in parallel (the Map step); the results are then gathered and delivered (the Reduce step). The framework was very successful, and others wanted to replicate the algorithm, so an implementation of the MapReduce framework was adopted by the Apache open-source project named Hadoop.
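The Map and Reduce steps can be illustrated with a toy word count in plain Python. This is a sketch of the programming model only, not the Hadoop API; the function names (`map_step`, `shuffle`, `reduce_step`, `map_reduce`) are hypothetical:

```python
from collections import defaultdict

def map_step(document):
    # Map: emit intermediate (key, value) pairs; here, (word, 1) per word.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate values by key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

def map_reduce(documents):
    # Queries are split across "nodes" (here, plain function calls);
    # a real framework runs the Map step on many machines in parallel.
    intermediate = []
    for doc in documents:
        intermediate.extend(map_step(doc))
    return dict(reduce_step(k, v) for k, v in shuffle(intermediate).items())

counts = map_reduce(["big data is big", "data is everywhere"])
# counts == {"big": 2, "data": 2, "is": 2, "everywhere": 1}
```

In Hadoop, the same three phases run across a cluster: mappers on the nodes holding the data, a framework-managed shuffle, and reducers that aggregate each key's values.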
APPLICATIONS

Big data has increased the demand for information management specialists, so much so that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP, and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at almost 10 percent a year: about twice as fast as the software business as a whole.

Developed economies make increasing use of data-intensive technologies. There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet. Between 1990 and 2005, more than 1 billion people worldwide entered the middle class, which means more and more people who gain money will become more literate, which in turn leads to information growth. The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, and 2.2 exabytes in 2000, reached 65 exabytes in 2007, and it was predicted that the amount of traffic flowing over the internet would reach 667 exabytes annually by 2014. It is estimated that one third of the globally stored information is in the form of alphanumeric text and still image data, which is the format most useful for most big data applications. This also shows the potential of yet-unused data (i.e. in the form of video and audio content).

MANUFACTURING

Based on the TCS 2013 Global Trend Study, improvements in supply planning and product quality provide the greatest benefit of big data for manufacturing. Big data provides an infrastructure for transparency in the manufacturing industry: the ability to unravel uncertainties such as inconsistent component performance and availability. Predictive manufacturing, an applicable approach toward near-zero downtime and transparency, requires a vast amount of data and advanced prediction tools to systematically process data into useful information. A conceptual framework of predictive manufacturing begins with data acquisition, where different types of sensory data are available to acquire, such as acoustics, vibration, pressure, current, voltage, and controller data. This vast amount of sensory data, in addition to historical data, constructs the big data in manufacturing. The generated big data acts as the input into predictive tools and preventive strategies such as Prognostics and Health Management (PHM).
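The predictive-maintenance idea above can be sketched in plain Python. The sensor readings, baseline, and tolerance below are hypothetical illustrations, and a real PHM system would use far richer statistical or machine-learning models:

```python
# Hypothetical PHM-style health check: flag a machine for preventive
# maintenance when recent sensor readings drift from a historical baseline.
# All values here are illustrative, not from any real dataset.

def needs_maintenance(readings, baseline_mean, tolerance, window=5):
    """Return True if the mean of the last `window` readings deviates
    from the historical baseline by more than `tolerance`."""
    recent = readings[-window:]
    recent_mean = sum(recent) / len(recent)
    return abs(recent_mean - baseline_mean) > tolerance

# Vibration amplitudes (arbitrary units): a stable history, then a drift
# that a preventive strategy would want to catch before failure.
vibration = [1.0, 1.1, 0.9, 1.0, 1.1, 1.6, 1.7, 1.8, 1.9, 2.0]
print(needs_maintenance(vibration, baseline_mean=1.0, tolerance=0.5))  # True
```

The design point is the same as in the framework described above: historical data establishes the baseline, streaming sensory data is compared against it, and the result feeds a preventive action rather than a reactive repair.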
ACKNOWLEDGEMENT

Big Data is a term that refers to a massive amount of structured and unstructured data that is generated at an unprecedented scale. This data is so vast that traditional methods of data processing and analysis are no longer sufficient. Big Data has become increasingly important for businesses looking to improve their operations, reduce risk, and make informed decisions. It provides valuable insights that can help companies identify trends and patterns that were previously hidden.

The characteristics of Big Data can be summarized as follows: Volume, Variety, Velocity, Variability, Veracity, and Complexity. Volume refers to the vast amount of data that is generated and collected every day. Variety refers to the different types of data that are available, including structured, unstructured, and semi-structured data. Velocity refers to the speed at which data is generated and processed. Variability refers to the inconsistencies in the data that can be challenging to manage. Veracity refers to the accuracy and trustworthiness of the data. Complexity refers to the challenge of managing and processing large volumes of data that come from multiple sources.
In summary, Big Data is a crucial aspect of modern business. It provides valuable insights and opportunities for companies to improve their operations, reduce risk, and make informed decisions. Understanding the characteristics of Big Data is essential for data analysts and businesses that want to leverage the power of data.

CONCLUSIONS

The availability of Big Data, low-cost commodity hardware, and new information management and analytic software have produced a unique moment in the history of data analysis. The convergence of these trends means that, for the first time in history, we have the capabilities required to analyze astonishing data sets quickly and cost-effectively. These capabilities are neither theoretical nor trivial. They represent a genuine leap forward and a clear opportunity to realize enormous gains in terms of efficiency, productivity, revenue, and profitability. The Age of Big Data is here, and these are truly revolutionary times, if both business and technology professionals continue to work together and deliver on its promise.