The document discusses the Shark Tank Project which implements Big Data using seven tools: Hadoop, MongoDB, Elasticsearch, Apache Spark, Apache Storm, R language, and Python. Big Data refers to collecting, analyzing, and managing massive amounts of data generated online to identify patterns and behaviors. It manages both data from internet browsing and from applications. Big Data is characterized by its volume, velocity, variety, veracity, and value.
5V BIG DATA IMPLEMENTATION
Maira Liceth García Pérez
Natalia Lorena Gracia Castro
Big Data is the set of technologies created to collect, analyze and manage the data generated by Internet users. Its main idea is to collect the massive "raw" data that is generated and process it to identify patterns or other types of behavior that can help specific sectors. Big Data does not refer only to the data generated when browsing the Internet; it also includes the raw data generated by the users of different applications and services.

This field relies on seven main tools:
• Hadoop: processing of large volumes of data.
• MongoDB: document-oriented database.
• Elasticsearch: full-text search tool.
• Apache Spark: open-source data processing engine.
• Apache Storm: real-time processing system.
• R language: software environment for statistical computing and graphics.
• Python: general-purpose language used in statistical, biological, physical and other domains.

For the correct management of this large amount of data, it is necessary to know the five dimensions that make up Big Data, known as the five Vs. The first is Volume, which concerns the mass storage of the amount of data collected (web pages, social networks, IoT, etc.). It is followed by Velocity, the data that is generated in real time and must be processed at the same speed; Variety, which covers all types of data, structured or unstructured; Veracity, which is the quality and reliability of the data; and finally Value, which means the data must be capable of providing value or benefit to the company or person that uses it.

The main benefits of Big Data include:
1. Optimize business operations: it redefines the art of the possible, where what was previously unsolvable is now more decipherable.
2. Deliver fast responses: the ability to make the best business decisions in a fraction of the time.
3. Improve the quality of services: better analysis, better information and faster data processing.
4. Relevant and personalized marketing strategies: complementary analysis to meet customer demands.
5. Deliver new services that were previously impossible: it helps organizations capitalize on a broader range of new data sources.
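As a rough illustration of the processing model that Hadoop popularized, the MapReduce pattern can be sketched in plain Python. This is a simplified, single-machine sketch for intuition only, not Hadoop itself; the function names and the word-count task are illustrative choices, not part of any Hadoop API:

```python
from collections import defaultdict

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word in every raw document
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce step: sum the emitted counts for each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data velocity", "big data volume"]
result = reduce_phase(map_phase(docs))
print(result)  # {'big': 2, 'data': 2, 'velocity': 1, 'volume': 1}
```

In a real Hadoop cluster the map and reduce steps run in parallel across many machines, with the framework shuffling the intermediate pairs between them; the split into two independent phases is what makes that distribution possible.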
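The Velocity dimension, processing data at the same speed at which it arrives, can be illustrated with a tiny streaming sketch in plain Python. This is a hypothetical, Storm-style one-item-at-a-time example, not Apache Storm code; the sensor-reading scenario is assumed for illustration:

```python
def running_average(stream):
    # Consume each reading as it "arrives" and emit an updated average
    # immediately, instead of waiting for the full batch to be stored.
    total = 0.0
    count = 0
    for value in stream:
        total += value
        count += 1
        yield total / count

readings = [10, 20, 30]
print(list(running_average(readings)))  # [10.0, 15.0, 20.0]
```

Because the generator never holds more than its running totals in memory, the same idea scales to unbounded real-time streams, which is the core contrast with batch-oriented storage and processing.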