Information Management

Module 7 Big Data and NoSQL

EXPLAIN Big data, NoSQL, and MongoDB

What is big data?


Big data is a term that describes a large volume of data. But it’s not the amount of
data that’s important. It’s what organizations do with the data that matters. Big data can be
analyzed for insights that lead to better decisions and strategic business moves.

The term “big data” refers to data that is so large, fast or complex that it’s difficult or
impossible to process using traditional methods.

History of Big Data

The act of accessing and storing large amounts of information for analytics has been around a
long time. But the concept of big data gained momentum in the early 2000s when industry
analyst Doug Laney articulated the now-mainstream definition of big data as the three V’s:

• Volume: Organizations collect data from a variety of sources, including business
transactions, smart (IoT) devices, industrial equipment, videos, social media and more.

• Velocity: RFID (Radio Frequency Identification) tags, sensors and smart meters are
driving the need to deal with these torrents of data in near-real time.

• Variety: Data comes in all types of formats – from structured, numeric data in
traditional databases to unstructured text documents, emails, videos, audios, stock
ticker data and financial transactions.

Prepared by: Merlie C. Ofiaza (Department of Information Technological Studies) 1


Why Is Big Data Important?


The importance of big data doesn’t revolve around how much data you have, but what
you do with it. You can take data from any source and analyze it to find answers that enable:

• cost reductions

• time reductions

• new product development and optimized offerings

• smart decision making.

When you combine big data with high-powered analytics, you can accomplish business-
related tasks such as:

• Determining root causes of failures, issues and defects in near-real time.

• Generating coupons at the point of sale based on the customer’s buying habits.

• Recalculating entire risk portfolios in minutes.

• Detecting fraudulent behavior before it affects your organization.

Big Data Technologies


Big Data Technologies can be defined as software tools for analyzing, processing, and
extracting data from extremely large and complex data sets that traditional data
management tools cannot handle.

Top 5 Big Data technologies being used in IT Industries


[https://www.jigsawacademy.com/big-data-5-new-technologies-emerge-2017/]

1. Hadoop Ecosystem

The Hadoop framework was developed to store and process data with a simple programming
model in a distributed data processing environment. Data spread across many high-speed,
low-cost machines can be stored and analyzed together. Enterprises have widely adopted
Hadoop as a Big Data technology for their data warehouse needs in the past year, and the trend
seems set to continue and grow in the coming year as well. Companies that have not explored
Hadoop so far will most likely see its advantages and applications.
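
The programming model most closely associated with Hadoop is MapReduce. The following is a minimal single-process sketch of that idea in Python, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase) and the sample text are assumptions made for illustration. Each string in chunks stands in for a block of data that would live on a different machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical input: each string stands in for a block stored on a different machine.
chunks = ["big data big insights", "data lakes store raw data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In a real cluster the map calls would run in parallel on the machines that hold each block, and only the grouped intermediate pairs would travel over the network.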

2. Artificial Intelligence

Artificial Intelligence (AI) is a broad branch of computer science that deals with the
development of intelligent machines capable of carrying out tasks that typically require
human intelligence. AI is developing fast, from Apple’s Siri to self-driving cars. As an
interdisciplinary branch of science, it draws on a number of approaches, such as
Machine Learning and Deep Learning, that are driving a remarkable shift in most tech
industries. AI is revolutionizing the existing Big Data Technologies.

3. NoSQL Database

NoSQL covers a wide variety of database technologies that were developed for building
modern applications. The term refers to a non-SQL or non-relational database that
provides a mechanism for storing and retrieving data. These databases are used in real-time
web applications and Big Data analytics. They store unstructured data and offer faster
performance and flexibility while handling a variety of data types; examples include MongoDB,
Redis and Cassandra. They offer simplicity of design, easier horizontal scaling and finer
control over availability. Their data structures differ from those used by default in relational
databases, which makes some operations faster in NoSQL. Companies such as Facebook, Google
and Twitter store terabytes of user data every day.

4. R Programming

R is an open-source programming language and one of the open-source Big Data Technologies.
The free software is widely used for statistical computing and visualization, and it is
supported by integrated development environments such as Eclipse and Visual Studio.
According to some experts, it has been among the world’s leading languages for statistical
work. It is also widely used by data miners and statisticians to develop statistical software
and, mainly, to analyze data.

5. Data Lakes

A Data Lake is a consolidated repository that stores data of all formats, both structured
and unstructured, at any scale.

Data can be saved as is during data accumulation, without first being transformed into
structured data. This enables many types of data analysis, from dashboards and data
visualization to real-time Big Data processing, for better business inferences.

Businesses that use Data Lakes stay ahead of their competitors and can carry out new kinds
of analytics, such as Machine Learning, over new sources such as log files, social media data
and clickstreams.
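
Saving data "as is" and applying structure only when it is read is often called schema-on-read. The sketch below illustrates that idea in plain Python under stated assumptions: raw_zone, read_clicks and the sample payloads are hypothetical and do not represent any particular data lake product.

```python
import csv
import io
import json

# Hypothetical raw zone of a data lake: each entry is kept exactly as it arrived.
raw_zone = [
    {"format": "json", "payload": '{"user": "ana", "clicks": 3}'},
    {"format": "csv", "payload": "user,clicks\nben,5\ncara,2"},
    {"format": "text", "payload": "2024-01-01 server restarted"},
]

def read_clicks(entry):
    """Apply structure only at read time; return (user, clicks) rows."""
    if entry["format"] == "json":
        record = json.loads(entry["payload"])
        return [(record["user"], int(record["clicks"]))]
    if entry["format"] == "csv":
        reader = csv.DictReader(io.StringIO(entry["payload"]))
        return [(row["user"], int(row["clicks"])) for row in reader]
    return []  # free text stays in the lake but is not used by this analysis

clicks = [row for entry in raw_zone for row in read_clicks(entry)]
total_clicks = sum(count for _, count in clicks)
print(clicks, total_clicks)
```

The free-text log line is still stored, so a later analysis with a different reader could use it without re-ingesting anything.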

NoSQL
NoSQL databases store data in documents or other non-tabular structures rather than relational
tables. Accordingly, they are classified as "not only SQL" databases and subdivided by their
flexible data models. Types of NoSQL databases include pure document databases, key-value
stores, wide-column databases, and graph databases. NoSQL databases are built from the ground
up to store and process vast amounts of data at scale and support a growing number of modern
businesses.

NoSQL stands for “not only SQL” rather than “no SQL at all”. The four most popular types of
NoSQL database are defined as follows:

• Document databases are primarily built for storing information as documents, including, but
not limited to, JSON documents. These systems can also be used for storing XML documents,
for example.
• Key-value stores group associated data in collections with records that are identified with
unique keys for easy retrieval. Key-value stores have just enough structure to mirror the
value of relational databases while still preserving the benefits of NoSQL.
• Wide-column databases use the tabular format of relational databases yet allow a wide
variance in how data is named and formatted in each row, even in the same table. Like key-
value stores, wide-column databases have some basic structure while also preserving a lot of
flexibility.
• Graph databases use graph structures to define the relationships between stored data
points. Graph databases are useful for identifying patterns in unstructured and semi-
structured information.
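
To make the four models above concrete, here is a rough sketch that represents the same kind of data in each shape using ordinary Python structures. It imitates the data models only, not any real database engine, and every name and value in it is made up for illustration.

```python
# Document: one self-describing record; nested fields are allowed.
document = {"_id": 1, "name": "Ana", "orders": [{"item": "pen", "qty": 2}]}

# Key-value: an opaque value looked up by a unique key.
key_value = {"user:1": '{"name": "Ana"}', "user:2": '{"name": "Ben"}'}

# Wide-column: rows in the same table may carry different columns.
wide_column = {
    "row1": {"name": "Ana", "email": "ana@example.com"},
    "row2": {"name": "Ben", "phone": "555-0100"},
}

# Graph: nodes plus edges that record the relationships between them.
edges = [("Ana", "FOLLOWS", "Ben"), ("Ben", "FOLLOWS", "Cara")]

def followed_by(person):
    """Return everyone that `person` follows, by scanning the edge list."""
    return [dst for src, rel, dst in edges if src == person and rel == "FOLLOWS"]

print(document["orders"][0]["item"])
print(key_value["user:2"])
print(sorted(wide_column["row2"]))
print(followed_by("Ana"))
```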

Why should you use a NoSQL database?

NoSQL databases are a great fit for many modern applications such as mobile, web, and gaming
that require flexible, scalable, high-performance, and highly functional databases to provide great
user experiences.

• Flexibility: NoSQL databases generally provide flexible schemas that enable faster and more
iterative development. The flexible data model makes NoSQL databases ideal for semi-
structured and unstructured data.
• Scalability: NoSQL databases are generally designed to scale out by using distributed clusters of
hardware instead of scaling up by adding expensive and robust servers. Some cloud providers
handle these operations behind-the-scenes as a fully managed service.
• High-performance: NoSQL databases are optimized for specific data models and access patterns
that enable higher performance than trying to accomplish similar functionality with relational
databases.
• Highly functional: NoSQL databases provide highly functional APIs and data types that are
purpose built for each of their respective data models.
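
Scaling out, as described in the Scalability point above, usually means spreading records across several machines. One common way to decide where a record goes is to hash its key. The following is a simplified sketch of that technique, assuming a fixed cluster of three nodes modeled as Python dictionaries; shard_for, put and get are hypothetical helper names, not the API of any NoSQL product.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

# Hypothetical cluster of three nodes, each holding part of the data.
shards = [dict() for _ in range(3)]

def put(key, value):
    """Store the value on the shard chosen by the key's hash."""
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    """Look only at the shard that the key's hash points to."""
    return shards[shard_for(key, len(shards))].get(key)

for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})

print([len(shard) for shard in shards])  # how many records each node holds
print(get("user:42"))
```

Plain modulo hashing moves most keys when the number of shards changes, which is why production systems tend to use schemes such as consistent hashing or range-based partitioning instead.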

What is MongoDB?

By definition, MongoDB is an open-source database that uses a document-oriented data model
and a non-structured query language. It is one of the most powerful NoSQL systems and databases
available today.

MongoDB Atlas is a globally available cloud database service for modern applications. It lets
you deploy fully managed MongoDB across AWS, Google Cloud, and Azure, using best-in-class
automation and established practices to ensure availability, scalability, and compliance
with the most stringent data security and privacy requirements. MongoDB Cloud is a unified data
platform that includes a global cloud database, search, data lake, mobile, and application services.

Being a NoSQL tool means that MongoDB does not use the rows and columns usually associated
with relational database management. Its architecture is built on collections and documents.
The basic unit of data in this database is a document, which consists of a set of key-value pairs.
Documents in the same collection are allowed to have different fields and structures.

The data model that MongoDB follows is a highly flexible one that lets you combine and store data
of many types without having to compromise on the powerful indexing options, data access,
and validation rules. There is no downtime when you want to dynamically modify the schemas.
This means that you can concentrate more on making your data work harder rather than
spending more time on preparing the data for the database.

History

Dwight Merriman, Eliot Horowitz, and Kevin Ryan created MongoDB in 2007. To solve the problems
of scalability and agility they had seen at DoubleClick, they decided to develop a database of
their own. That is how MongoDB came into existence.

• MongoDB was first developed by 10gen Software in 2007 as part of a proposed platform as a
service solution.
• The firm switched to an open-source development approach in 2009, with commercial support
and additional services available.
• MongoDB Inc. replaced 10gen as the company’s name in 2013.
• MongoDB became a publicly traded business on October 20, 2017.
• MongoDB announced a partnership with Alibaba Cloud on October 30, 2019, to provide a
MongoDB-as-a-Service solution to its clients.

Database: In simple words, it can be called the physical container for data. Each database
has its own set of files on the file system, and multiple databases can exist on a single MongoDB
server.

Collection: A group of database documents can be called a collection. The RDBMS equivalent of a
collection is a table. An entire collection exists within a single database. Collections do not
enforce a schema. Inside a collection, documents can have varied fields, but the documents
within a collection are usually meant for the same purpose or for serving the same
end goal.

Document: A set of key-value pairs can be designated as a document. Documents are associated
with dynamic schemas. The benefit of dynamic schemas is that the documents in a single
collection do not have to share the same structure or fields. Also, fields that are common to
the documents of a collection can hold different types of data.
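
The database, collection and document hierarchy can be imitated with nested Python structures. This is a toy in-memory model for illustration only, not MongoDB itself or its driver API: the names school and students and the find helper are assumptions introduced here.

```python
# A server holds databases, a database holds collections, a collection holds documents.
server = {
    "school": {                      # database (hypothetical name)
        "students": [                # collection
            {"_id": 1, "name": "Ana", "year": 2},
            {"_id": 2, "name": "Ben", "email": "ben@example.com"},  # different fields
            {"_id": 3, "name": "Cara", "year": "second"},           # same field, other type
        ]
    }
}

def find(collection, criteria):
    """Return documents whose fields equal every key-value pair in `criteria`."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

students = server["school"]["students"]
print(find(students, {"year": 2}))
print(find(students, {"email": "ben@example.com"}))
```

Because no schema is enforced, the second document simply lacks a year field and the third stores it as text, yet all three live in the same collection.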

MongoDB vs MySQL
The following table explains the differences between MongoDB and MySQL.

| MongoDB | MySQL |
| --- | --- |
| Queries are written as JSON-like documents (the MongoDB shell uses JavaScript) | The query language is Structured Query Language (SQL) |
| It represents data as JSON documents | It represents data in tables and rows |
| Defining the schema is not required | Defining tables and columns is required |
| It has no JOIN clause; related data is usually embedded or combined with the $lookup aggregation stage | It supports JOIN |
| MongoDB was built with high availability and scalability in mind and offers replication and sharding out of the box | MySQL offers replication, but sharding is not built into the standard server; it does let users retrieve related data via joins, which reduces duplication |
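
The JOIN row of the comparison can be illustrated side by side. In this sketch Python's built-in sqlite3 module stands in for MySQL, purely so the example runs without a server, and a plain dictionary stands in for a MongoDB document. The table names, field names and data are hypothetical.

```python
import sqlite3

# Relational style: two tables linked by a key and combined with JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ana');
    INSERT INTO orders VALUES (10, 1, 'pen'), (11, 1, 'notebook');
""")
sql_items = [
    row[0]
    for row in conn.execute(
        "SELECT o.item FROM customers c JOIN orders o ON o.customer_id = c.id "
        "WHERE c.name = ? ORDER BY o.id",
        ("Ana",),
    )
]
conn.close()

# Document style: the orders are embedded in the customer document itself.
customer_doc = {"_id": 1, "name": "Ana", "orders": [{"item": "pen"}, {"item": "notebook"}]}
doc_items = [order["item"] for order in customer_doc["orders"]]

print(sql_items, doc_items)
```

The relational version stores each fact once and assembles it at query time, while the document version reads everything in one lookup at the cost of possible duplication if the same order data is embedded elsewhere.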

For more reading:

1. Big Data. https://www.investopedia.com/terms/b/big-data.asp


2. Big Data. https://www.oracle.com/ph/big-data/what-is-big-data/

ENGAGE
Activity: (Check on LMS)
