
Big Data Analytics

Prepared by: Raed Karim, Ph.D.


Big Data Analytics
• The process of examining large and varied data sets (big data) at high speed to:
• uncover hidden patterns, unknown correlations, market trends, customer
preferences and other useful information.

From IBM:
• Big data analytics is the use of advanced analytic techniques against very large,
diverse data sets that include different types such as structured/unstructured
and streaming/batch, and different sizes from terabytes to zettabytes.
• Big data is a term applied to data sets whose size or type is beyond the ability
of traditional relational databases to capture, manage, and process with low latency.
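To put the range "terabytes to zettabytes" in perspective, the short Python sketch below converts between byte units. It assumes decimal SI prefixes (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes); binary units such as TiB and ZiB use powers of 1024 and give slightly different numbers. The `convert` helper is hypothetical, written only for this illustration.

```python
# Decimal (SI) byte units; binary units such as TiB/ZiB use powers of 1024 instead.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}

def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert a data size between two decimal byte units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]

# One zettabyte expressed in terabytes: 10**21 / 10**12 = 10**9
print(convert(1, "ZB", "TB"))  # 1000000000.0
```

In other words, the upper end of that range is a billion times larger than the lower end.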
• Big data has one or more of the following characteristics: high volume, high velocity,
or high variety. Big data comes from sensors, devices, video/audio, networks, log
files, transactional applications, the web, and social media, much of it generated in
real time and at a very large scale.
• Analyzing big data allows analysts, researchers, and business users to make
better and faster decisions using data that was previously inaccessible or
unusable.
• Using advanced analytics techniques such as text analytics, machine learning,
predictive analytics, data mining, statistics, and natural language processing,
businesses can analyze previously untapped data sources, independently or
together with their existing enterprise data, to gain new insights that lead to
significantly better and faster decisions.
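As a small illustration of text analytics applied to a previously untapped source such as customer reviews, the Python sketch below scores text with a tiny hand-written word lexicon. This is a deliberately simple lexicon-based technique, not a production method: the lexicon and its weights are hypothetical, and real systems typically use far larger lexicons or trained machine learning models.

```python
import re

# Hypothetical sentiment lexicon: positive words > 0, negative words < 0.
LEXICON = {
    "great": 2, "good": 1, "love": 2, "fast": 1,
    "bad": -1, "slow": -1, "terrible": -2, "hate": -2,
}

def sentiment_score(text: str) -> int:
    """Sum the lexicon weights of every known word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def label(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "Great product, fast delivery. Love it!",   # great + fast + love = 5
    "Terrible support and slow refunds.",        # terrible + slow = -3
    "The package arrived on Tuesday.",           # no lexicon words = 0
]
for review in reviews:
    print(label(review), sentiment_score(review), review)
```

Words missing from the lexicon simply contribute zero, which is why the third review comes out neutral.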
• Why big data analytics?
• Business benefits, including new revenue opportunities, more effective marketing, better
customer service, and improved operational efficiency.
• The ability to analyze growing volumes of structured transaction data and unstructured data that cannot be
handled by conventional database systems and business intelligence (BI) tools.
• A means of analyzing data sets and drawing conclusions about them to help organizations make
informed business decisions.
• Big data analytics involves complex applications with elements such as predictive and
descriptive models, statistical algorithms, and what-if analyses.

Big Data Analytics Technologies


Many organizations that started collecting and analyzing big data turned to Hadoop and NoSQL databases.
• Related technologies: YARN, MapReduce, Spark, HBase, Hive, etc.
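The MapReduce programming model listed above can be illustrated without a cluster. The sketch below is a single-process Python simulation of its three phases (map, shuffle, reduce) for the classic word-count task; it does not use Hadoop's actual Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)

def word_count(documents: list[str]) -> dict[str, int]:
    # On a real cluster, map and reduce tasks run in parallel on many nodes;
    # here they run sequentially in one process.
    all_pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(all_pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

docs = ["Big data needs big tools", "Data lakes store raw data"]
print(word_count(docs))
# {'big': 2, 'data': 3, 'needs': 1, 'tools': 1, 'lakes': 1, 'store': 1, 'raw': 1}
```

The design point is that map and reduce only see one record or one key at a time, which is what lets a framework distribute them across machines.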
• Hadoop and NoSQL systems are often used as staging areas for data before it is loaded into a data
warehouse for analytics.
• Alternatively, a Hadoop data lake can serve as the primary repository for incoming streams of
big data.
• Once the data is ready, it can be analyzed with software commonly used
in advanced analytics processes, such as:
• Data mining algorithms/techniques.
• Machine learning
• Statistical analysis
• Recommender systems
• Sentiment and text mining
• Predictive models
• Deep learning
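The staging pattern described above (raw data landing in a staging area, then being cleaned and loaded into a warehouse for analysis) can be sketched in Python under simplifying assumptions: a list of raw JSON lines stands in for the Hadoop/NoSQL staging area, and an in-memory SQLite database stands in for the data warehouse. The records, table, and column names are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw events as they might land in a staging area (one JSON object per line).
staged_lines = [
    '{"customer": "u1", "amount": "19.99", "country": "us"}',
    '{"customer": "u2", "amount": "5.00", "country": "CA"}',
    'not valid json',                          # malformed record, rejected during cleaning
    '{"customer": "u3", "country": "US"}',     # missing amount, rejected during cleaning
]

def clean(lines):
    """Transform step: parse, validate, and normalize staged records."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "customer" not in record or "amount" not in record:
            continue
        yield record["customer"], float(record["amount"]), record.get("country", "").upper()

# Load step: an in-memory SQLite database stands in for the analytical warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL, country TEXT)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean(staged_lines))

# Analysis now runs as ordinary SQL against the cleaned, structured table.
totals = warehouse.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('CA', 5.0), ('US', 19.99)]
```

Keeping the raw lines untouched in staging means the cleaning rules can be changed and re-run later without losing the original data.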
Unstructured data
• Unstructured data files often include text and multimedia content. Examples
include e-mail messages, word processing documents, videos, photos, audio
files, presentations, webpages and many other kinds of business documents.
• While this type of data may have an internal structure, it is considered
"unstructured" because it doesn't fit neatly into a relational database.
• Experts estimate that 80 to 90 percent of the data in any organization is
unstructured.
• The amount of unstructured data in enterprises is growing significantly,
often many times faster than structured databases are growing.
• Structured data generally resides in a relational database (RDBMS) and can
be easily mapped into predefined fields.
• Unstructured data is not relational and doesn't fit into these sorts of
pre-defined data models.
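The contrast can be made concrete in Python. In the sketch below (with a hypothetical table, field names, and message text), a structured order maps directly onto a table's predefined columns, whereas the same kind of facts buried in an e-mail body must first be extracted, here with simple regular expressions that only work for this particular phrasing.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
# Structured data: the schema (predefined fields) exists before any data arrives.
db.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, quantity INTEGER)")
db.execute("INSERT INTO orders VALUES (?, ?, ?)", (1001, "laptop", 2))

# Unstructured data: similar facts embedded in free text with no schema.
email_body = (
    "Hi team, please note that order 1002 should include 3 units of monitor. "
    "Thanks, Dana"
)

# Extraction step needed before the text fits the table; these patterns are
# hypothetical and brittle, which is why unstructured data is harder to process.
order_id = re.search(r"order (\d+)", email_body)
quantity_product = re.search(r"(\d+) units of (\w+)", email_body)
if order_id and quantity_product:
    db.execute(
        "INSERT INTO orders VALUES (?, ?, ?)",
        (int(order_id.group(1)), quantity_product.group(2), int(quantity_product.group(1))),
    )

print(db.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1001, 'laptop', 2), (1002, 'monitor', 3)]
```

A slightly different wording in the e-mail ("two monitors for order #1002") would defeat these patterns, which illustrates why unstructured data needs dedicated tools.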
Unstructured Data Management

• Big data tools
Software like Hadoop can process stores of both unstructured and
structured data that are extremely large, very complex, and changing rapidly.
• Business intelligence software
Business intelligence (BI) software is a broad category of analytics, data mining, dashboard, and reporting
tools that help companies make sense of their structured and unstructured
data in order to make better business decisions.
• Document management systems
Also called enterprise content management systems, a DMS can track, store
and share unstructured data that is saved in the form of document files.
Sources of unstructured data
• Here are some examples of machine-generated unstructured data:
• Satellite images: This includes weather data and government satellite surveillance imagery.
• Scientific data
• Photographs and video: This includes security, surveillance, and traffic video.
• Radar or sonar data: This includes vehicular, meteorological, and
oceanographic seismic profiles.
• Here are some examples of human-generated unstructured data:
• Text internal to your company: This includes documents, logs, survey results, and e-mails.
• Social media data: This is generated on social media platforms such as YouTube, Facebook, Twitter,
LinkedIn, and Flickr.
• Mobile data: This includes text messages and location information.
Semi-Structured Data
• In addition to structured and unstructured data, there is also a third
category: semi-structured data. Semi-structured data is information
that doesn't reside in a relational database but does have some
organizational properties that make it easier to analyze. Examples of
semi-structured data include XML and JSON documents.
• It lies somewhere between structured and unstructured data. It is not organized
in the complex manner (such as relational tables) that makes sophisticated access and analysis
possible, but it has associated information, such as metadata tags, that makes it
more amenable to processing than raw data.
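To illustrate the XML example, the Python sketch below parses a small, hypothetical XML document with the standard-library `xml.etree.ElementTree` module. The records do not share a fixed schema (one has an e-mail element, the other has phone and notes elements), yet the tags and attributes act as metadata that let a program locate each field by name without a relational table.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured document: records share tags but not a fixed schema.
xml_text = """
<customers>
  <customer id="c1">
    <name>Ana</name>
    <email>ana@example.com</email>
  </customer>
  <customer id="c2">
    <name>Ben</name>
    <phone>555-0100</phone>
    <notes>Prefers phone contact</notes>
  </customer>
</customers>
"""

root = ET.fromstring(xml_text.strip())
records = []
for customer in root.findall("customer"):
    # Attributes and tags act as metadata: each field is found by name, not position.
    record = {"id": customer.get("id")}
    for child in customer:
        record[child.tag] = child.text
    records.append(record)

for record in records:
    # Each record carries a different set of fields; there is no fixed schema.
    print(record)
```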
Managing semi-structured data
• In today's dynamic IT world, organizations must decide how (or whether) each form of
data needs to be managed. For semi-structured data, the options include:
• Discard this type of data.
• Force it into a relational database.
• Adopt techniques suited to semi-structured data, such as analyzing its metadata.
• https://youtu.be/5dk53PTK3g0
• https://www.youtube.com/watch?v=ypbSMS8XrAE 2:45
• https://www.youtube.com/watch?v=_HbjsNaUJ2A selected scenes
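The option of forcing semi-structured data into a relational database can be sketched in Python as follows: hypothetical nested JSON records are flattened into dotted column names, and the union of all keys becomes the table schema, with missing values stored as NULL. This is one simple approach among many, and it also shows why the option can be awkward: every new key in the incoming data changes the schema.

```python
import json
import sqlite3

# Hypothetical semi-structured input: nested objects and inconsistent keys.
raw = """[
  {"id": 1, "name": "Ana", "address": {"city": "Lyon", "zip": "69001"}},
  {"id": 2, "name": "Ben", "tags": "vip"}
]"""

def flatten(obj, prefix=""):
    """Flatten nested dicts into a single level, e.g. address.city."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

rows = [flatten(record) for record in json.loads(raw)]

# The schema is derived from the union of all keys seen in the data.
columns = sorted({column for row in rows for column in row})

db = sqlite3.connect(":memory:")
quoted = ", ".join(f'"{c}"' for c in columns)  # quote names that contain dots
db.execute(f"CREATE TABLE people ({quoted})")
placeholders = ", ".join("?" for _ in columns)
db.executemany(
    f"INSERT INTO people VALUES ({placeholders})",
    [tuple(row.get(c) for c in columns) for row in rows],  # missing keys become NULL
)

print(columns)  # ['address.city', 'address.zip', 'id', 'name', 'tags']
print(db.execute("SELECT * FROM people ORDER BY id").fetchall())
```

The NULL-filled cells in the result are the cost of squeezing records with different shapes into one fixed table.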

You might also like