

Topic: Unstructured Data Analytics

Submitted To:
Navpreet Kaur

Submitted By:
Vinay Goyal
MBA-2nd Year (2216)

Business analytics (BA) refers to the skills, technologies, and practices for
continuous iterative exploration and investigation of past business performance to
gain insight and drive business planning. Business analytics focuses on developing
new insights and understanding of business performance based
on data and statistical methods. In contrast, business intelligence traditionally
focuses on using a consistent set of metrics to both measure past performance and
guide business planning, which is also based on data and statistical methods.

Examples of BA uses include:

Exploring data to find new patterns and relationships (data mining)

Explaining why a certain result occurred (statistical analysis, quantitative analysis)


Experimenting to test previous decisions (A/B testing, multivariate testing)

Forecasting future results (predictive modeling, predictive analytics)

Unstructured Data
Unstructured data is a generic label for describing data that is not
contained in a database or some other type of data structure.
Unstructured data can be textual or non-textual. Textual
unstructured data is generated in media like email messages, PowerPoint
presentations, Word documents, collaboration software and instant
messages. Non-textual unstructured data is generated in media
like JPEG images, MP3 audio files and Flash video files.
If left unmanaged, the sheer volume of unstructured data that is generated
each year within an enterprise can be costly in terms of storage.
Unmanaged data can also pose a liability if information cannot be located
in the event of a compliance audit or lawsuit. The information contained in
unstructured data is not always easy to locate. It requires that data in both
electronic and hard copy documents and other media be scanned so a
search application can parse out concepts based on words used in specific
contexts. This is called semantic search. It is also referred to as enterprise search.
In customer-facing businesses, the information contained in unstructured
data can be analyzed to improve customer relationship management and
relationship marketing. As social media applications like Twitter and
Facebook go mainstream, the growth of unstructured data is expected to
far outpace the growth of structured data. According to the "IDC
Enterprise Disk Storage Consumption Model" report released in Fall 2009,
while transactional data is projected to grow at a compound annual growth
rate (CAGR) of 21.8%, it's far outpaced by a 61.7% CAGR prediction for
unstructured data.
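To make the gap between the two growth rates concrete, the short Python sketch below compounds both rates year by year. Only the two CAGR values come from the report quoted above; the starting volume of 100 units and the five-year horizon are assumptions chosen for illustration.

```python
def compound_growth(start: float, cagr: float, years: int) -> float:
    """Return the value after `years` of growth at a constant annual rate."""
    return start * (1.0 + cagr) ** years


# CAGR values quoted in the text; the starting volume and horizon are assumed.
TRANSACTIONAL_CAGR = 0.218
UNSTRUCTURED_CAGR = 0.617
START_VOLUME = 100.0  # hypothetical units, e.g. terabytes
YEARS = 5             # hypothetical horizon

for year in range(YEARS + 1):
    transactional = compound_growth(START_VOLUME, TRANSACTIONAL_CAGR, year)
    unstructured = compound_growth(START_VOLUME, UNSTRUCTURED_CAGR, year)
    print(f"year {year}: transactional {transactional:8.1f}  unstructured {unstructured:8.1f}")
```

Under these assumptions, five years of compounding multiplies the transactional volume by about 2.7 and the unstructured volume by about 11.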
Unstructured data (or unstructured information) refers to information that
either does not have a pre-defined data model or is not organized in a
pre-defined manner. Unstructured information is typically text-heavy, but may
contain data such as dates, numbers, and facts as well. This results in
irregularities and ambiguities that make it difficult to understand using
traditional computer programs as compared to data stored in fielded form
in databases or annotated (semantically tagged) in documents.

Dealing with unstructured data

Techniques such as data mining, natural language processing (NLP), text
analytics, and noisy-text analytics provide different methods to find
patterns in, or otherwise interpret, this information. Common techniques
for structuring text usually involve manual tagging with metadata or
part-of-speech tagging for further text mining-based structuring. Unstructured
Information Management Architecture (UIMA) provides a common
framework for processing this information to extract meaning and create
structured data about the information.
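As a rough illustration of turning free text into structured data, the sketch below tags dates, monetary amounts and e-mail addresses in a sentence using regular expressions. It is a deliberately simple stand-in for the tagging idea described above, not an NLP or UIMA pipeline. The patterns, the `extract_fields` function name and the sample sentence are all assumptions made for this example.

```python
import re

# Hypothetical patterns for this sketch; real text needs far more robust rules.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),           # ISO dates such as 2009-10-01
    "amount": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),  # dollar amounts such as $1,250.00
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_fields(text: str) -> dict[str, list[str]]:
    """Return every match of each pattern, keyed by the field name."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


# A made-up sentence standing in for an unstructured customer note.
sample = (
    "On 2009-10-01 the customer (jane.doe@example.com) disputed a charge "
    "of $1,250.00 and asked for a reply by 2009-10-15."
)
record = extract_fields(sample)
print(record)
```

The resulting dictionary has fixed field names, so it could be stored in a database row, while the original sentence could not.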

The phrase "unstructured data" usually refers to information that doesn't
reside in a traditional row-column database. As you might expect, it's the
opposite of structured data -- the data stored in fields in a database.
Unstructured data files often include text and multimedia content.
Examples include e-mail messages, word processing documents, videos,
photos, audio files, presentations, webpages and many other kinds of
business documents. Note that while these sorts of files may have an
internal structure, they are still considered "unstructured" because the data
they contain doesn't fit neatly in a database.
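The point about internal structure can be seen in an ordinary e-mail message. In the Python sketch below, the standard-library `email` module reads the header fields, which behave like structured data, while the body remains free text that no database column describes. The message itself is made up for this example.

```python
from email import message_from_string

# A made-up message: the headers are structured, the body is free text.
raw_message = """\
From: alice@example.com
To: support@example.com
Subject: Late delivery

My order arrived a week late and the box was damaged.
Please let me know how to get a refund.
"""

message = message_from_string(raw_message)

# Header fields can be read by name, much like columns in a table.
headers = {name: message[name] for name in ("From", "To", "Subject")}

# The body is an unstructured string; its meaning must be inferred by analysis.
body = message.get_payload()

print(headers)
print(body)
```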

Features of unstructured data

Does not reside in traditional databases and data warehouses
May have an internal structure, but does not fit a relational data model
Generated by both humans and machines
Textual and multimedia content
Machine-to-machine communication
Examples include:
Personal messaging: email, instant messages, tweets, chat
Business documents: business reports, presentations, survey
Web content: web pages, blogs, wikis, audio files, photos,
Sensor output: satellite imagery, geolocation data, scanner

Implementing Unstructured Data Management

Organizations use a variety of different software tools to help them
organize and manage unstructured data. These can include the following:
Big data tools: Software like Hadoop can process stores of both
unstructured and structured data that are extremely large, very complex
and changing rapidly.
Business intelligence software: Also known as BI, this is a broad category
of analytics, data mining, dashboards and reporting tools that help
companies make sense of their structured and unstructured data for the
purpose of making better business decisions.
Data integration tools: These tools combine data from disparate sources so
that they can be viewed or analyzed from a single application. They
sometimes include the capability to unify structured and unstructured data.
Document management systems: Also called "enterprise content
management systems," a DMS can track, store and share unstructured data
that is saved in the form of document files.
Information management solutions: This type of software tracks structured
and unstructured enterprise data throughout its lifecycle.
Search and indexing tools: These tools retrieve information from
unstructured data files such as documents, Web pages and photos.
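Search and indexing tools of the kind listed above commonly rely on some form of index that maps words to the documents containing them. The Python sketch below builds a minimal inverted index over three made-up documents. The function names `tokenize`, `build_index` and `search`, the tokenizer rule and the sample documents are assumptions for illustration and do not describe any particular product.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up documents standing in for unstructured files.
documents = {
    "report.txt": "Quarterly sales report for the northern region",
    "memo.txt": "Memo about the delayed sales forecast",
    "page.html": "Company history and contact details",
}
index = build_index(documents)
print(sorted(search(index, "sales report")))
print(sorted(search(index, "sales")))
```

This sketch matches exact words only. The semantic search described earlier in this report goes further by taking the context of the words into account.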
Unstructured Data Technology
A group called the Organization for the Advancement of Structured
Information Standards (OASIS) has published the Unstructured Information
Management Architecture (UIMA) standard. The UIMA "defines
platform-independent data representations and interfaces for software
components or services called analytics, which analyze unstructured
information and assign semantics to regions of that unstructured
information."
Many industry watchers say that Hadoop has become the de facto industry
standard for managing Big Data. This open source project is managed by
the Apache Software Foundation.

Unstructured Data Analysis

Unstructured data represents up to 80%
of the data within an organization. You can use InfoSphere Warehouse to extract
structured information out of previously untapped business text. The business value is
immense, e.g., enabling fraud detection and better customer profiling.
InfoSphere Warehouse Unstructured Data Analysis augments Dynamic Warehouse with
the ability to extract structured information out of previously untapped business text and
correlate it with structured data to gain business insight.
The InfoSphere Warehouse Unstructured Data Analysis Design Studio tooling is targeted
towards the ETL specialist who uses text analysis in the context of a larger data
warehouse project and who is not an expert on text analysis or the UIMA framework. It
contains a basic set of functions to configure and use a fixed set of configurable analysis
engines that are shipped with the product. It also provides functions to use (but not
modify) third-party analysis engines that are UIMA 1.4.x compliant.