Structured vs. Unstructured Data
By Christine Taylor, Posted March 28, 2018
Structured data is far easier for Big Data programs to digest, while the
myriad formats of unstructured data pose a greater challenge. Yet
both types of data play a key role in effective data analysis.
Structured data vs. unstructured data: structured data consists of clearly defined
data types whose pattern makes them easily searchable, while unstructured data
(“everything else”) consists of data that is usually not as easily searchable,
including formats like audio, video, and social media postings.
Unstructured data vs. structured data does not denote any real conflict between the
two. Customers select one or the other based not on their data structure but on the
applications that use them: relational databases for structured data, and almost any
other type of application for unstructured data.
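To make the relational side of this split concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and its rows are hypothetical and invented purely for illustration. Because every value sits in a typed, named column, a single declarative query can answer the question.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, placed_on TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount, placed_on) VALUES (?, ?, ?)",
    [
        ("Acme", 120.50, "2018-03-01"),
        ("Globex", 75.00, "2018-03-02"),
        ("Acme", 30.25, "2018-03-05"),
    ],
)


def total_for(customer):
    """Sum the order amounts for one customer; 0.0 if there are none."""
    row = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] or 0.0


print(total_for("Acme"))
```

Nothing comparable exists for a folder of recordings or social media posts: there is no column to filter on until some analytics step extracts one.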
However, there is a growing tension between the ease of analysis on structured data
and the more challenging analysis of unstructured data. Structured data analytics is a
mature process and technology. Unstructured data analytics is a nascent industry that
is attracting a lot of new R&D investment but is not yet a mature technology. For
corporations, the structured vs. unstructured question is whether to invest in
analytics for unstructured data, and whether the two can be aggregated into better
business intelligence.
Email: Email has some internal structure thanks to its metadata, and we
sometimes refer to it as semi-structured. However, its message field is unstructured
and traditional analytics tools cannot parse it.
Scientific data: Oil and gas exploration, space exploration, seismic imagery,
atmospheric data.
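The email item above can be illustrated with Python's standard email package. This is only a sketch: the message, addresses, and wording are made up. It shows why email is called semi-structured: the headers parse into named fields, while the body comes back as one undifferentiated string.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message; the addresses and text are illustrative assumptions.
RAW = "\n".join([
    "From: alice@example.com",
    "To: bob@example.com",
    "Subject: Q1 contract renewal",
    "Date: Wed, 28 Mar 2018 09:30:00 +0000",
    "",
    "Hi Bob, the client sounded unhappy on the call. Can we revisit pricing?",
])

msg = message_from_string(RAW)

# Structured part: metadata fields with predictable names and formats.
headers = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
sent_at = parsedate_to_datetime(msg["Date"])

# Unstructured part: free text that a field-based query cannot interpret.
body = msg.get_payload()

print(headers["Subject"], sent_at.year)
print(body)
```

Filtering by sender or date is straightforward here, but deciding whether the body signals an unhappy client needs text analytics rather than a field lookup.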
Users can run simple content searches across textual unstructured data. But its lack of
orderly internal structure defeats traditional data mining tools, and the
enterprise gets little value from potentially valuable data sources like rich media,
network or web logs, customer interactions, and social media data. Even though
unstructured data analytics tools are in the marketplace, no single vendor or toolset
is a clear winner. And many customers are reluctant to invest in analytics tools with
uncertain development roadmaps.
On top of this, there is simply much more unstructured data than structured.
Unstructured data makes up 80% or more of enterprise data, and is growing at a rate
of 55% to 65% per year. Without the tools to analyze this mass of data,
organizations are leaving vast amounts of valuable data on the business intelligence
table.
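To get a feel for what growth of 55% to 65% per year implies, the short sketch below applies ordinary compound-growth arithmetic to those two rates. The starting volume of 100 units is an arbitrary assumption; only the growth rates come from the text.

```python
import math


def projected_volume(initial, annual_growth, years):
    """Volume after compounding `annual_growth` (0.55 means 55%) for `years`."""
    return initial * (1 + annual_growth) ** years


def doubling_time(annual_growth):
    """Years needed for the volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


for rate in (0.55, 0.65):
    print(
        f"{rate:.0%} per year: "
        f"{projected_volume(100, rate, 3):.1f} units after 3 years, "
        f"doubling every {doubling_time(rate):.2f} years"
    )
```

At these rates a store of unstructured data doubles in roughly a year and a half and reaches about 3.7 to 4.5 times its original size within three years.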
Structured data is traditionally easier for Big Data applications to digest, yet today's data
analytics solutions are making great strides in this area.
These databases are common in big data infrastructure and real-time Web applications
like LinkedIn. On LinkedIn, hundreds of millions of business users freely share job titles,
locations, skills, and more, and LinkedIn captures this mass of data in a semi-structured
format. When job-seeking users create a search, LinkedIn matches the query to its
massive semi-structured data stores, cross-references data to hiring trends, and shares
the resulting recommendations with job seekers. The same process operates with sales
and marketing queries in premium LinkedIn services like Sales Navigator. Amazon also
bases its reader recommendations on semi-structured databases.
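As a rough illustration of matching a query against semi-structured records, the sketch below stores a few invented profiles as JSON documents whose fields vary from record to record. It is not LinkedIn's or Amazon's actual system; the field names, profiles, and the `match_profiles` helper are all hypothetical.

```python
import json

# Invented profiles: note that not every record carries the same fields.
PROFILES_JSON = """
[
  {"name": "A. Rivera", "title": "Data Engineer",
   "skills": ["Python", "SQL"], "location": "Austin"},
  {"name": "B. Chen", "skills": ["SQL", "Tableau"]},
  {"name": "C. Okafor", "title": "ML Engineer",
   "skills": ["Python", "NLP"], "certifications": ["AWS"]}
]
"""

profiles = json.loads(PROFILES_JSON)


def match_profiles(records, required_skills):
    """Return names of records whose skills include every required skill."""
    wanted = {skill.lower() for skill in required_skills}
    matches = []
    for record in records:
        # Missing fields are tolerated: a record without skills never matches.
        have = {skill.lower() for skill in record.get("skills", [])}
        if wanted <= have:
            matches.append(record["name"])
    return matches


print(match_profiles(profiles, ["python"]))
print(match_profiles(profiles, ["sql", "python"]))
```

The tags give each record enough structure to query, while the absence of a fixed schema lets individual records add or omit fields freely.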
A few years ago, analysts using keywords and key phrases could search unstructured
data and get a decent idea of what the data involved. eDiscovery was (and is) a prime
example of this approach. However, unstructured data has grown so dramatically that
users need to employ analytics that not only work at compute speeds, but also
automatically learn from their activity and user decisions. Natural Language Processing
(NLP), pattern sensing and classification, and text-mining algorithms are all common
examples, as are document relevance analytics, sentiment analysis, and filter-driven
Web harvesting. Unstructured data analytics with machine-learning intelligence allows
organizations to:
In eDiscovery, data scientists use keywords to search unstructured data and get a
reasonable idea of the data involved.
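The keyword approach described above can be sketched as a small inverted index over text documents. This is a toy example under stated assumptions: the document names and contents are invented, and real eDiscovery tools add ranking, stemming, and the learning-based analytics the article mentions.

```python
import re
from collections import defaultdict

# Invented documents standing in for emails, memos, and notes.
DOCS = {
    "memo-1": "The contract renewal was delayed by the vendor.",
    "email-7": "Please review the vendor contract before Friday.",
    "note-3": "Lunch is at noon.",
}


def build_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, *keywords):
    """Return the sorted ids of documents containing every keyword."""
    if not keywords:
        return []
    hits = [index.get(word.lower(), set()) for word in keywords]
    return sorted(set.intersection(*hits))


index = build_index(DOCS)
print(search(index, "vendor", "contract"))
```

Exact-token matching like this finds documents that mention a term but says nothing about tone, relevance, or meaning, which is the gap that NLP and sentiment analysis aim to fill.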
No matter what your business specifics are, today’s goal is to tap business value
whether the data is structured or unstructured. Both types of data potentially hold a
great deal of value, and newer tools can aggregate, query, analyze, and leverage all data
types for deep business insight across the universe of corporate data.
Next steps: to fully understand the enterprise IT infrastructure that hosts today's
structured and unstructured Big Data tools, read The Comprehensive Guide to Cloud
Computing.