Structured Vs Unstructured Data

Data can be designated as structured or unstructured for classification within an organization.

What is Structured Data?

The term structured data refers to data that is identifiable because it is organized in a structure. The most common form of structured data, or structured data records (SDR), is a database where specific information is stored in columns and rows. Structured data is also searchable by data type within content. It is understood by computers and is also efficiently organized for human readers. In contrast, unstructured data has no identifiable structure.

What is Unstructured Data?

The term unstructured data refers to any data that has no identifiable structure. For example, images, videos, email, documents and text are all considered to be unstructured data within a dataset. While each individual document may contain its own specific structure or formatting based on the software program used to create it, unstructured data may also be considered loosely structured data: the data sources do have a structure, but not all data within a dataset will share the same structure. This is in contrast to a database, which is a common example of "structured" data.

Structured Vs Unstructured Data

As many are aware, twenty-first century corporations are facing a crisis. Many corporations have been accurately and comprehensively storing data for years. The problem is that they are ill-equipped to do anything with this massive collection of data. Big data is so comprehensive that it has become unwieldy, and the average corporation is unable to retrieve or organize this data in any useful way.

The fundamental difference between structured data and unstructured data, as you might expect, is that structured data is organized in a highly mechanized and manageable way. Structured data is ready for seamless integration into a database or well-structured file format.
Unstructured data, by contrast, is raw and unorganized. Digging through unstructured data can be cumbersome and costly. In an ideal world, a company's goal would be to turn all of the big data it has amassed into structured data. However, the cost and time this would take make it infeasible. Of course, if it were feasible to instantly transform unstructured data into structured data, then creating intelligence from unstructured data would be easy. Structured data is akin to machine language, in that it makes information much easier to deal with using computers, whereas unstructured data is (loosely speaking) usually for humans, who don't easily interact with information in strict database format.
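
To make the contrast concrete, here is a minimal Python sketch. The `orders` table, its column names, and the sample values are all hypothetical and chosen only for illustration; the point is that the same facts are trivial to query when they sit in typed columns and awkward to use when they sit in free text.

```python
import sqlite3

# Hypothetical structured records: every row has the same named, typed columns.
rows = [
    (1, "Alice", "2023-01-15", 120.50),
    (2, "Bob", "2023-02-03", 75.00),
    (3, "Alice", "2023-02-20", 310.25),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# Structured: the question maps directly onto columns and data types.
structured_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("Alice",)
).fetchone()[0]

# Unstructured: the same facts written as free text (hypothetical note).
note = (
    "Alice placed an order in mid-January for about $120.50. "
    "Bob bought something for 75 dollars. Later Alice ordered again, "
    "this time spending 310.25."
)

# A naive keyword search can count mentions, but it has no notion of which
# number is an amount or which amount belongs to which customer.
alice_mentions = note.count("Alice")

print(structured_total)
print(alice_mentions)
```

Getting Alice's total out of the note would require extra parsing rules or human reading, which is the cost the paragraph above describes.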

Example of Unstructured Data

Email is an example of unstructured data, since email is almost unstructurable. It is indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured. While a busy inbox might be arranged by date, time or size, if it were truly fully structured it would also be arranged by exact subject and content, with no deviation or spread. That is impractical, because people don't generally write about precisely one subject even in focused emails. Other examples of unstructured data include books, documents, medical records, and social media posts.

Example of Structured Data

Spreadsheets would be considered structured data, which can be quickly scanned for information because it is arranged in rows and columns, much like a table in a relational database system.
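
Returning to the email example, the split between indexed fields and free-form body can be sketched with Python's standard `email` package. The message below is made up for illustration; the addresses, subject, and body are assumptions, not data from any real system.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A hypothetical raw message: headers first, then a blank line, then the body.
raw = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 budget and the offsite
Date: Mon, 06 Mar 2023 09:30:00 +0000

Hi Bob,

The Q3 budget looks fine. Also, are we still on for the offsite?
"""

msg = message_from_string(raw)

# The indexed part: named fields that can be sorted and filtered.
headers = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
sent = parsedate_to_datetime(msg["Date"])

# The unstructured part: one block of text covering more than one topic.
body = msg.get_payload()

print(headers["Subject"])
print(sent.year)
print(body)
```

The headers can be sorted, filtered, and compared like database columns, while the body is a single string whose two topics (the budget and the offsite) are only apparent to a reader.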

The challenge of unstructured data is one of volume. Importantly, this is not to say that structured data is entirely unproblematic. By its very nature, structured data needs to remain relatively simple. A data point can only be called structured if it is simple, categorized, and entirely finite, which might suggest to readers that unstructured data is the more interesting (and therefore more worth archiving) data.

People use structured data every day. Structured data is anything that has an enforced composition of atomic data types. It is managed by technology that allows for querying and reporting against predetermined data types and understood relationships.

People also use unstructured data every day. Although they may not be aware of it, they use it for creating, storing and retrieving reports, e-mails, spreadsheets and other types of documents. Unstructured data consists of any data stored in an unstructured format at an atomic level. That is, in unstructured content there is no conceptual definition and no data type definition: in textual documents, a word is simply a word. Some current technologies used for content searches on unstructured data require tagging entities such as names, or applying keywords and meta tags. Human intervention is therefore required to help make the unstructured data machine readable.

Two Categories of Unstructured Data

Unstructured data consists of two basic categories:

Bitmap Objects: Inherently non-language based, such as image, video or audio files.

Textual Objects: Based on a written or printed language, such as Microsoft Word documents, emails or Microsoft Excel spreadsheets.

Both of these object types may be classified as data, but the technology and methodology for harnessing relevant information from bitmap objects is still in its infancy. Most of today's technology addresses textual objects.
Enterprise content management (ECM) technologies, for example, can help contain unstructured data. Textual data mining and analysis vendors provide analysis tools for unstructured textual objects, and business intelligence vendors supply solutions for querying and analyzing structured data. However, bringing them together, by querying both the unstructured and structured worlds and then associating the two at relevant points, is where the most value is gained and also where the greatest challenge lies.
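
As a rough illustration of tagging and then associating the two worlds, the Python sketch below tags free-text notes with customer names and keywords, then links the tags back to structured customer records. Everything in it is an assumption made for the example: the `customers` records, the notes, the `KEYWORDS` list, and the `tag_note` helper are hypothetical, and simple string matching stands in for the far more capable text-mining tools mentioned above.

```python
import re

# Structured side: hypothetical customer records keyed by ID.
customers = {
    "C001": {"name": "Acme Corp", "region": "EMEA"},
    "C002": {"name": "Globex", "region": "APAC"},
}

# Unstructured side: hypothetical free-text notes.
notes = [
    "Spoke with Acme Corp about a late delivery; they want a refund.",
    "Globex asked for a product demo next week.",
    "Follow-up: Acme Corp confirmed the refund was received.",
]

# Keywords chosen by a person in advance (the human intervention step).
KEYWORDS = ("refund", "demo", "delivery")


def tag_note(text):
    """Return (customer_ids, keywords) found in a note by plain matching."""
    lowered = text.lower()
    ids = [cid for cid, rec in customers.items()
           if rec["name"].lower() in lowered]
    words = [kw for kw in KEYWORDS
             if re.search(rf"\b{re.escape(kw)}\b", lowered)]
    return ids, words


tagged = []
for text in notes:
    ids, words = tag_note(text)
    tagged.append({"note": text, "customer_ids": ids, "keywords": words})

# Association: which regions (structured) appear in refund notes (unstructured)?
refund_regions = sorted({
    customers[cid]["region"]
    for entry in tagged if "refund" in entry["keywords"]
    for cid in entry["customer_ids"]
})

print(refund_regions)
```

The matching here is deliberately naive: it misses misspellings, synonyms, and context, which hints at why this kind of association is hard to do well at scale.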

Comparing these categories with structured data raises three distinct challenges. First, even if unstructured data is in a format such as a Microsoft Word template, the data is still not consumable at a semantic level without a compatible interface or application. Second, even with a compatible technology, we cannot necessarily gain insight into the context of the information unless we can actually read it. Third, the way we interpret what we read is largely subjective.

Why do we care about the Mountains of Unstructured Data?

IDC (International Data Corporation) estimates the volume of digital data will grow 40% to 50% per year. By 2020, IDC predicts the number will have reached 40,000 EB, or 40 zettabytes (ZB). The world's information is doubling every two years. By 2020 the world will generate 50 times the amount of information and 75 times the number of information containers. The massive growth of unstructured or semi-structured data has implications for data warehouse, business intelligence, and data analytics architecture, and for database design. The way we capture, store, analyze, and distribute data is transforming. New technologies like deduplication, compression, and analysis tools are lowering costs.

Structured data gives names to each field in a database and defines the relationships between the fields. Unstructured data is usually not stored in a relational database (as traditionally defined) where the data model is relevant to the meaning of the data. The Internet of Things (equipping all objects in the world with identifying devices), blogs, videos, social media, emails, notes from call centers, and all forms of human and computer-to-computer communications will soon start to produce massive amounts of unstructured or semi-structured data. The trick is to create value by extracting the right information from both internal and external data sources. That is what the science of data and the art of business analytics need to learn to do with larger and larger sets of unstructured data.
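
The growth figures quoted above can be checked against each other with a few lines of Python. This is only an arithmetic sketch: it assumes steady compound growth and decimal units (1 ZB = 1,000 EB), and the `doubling_time` helper is a name introduced here for illustration.

```python
import math

# Annual growth rate implied by "doubling every two years",
# assuming steady compound growth.
doubling_period_years = 2
implied_annual_growth = 2 ** (1 / doubling_period_years) - 1


def doubling_time(annual_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


# Doubling times at the two ends of the quoted 40% to 50% range.
slow = doubling_time(0.40)
fast = doubling_time(0.50)

# Unit check, assuming decimal prefixes: 1 zettabyte = 1,000 exabytes.
EB_PER_ZB = 1000
forecast_zb = 40_000 / EB_PER_ZB

print(f"implied annual growth: {implied_annual_growth:.1%}")
print(f"doubling time at 40%: {slow:.2f} years")
print(f"doubling time at 50%: {fast:.2f} years")
print(f"40,000 EB = {forecast_zb:.0f} ZB")
```

Doubling every two years corresponds to roughly 41% growth per year, which sits at the low end of the quoted 40% to 50% range, so the two statements are broadly consistent with each other.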