
Raw data report





By Gavin Wright

What is raw data?

Raw data (sometimes called source data, atomic data or primary data) is data that has not
been processed for use. A distinction is sometimes made between data and information to the
effect that information is the end product of data processing. Raw data that has undergone
processing is sometimes referred to as cooked data.

Although raw data has the potential to become "information," it requires selective extraction,
organization and sometimes analysis and formatting for presentation. Once processed,
raw data often ends up in a database, which makes the data accessible for
further processing and analysis in a number of different ways.

How raw data works

Tremendous amounts of raw data surround us and are produced every day. The human brain
is incredibly good at taking in raw data, processing it and using it to make decisions.

For example, imagine you are trying to cross a busy road. The eyes capture raw data as
flashes of light and dark. Then the brain takes these flashes and resolves them into objects
such as street signs and cars. The working memory can tell you if that car is sitting still,
getting bigger as it comes toward you, or getting smaller as it drives away. Meanwhile, the
ears take in raw data in the form of vibrations in the air, which the brain translates
into sounds that can be interpreted as the wind, voices or a car engine. Finally, all this
processed data that came in through the eyes, ears and memory helps you make the informed
decision to cross the street or not.

Unlike the human mind, however, computers cannot intuitively process raw data, and raw data
is generally not useful on its own. Extra processing is required to turn it into useful
information. Additionally, the final data from one system may be used as raw data in another.

For example, imagine a simple home thermostat. Its raw data source is a temperature probe --
usually read as an analog voltage level. The system takes this voltage level as raw data and
turns it into a temperature reading. It can then compare this processed data against a
predetermined target temperature to decide when to turn a heater or air conditioner on and off.
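
The thermostat example above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular thermostat works: the sensor calibration (0.5 V at 0 degrees Celsius, 10 mV per degree), the dead band and the function names are all assumptions made for the example.

```python
# Assumption: a hypothetical linear sensor that outputs 0.5 V at 0 C and
# rises by 10 mV per degree Celsius. Real probes differ; check the datasheet.
VOLTS_AT_ZERO_C = 0.5
VOLTS_PER_DEGREE_C = 0.01


def voltage_to_celsius(volts: float) -> float:
    """Turn a raw voltage reading into a temperature in degrees Celsius."""
    return (volts - VOLTS_AT_ZERO_C) / VOLTS_PER_DEGREE_C


def heater_should_run(temp_c: float, target_c: float, heater_on: bool,
                      band_c: float = 0.5) -> bool:
    """Decide the heater state, with a small dead band to avoid rapid cycling."""
    if temp_c < target_c - band_c:
        return True
    if temp_c > target_c + band_c:
        return False
    return heater_on  # inside the band: keep the current state


raw_volts = 0.69                      # raw data from the probe
temp = voltage_to_celsius(raw_volts)  # processed data
print(f"{temp:.1f} C, heater on: {heater_should_run(temp, 21.0, False)}")
```

The raw voltage means nothing to the control logic until it has been converted. Only the processed temperature is compared with the target.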

Furthermore, the system may feed this temperature reading and the current time into another
climate control system as that system's raw data. Then the data is stored and analyzed over
time to produce a predictive modeling algorithm to help make better heating and cooling
decisions.
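
As a rough illustration of analyzing stored readings over time, the sketch below fits a straight line to a few timestamped temperatures and extrapolates one hour ahead. A real predictive model would be far more involved; the least-squares line is a stand-in chosen for brevity, and the readings and names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical stored readings: (hours since midnight, temperature in C).
# For the second system, these processed readings are its raw data.
history = [(6, 18.0), (7, 18.5), (8, 19.0), (9, 19.5)]
hours = [h for h, _ in history]
temps = [t for _, t in history]

slope, intercept = fit_line(hours, temps)
predicted_10 = slope * 10 + intercept
print(f"Estimated temperature at 10:00: {predicted_10:.1f} C")
```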

Usually, organizations must process raw data into information when putting it in a
repository for it to become useful. One notable exception is the data lake, which is a storage
repository that can hold massive volumes of raw data in its native format.

How to process raw data

Many sources can produce raw data. How it is processed and stored depends on its source and
intended use. Examples of raw data include financial transactions from a point-of-sale
(POS) terminal, computer logs or even participant eye-tracking data in a research project.
Applications and devices can save raw data in various formats, but one of the most common formats
for interchanging raw data between systems is the comma-separated values (CSV) file.
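
To make the CSV case concrete, here is a small sketch that reads raw POS transactions with Python's standard csv module. The column names and sample rows are invented for illustration; a real export would have its own layout.

```python
import csv
import io
from decimal import Decimal

# Hypothetical raw export from a POS terminal; column names are assumptions.
RAW_CSV = """transaction_id,timestamp,item,amount
1001,2021-05-03T09:15:00,coffee,3.50
1002,2021-05-03T09:17:00,bagel,2.25
1003,2021-05-03T09:20:00,coffee,3.50
"""


def read_transactions(text):
    """Parse CSV text into a list of dicts, converting amount to Decimal."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["amount"] = Decimal(row["amount"])  # Decimal avoids float rounding on money
        rows.append(row)
    return rows


transactions = read_transactions(RAW_CSV)
total = sum(row["amount"] for row in transactions)
print(f"{len(transactions)} transactions, total {total:.2f}")
```

Every value in a CSV file arrives as text, so even this simple step involves processing: the amount column has to be converted before it can be summed.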

In many instances, users must clean raw data before it can be used. Cleaning raw data may
require parsing the data for easier ingestion into a computer, removing outliers or spurious
results and, occasionally, reformatting or translating the data -- a process sometimes called
massaging or crunching the data.
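
A minimal sketch of such a cleaning pass is shown below: it parses raw text values, skips entries that cannot be parsed, and drops outliers. The outlier rule used here (distance from the median, measured in median absolute deviations) and its threshold are assumptions picked for the example; the right rule depends on the data.

```python
import statistics


def parse_readings(raw_values):
    """Convert raw strings to floats, skipping entries that do not parse."""
    parsed = []
    for value in raw_values:
        try:
            parsed.append(float(value.strip()))
        except ValueError:
            continue  # spurious entry such as "" or "ERR"
    return parsed


def drop_outliers(values, max_deviations=5.0):
    """Drop values far from the median, measured in median absolute deviations."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against; keep everything
    return [v for v in values if abs(v - median) / mad <= max_deviations]


# Hypothetical raw sensor export with a bad token, a blank and a spike.
raw = ["20.1", " 20.3", "ERR", "19.9", "", "999.0", "20.0"]
clean = drop_outliers(parse_readings(raw))
print(clean)
```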

There are many ways to process raw data, ranging from simple to complex. A spreadsheet
such as Microsoft Excel or Google Sheets allows users to format, organize and graph data to
reveal simple trends and help summarize data. More complicated systems such as business
intelligence (BI) programs may use raw data for financial trending or forecasting purposes.
Advanced systems may use raw data for alerting purposes or with machine learning to build
models of the data and its behavior.
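
As one small example of using raw data for alerting, the sketch below raises an alert whenever the moving average of recent samples crosses a threshold. The window size, threshold and sample values are hypothetical, and production alerting systems are usually considerably more elaborate.

```python
from collections import deque


def alerts(values, window=3, threshold=80.0):
    """Yield (index, average) whenever the moving average exceeds threshold."""
    recent = deque(maxlen=window)  # keeps only the newest `window` samples
    for index, value in enumerate(values):
        recent.append(value)
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                yield index, average


cpu_percent = [40, 55, 70, 85, 95, 90, 60, 30]  # hypothetical raw samples
triggered = list(alerts(cpu_percent))
for index, average in triggered:
    print(f"sample {index}: moving average {average:.1f} exceeds threshold")
```

Averaging over a window is a design choice: it keeps a single noisy sample from firing an alert, at the cost of reacting a little later.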

Value of raw data

The primary value of data is realized after it has been processed and interpreted. There is generally not
much value in holding onto raw data without a way to use it, but as the cost of storage
decreases, organizations are finding more and more value in collecting raw data for additional
processing -- if not right away, then later.

Raw data may contain personally identifiable information (PII). This may expose an
organization to liability for storing or transmitting it. Therefore, the organization may use
data anonymization to remove PII from the raw data, or it may apply data controls and
implement data retention policies to limit the risk of data leaks.
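
The sketch below shows one simple way to strip PII from a raw record: drop the direct identifiers and replace the customer ID with a keyed hash. Strictly speaking this is pseudonymization rather than full anonymization, since anyone holding the key can recompute the tokens. The field names and the key are placeholders invented for the example.

```python
import hashlib
import hmac

# Hypothetical field names; which fields count as PII depends on the data set.
DIRECT_IDENTIFIERS = {"name", "email"}
SECRET_KEY = b"replace-with-a-secret-key"  # placeholder; store real keys securely


def pseudonymize(record, key=SECRET_KEY):
    """Drop direct identifiers and replace customer_id with a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    token = hmac.new(key, str(record["customer_id"]).encode("utf-8"), hashlib.sha256)
    cleaned["customer_id"] = token.hexdigest()[:16]  # shortened for readability
    return cleaned


raw_record = {
    "customer_id": 42,
    "name": "Jane Example",
    "email": "jane@example.com",
    "amount": "19.99",
}
print(pseudonymize(raw_record))
```

Because the same ID always maps to the same token, records can still be correlated for analysis without exposing the original identifier.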

Organizations can feed raw data into a database or a data warehouse (one of several kinds of
data repositories), which can collect raw data from many sources for
automatic or manual correlation and processing. An analyst can then query the data using
BI tools to produce useful information from the data.
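
As a small-scale stand-in for that workflow, the sketch below loads rows from two hypothetical sources into an in-memory SQLite database and runs the kind of summary query an analyst might issue. The table layout and data are assumptions; a real data warehouse would use its own schema and tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (source TEXT, item TEXT, amount REAL)")

# Raw rows collected from two hypothetical sources.
store_rows = [("store", "coffee", 3.50), ("store", "bagel", 2.25)]
web_rows = [("web", "coffee", 3.50), ("web", "mug", 12.00)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", store_rows + web_rows)

# Summarize across sources: number of sales and revenue per item.
query = """
    SELECT item, COUNT(*), SUM(amount)
    FROM sales
    GROUP BY item
    ORDER BY item
"""
summary = conn.execute(query).fetchall()
conn.close()

for item, count, revenue in summary:
    print(f"{item}: {count} sale(s), {revenue:.2f} total")
```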

Many large businesses today recognize the value of raw data. Consumer data is a hot
commodity that they can buy and sell to build profiles of users or target a specific audience,
for example. Businesses can also store operational and logging data for use in performance
metrics and to streamline business practices, while they can use access logs and the like to
identify computer breaches and track what data may have been accessed by hackers.
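
To illustrate the access-log case, the sketch below counts failed logins per IP address and flags addresses that reach a threshold. The log format, event names and threshold are invented for the example. Real logs vary widely, and a failed-login count is only one weak signal of a possible breach.

```python
from collections import Counter

# Hypothetical log format: timestamp, client IP, user, event.
LOG_LINES = [
    "2021-05-03T10:00:01 203.0.113.7 alice LOGIN_FAILED",
    "2021-05-03T10:00:03 203.0.113.7 alice LOGIN_FAILED",
    "2021-05-03T10:00:05 198.51.100.4 bob LOGIN_OK",
    "2021-05-03T10:00:06 203.0.113.7 admin LOGIN_FAILED",
    "2021-05-03T10:00:09 198.51.100.4 bob LOGIN_FAILED",
    "malformed line",
]


def suspicious_ips(lines, min_failures=3):
    """Return {ip: failure_count} for IPs with at least min_failures failed logins."""
    failures = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        _timestamp, ip, _user, event = parts
        if event == "LOGIN_FAILED":
            failures[ip] += 1
    return {ip: count for ip, count in failures.items() if count >= min_failures}


flagged = suspicious_ips(LOG_LINES)
print(flagged)
```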

Also see data lake, big data, big data analytics and data governance.

This was last updated in May 2021
