Data Preparation

Data preparation is the process of
gathering, combining, structuring and
organizing data so it can be used in
business intelligence (BI), analytics
and data visualization applications. It is
also known as data prep or data
wrangling. The components of data
preparation include data pre-processing,
profiling, cleansing, validation and
transformation; it often also involves pulling
together data from different internal systems
and external sources.

Data preparation work is done by
information technology (IT), BI and data
management teams as they integrate data
sets to load into a data warehouse, NoSQL
database or data lake repository or when
new analytics applications are developed.

Purposes of data preparation

One of the primary purposes of data
preparation is to ensure that raw data being
readied for data processing and analysis is
accurate and consistent so the results of BI
and analytics applications will be valid. Data
is commonly created with missing
values, inaccuracies or other errors.
Additionally, separate data sets often have
different formats that need to be reconciled.
Correcting data errors, verifying data
quality and joining data sets constitute a
big part of the data preparation process.
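
As a rough illustration of reconciling formats, filling a missing value and joining two data sets, here is a minimal Python sketch. The two sample sources, their field names and the "UNKNOWN" placeholder are assumptions made up for this example, not part of any particular tool.

```python
from datetime import datetime

# Hypothetical sample records: two sources with different date formats and a gap.
crm_rows = [
    {"customer_id": "001", "signup": "2023-01-15", "region": "EU"},
    {"customer_id": "002", "signup": "2023-02-03", "region": None},  # missing value
]
billing_rows = [
    {"customer_id": "001", "last_invoice": "15/03/2023", "amount": "120.50"},
    {"customer_id": "002", "last_invoice": "01/04/2023", "amount": "80"},
]

def parse_date(text, formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Try each known format and return an ISO 8601 date string."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def prepare(crm, billing):
    """Join the two sources on customer_id and normalize their fields."""
    billing_by_id = {row["customer_id"]: row for row in billing}
    prepared = []
    for row in crm:
        bill = billing_by_id.get(row["customer_id"], {})
        prepared.append({
            "customer_id": row["customer_id"],
            "signup": parse_date(row["signup"]),
            "region": row["region"] or "UNKNOWN",  # assumed placeholder for gaps
            "last_invoice": parse_date(bill["last_invoice"]) if bill else None,
            "amount": float(bill["amount"]) if bill else None,
        })
    return prepared

prepared_rows = prepare(crm_rows, billing_rows)
```

Whether a missing value should be filled with a placeholder, imputed or rejected depends on the intended analysis; the placeholder here is only one option.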

Data preparation also involves finding
relevant data to include in analytics
applications to ensure they deliver the
information that analysts or business users
are seeking.

In addition, BI and data management teams
can use the data preparation process
to curate data sets for business users to
analyze. Doing so helps streamline and
guide self-service BI applications for
business analysts, executives and workers.

Steps in the data preparation process

The process of preparing data includes
several distinct steps. They are:

• Data collection. Relevant data is gathered
from operational systems, data warehouses
and other data sources.

• Data discovery and profiling. The next step
is to explore the collected data to better
understand what it contains and what needs
to be done to prepare it for the intended
uses. Profiling helps identify patterns,
inconsistencies, anomalies, missing data
and other issues.

• Data cleansing. In this step, the identified
data errors are corrected to create complete
and accurate data sets that are ready to be
processed and analyzed.

• Data structuring. At this point, the data
needs to be structured, modeled and
organized into a unified format.

• Data transformation and enrichment. Data
transformation makes data consistent
and turns it into usable information. Data
enrichment and optimization further
enhance data sets.

• Data validation and publishing. To
complete the preparation process,
automated routines are run against the data
to validate its consistency, completeness
and accuracy. The prepared data is then
stored in a data warehouse or other
repository and made available for use.
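
The profiling, cleansing, transformation and validation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sample records, field names and validation rules are hypothetical, and dropping incomplete rows is only one of several possible cleansing policies.

```python
# Hypothetical collected records; None marks a missing value.
raw = [
    {"id": 1, "country": "us", "revenue": "1000"},
    {"id": 2, "country": "US ", "revenue": None},
    {"id": 2, "country": "US ", "revenue": None},   # exact duplicate
    {"id": 3, "country": "de", "revenue": "250.5"},
]

def profile(rows):
    """Count missing values per field to see what needs fixing."""
    missing = {}
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing[field] = missing.get(field, 0) + 1
    return missing

def cleanse(rows):
    """Drop repeated ids (keeping the first) and rows with no revenue."""
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen or row["revenue"] is None:
            continue
        seen.add(row["id"])
        cleaned.append(row)
    return cleaned

def transform(rows):
    """Normalize country codes and convert revenue to a number."""
    return [
        {"id": r["id"],
         "country": r["country"].strip().upper(),
         "revenue": float(r["revenue"])}
        for r in rows
    ]

def validate(rows):
    """Return a list of problems; an empty list means the data passed."""
    problems = []
    ids = [r["id"] for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate ids")
    for r in rows:
        if len(r["country"]) != 2:
            problems.append(f"bad country code for id {r['id']}")
        if r["revenue"] < 0:
            problems.append(f"negative revenue for id {r['id']}")
    return problems

missing_report = profile(raw)
prepared = transform(cleanse(raw))
issues = validate(prepared)
```

Only when `issues` is empty would the prepared rows be published to a warehouse or other repository.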

Benefits of data preparation


An effective data preparation process helps
an organization:

• let users spend less time finding and
structuring data and instead focus more
on data mining and data analysis, the
BI-related activities that deliver business value;

• ensure that the data used for BI, machine
learning, predictive analytics and other
analytics applications has sufficient quality
levels to produce reliable results;

• avoid duplication of efforts in preparing data
that can be used in multiple applications;

• prepare data for analysis in a cost-effective
and efficient way;

• identify and fix data issues that otherwise
might not be detected;

• make more informed business decisions
because executives have access to better
data; and

• get more business value and a higher return
on investment (ROI) from its BI and
analytics initiatives.

Data Security

Why is data security important?

Data security is the practice of protecting
digital information from unauthorized
access, corruption, or theft throughout its
entire lifecycle.
When properly implemented, robust data
security strategies protect an
organization’s information assets not only
against cybercriminal activities but also
against insider threats and human error,
which remain among the leading causes of
data breaches today. Data security involves
deploying tools and technologies that
enhance the organization’s visibility into
where its critical data resides and how it is
used.

Types of data security


Encryption

Encryption uses an algorithm to transform
normal text characters into an unreadable
format. Encryption keys scramble the data so
that only authorized users can read it. File and
database encryption solutions serve as a
final line of defense for sensitive volumes by
obscuring their contents through encryption
or tokenization.
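
Python's standard library does not ship a general-purpose symmetric cipher, so the sketch below illustrates the tokenization idea mentioned above rather than encryption itself. The `TokenVault` class and the `tok_` prefix are hypothetical names for this example; a production vault would be a hardened, access-controlled service rather than an in-memory dictionary.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, carries no information
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]

vault = TokenVault()
card_number = "4111111111111111"  # commonly used test number, not real data
token = vault.tokenize(card_number)
```

Downstream systems can store and pass around `token` while the original value stays inside the vault.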

Data Erasure

More secure than standard data wiping,
data erasure uses software to completely
overwrite data on any storage device. It
verifies that the data is unrecoverable.

Data Masking

By masking data, organizations can allow
teams to develop applications or train
people using real data. Masking hides personally
identifiable information (PII) where
necessary so that development can occur in
environments that are compliant.
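
A minimal sketch of masking PII in free text with regular expressions follows. The two patterns (email addresses and US-style Social Security numbers) and the masking format are assumptions chosen for illustration; real masking tools cover many more data types and validate matches more carefully.

```python
import re

# Keep the first character of the local part and the full domain.
EMAIL_RE = re.compile(
    r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)
# Keep only the last four digits of a 3-2-4 digit pattern.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")

def mask_pii(text):
    """Mask email local parts and all but the last four digits of SSNs."""
    text = EMAIL_RE.sub(r"\1***@\2", text)
    text = SSN_RE.sub(r"***-**-\1", text)
    return text

record = "Contact jane.doe@example.com, SSN 123-45-6789."
masked = mask_pii(record)
print(masked)
```

Keeping a few characters visible preserves enough shape for testing and training while hiding the identifying part of the value.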

Data Resiliency

Resiliency is determined by how well a data
center is able to endure or recover from any
type of failure, from hardware problems to
power shortages and other disruptive
events.

Data security capabilities and solutions

Data security tools and technologies should
address the growing challenges inherent in
securing today’s complex, distributed,
hybrid, and/or multicloud computing
environments. These include understanding
where data resides, keeping track of
who has access to it, and blocking high-risk
activities and potentially dangerous file
movements.

Data discovery and classification tools

Sensitive information can reside in
structured and unstructured data
repositories including databases, data
warehouses, big data platforms, and cloud
environments. Data discovery and
classification solutions automate the
process of identifying sensitive information,
as well as assessing and remediating
vulnerabilities.
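
To give a feel for automated classification, the sketch below labels a column as sensitive when most of its values match a pattern. The rules, the 0.8 threshold and the sample rows are hypothetical; commercial tools rely on far richer dictionaries, validation and context.

```python
import re

# Hypothetical detection rules, kept deliberately simple.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,}\d$"),
}

def classify_columns(rows, threshold=0.8):
    """Label a column when most of its non-empty values match a rule."""
    labels = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r.get(col) not in (None, "")]
        for label, pattern in PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if values and hits / len(values) >= threshold:
                labels[col] = label
                break
    return labels

sample = [
    {"name": "Ada", "contact": "ada@example.com", "mobile": "+1 555-010-0199"},
    {"name": "Lin", "contact": "lin@example.org", "mobile": "555 010 0142"},
]
labels = classify_columns(sample)
```

Columns that receive a label can then be routed to stricter access controls, masking or encryption.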

Data and file activity monitoring

File activity monitoring tools analyze data
usage patterns, enabling security teams to
see who is accessing data, spot anomalies,
and identify risks. Dynamic blocking and
alerting can also be implemented for
abnormal activity patterns.
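
One simple way to spot an abnormal activity pattern is to compare each user's access count with a baseline. The sketch below uses the median count and a multiplier of 5; the log entries, user names and multiplier are all assumptions for illustration, and real monitoring tools use considerably more sophisticated models.

```python
from collections import Counter
from statistics import median

# Hypothetical access log entries: (user, file path).
access_log = (
    [("alice", "/finance/q1.xlsx")] * 4
    + [("bob", "/hr/handbook.pdf")] * 5
    + [("carol", "/sales/leads.csv")] * 3
    + [("mallory", "/finance/payroll.csv")] * 60
)

def flag_unusual_users(log, factor=5):
    """Flag users whose access count exceeds `factor` times the median count."""
    counts = Counter(user for user, _ in log)
    baseline = median(counts.values())
    return sorted(user for user, n in counts.items() if n > factor * baseline)

flagged = flag_unusual_users(access_log)
```

A flagged user could trigger an alert or, with dynamic blocking, a temporary suspension of access pending review.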

Vulnerability assessment and risk analysis tools

These solutions ease the process of
detecting and mitigating vulnerabilities such
as out-of-date software, misconfigurations,
or weak passwords, and can also identify
data sources at greatest risk of exposure.

Automated compliance reporting

Comprehensive data protection solutions
with automated reporting capabilities can
provide a centralized repository for
enterprise-wide compliance audit trails.

Data security strategies


A comprehensive data security strategy
incorporates people, processes, and
technologies. Establishing appropriate
controls and policies is as much a question
of organizational culture as it is of deploying
the right tool set. This means making
information security a priority across all
areas of the enterprise.

Physical security of servers and user devices

Regardless of whether your data is stored
on-premises, in a corporate data center, or
in the public cloud, you need to ensure that
facilities are secured against intruders and
have adequate fire suppression measures and
climate controls in place. A cloud provider will
assume responsibility for these protective
measures on your behalf.

Access management and controls

The principle of “least-privilege access”
should be followed throughout your entire IT
environment. This means granting
database, network, and administrative
account access to as few people as
possible, and only those who absolutely
need it to get their jobs done.
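
In code, least privilege usually takes the form of a deny-by-default check: access is granted only when a permission has been explicitly assigned. The roles, users and resource names below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical roles and permissions; anything not listed is denied.
ROLE_PERMISSIONS = {
    "analyst": {("sales_db", "read")},
    "dba": {("sales_db", "read"), ("sales_db", "write"), ("sales_db", "admin")},
}

USER_ROLES = {"dana": "analyst", "omar": "dba"}

def is_allowed(user, resource, action):
    """Deny by default: allow only permissions granted to the user's role."""
    role = USER_ROLES.get(user)
    return (resource, action) in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("dana", "sales_db", "read"))   # analyst may read
print(is_allowed("dana", "sales_db", "write"))  # but may not write
```

Unknown users and unlisted actions fall through to the empty permission set, so nothing is granted by accident.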

Application security and patching

All software should be updated to the latest
version as soon as possible after patches or
new versions are released.

Backups

Maintaining usable, thoroughly tested
backup copies of all critical data is a core
component of any robust data security
strategy. In addition, all backups should be
subject to the same physical and logical
security controls that govern access to the
primary databases and core systems.
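
One small part of testing a backup is confirming that the copy is byte-for-byte identical to the original. The sketch below copies a file and compares SHA-256 checksums; the helper names and the throwaway sample file are assumptions for this example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_and_verify(source, backup_dir):
    """Copy `source` into `backup_dir` and confirm the copy matches it."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} failed verification")
    return target

# Demonstration with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "customers.csv"
    original.write_text("id,name\n1,Ada\n")
    copy = backup_and_verify(original, Path(tmp) / "backups")
    verified = copy.read_text() == original.read_text()
```

A matching checksum shows only that the copy is intact; a full test also restores the backup into a working system.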

Employee education

Training employees in the importance of
good security practices and password
hygiene and teaching them to recognize
social engineering attacks transforms them
into a “human firewall” that can play a
critical role in safeguarding your data.

Network and endpoint security monitoring and controls

Implementing a comprehensive suite of
threat management, detection, and
response tools and platforms across your
on-premises environment and cloud
platforms can mitigate risks and reduce the
probability of a breach.

Data security trends


AI

AI amplifies the ability of a data security
system because it can process large
amounts of data. Cognitive computing, a
subset of AI, performs the same tasks as
other AI systems but does so by
simulating human thought processes. In
data security, this allows for rapid
decision-making in times of critical need.

Multicloud security

The definition of data security has
expanded as cloud capabilities grow. Now
organizations need more complex solutions
as they seek protection for not only data
but also applications and proprietary business
processes that run across public and private
clouds.

Quantum

A revolutionary technology, quantum
computing promises to upend many
traditional technologies. Encryption
algorithms are expected to become much
more multifaceted, increasingly complex and
much more secure.
