
Data Operations Management

Learning Outcomes
 Define Data Operations Management.
 Understand its purpose, principles, and key components.
 Know the tools for Data Operations Management.

Time Frame
1 hour

Introduction

The DataOps concept draws heavily from DevOps, according to which
infrastructure and development teams should work together so that projects can be
managed efficiently. DataOps covers multiple subjects within its field of action, for
example, data acquisition and transformation, cleaning, storage, backup, scalability,
governance, security, predictive analysis, and more.

Analysis
1. What is Data Operations Management?
2. What is the purpose of creating Data Operations?
3. How do the key components of data operations help with data management?

Abstraction

What is Data Operations Management?

Data Operations is the combination of people, processes, and products that
enable consistent, automated, and secure data management. It is a delivery system
based on joining and analyzing large databases. Collaboration and teamwork are two
keys to a successful business, and from this idea the term “DataOps” was born.

What is the purpose of Data Operations Management?

DataOps’s purpose is to be a cross-functional way of working, in terms of the
acquisition, storage, processing, quality monitoring, execution, improvement, and
delivery of information to the end user. It harnesses individuals’ capacities to work
for the common good and business development. Consequently, DataOps calls for
combining software development and operations teams, an approach also known as
DevOps.

The benefits of DataOps extend across the enterprise. For example, it:

1. Supports the entire software development life cycle and increases DevTest
speed through the fast and consistent supply of environments for the
development and test teams.
2. Improves quality assurance through the provision of “production-like data”
that enables testing to effectively exercise test cases before clients encounter
errors.
3. Helps organizations move safely to the cloud by simplifying and speeding
up the process of migrating data to the cloud or other destinations.
4. Supports both data science and machine learning. Any organization’s data
science and artificial intelligence endeavors are only as good as the
information available, so DataOps ensures a reliable flow of data for
ingestion and learning.
5. Helps with compliance by establishing standardized data security policies
and controls for the smooth flow of data without putting your clients at risk.

How to Adopt DataOps Principles?

Put All Steps under Version Control – There are many stages of processing that turn
raw data into useful information for stakeholders. To be valuable, data must progress
through these steps, linked together in some way, with the ultimate goal of producing
a data analytics output. Every one of those steps should be kept under version control.

Branch & Merge – Branching and merging are a major productivity boost for a data
analytics team because they let members make changes to the same source code files
independently. Each team member controls their own workspace, where they can test
programs, make changes, and take risks without affecting the rest of the team.

Use Multiple Environments – Every data analytics team member has development
tools on their laptop. Version control tools let each member work on a private copy of
the code while coordinating with the rest of the team. However, a team member cannot
be productive without the data they require.

Reuse and Containerize – In DataOps, the analytics team moves at lightning speed by
using highly optimized tools and processes. One key productivity technique is to reuse
and containerize. Reusing code means reusing data analytics components, which saves
time. Containerizing means packaging application code so that it can run anywhere,
using a platform such as Docker.

Parameterize Processing – Parameters allow code to generalize so that it can operate
on a variety of inputs and respond accordingly. Parameters also improve productivity:
for example, they let you restart a program at any specific point.
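As an illustration, a small pipeline can be parameterized so that the same code handles different inputs and can be restarted from a chosen step. The step names and logic below are invented for illustration; real pipelines would use an orchestration tool:

```python
# A minimal sketch of a parameterized pipeline: the same code processes
# any input, and a parameter lets it restart from a named step.

def clean(records):
    # Drop empty records.
    return [r for r in records if r]

def transform(records):
    # Normalize to lowercase.
    return [r.lower() for r in records]

def load(records):
    # Stand-in for writing to a warehouse.
    return {"loaded": len(records)}

STEPS = [("clean", clean), ("transform", transform), ("load", load)]

def run_pipeline(records, start_at="clean"):
    """Run the pipeline, optionally restarting at a specific step."""
    started = False
    result = records
    for name, step in STEPS:
        if name == start_at:
            started = True
        if started:
            result = step(result)
    return result

print(run_pipeline(["A", "", "B"]))               # {'loaded': 2}
print(run_pipeline(["a", "b"], start_at="load"))  # {'loaded': 2}
```

Here `start_at` is the parameter that lets the program restart at any specific point rather than reprocessing from the beginning.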

Why does DataOps Matter?

Collaborating throughout the Entire Data Lifecycle – Collaboration is a central part
of both DevOps and DataOps. But DataOps involves many more disparate parties than
its software development counterpart, which is why DataOps spans the entire data
lifecycle of the organization.

Establishing Data Transparency while Maintaining Security – DataOps promotes
processing data locally: the analytics team uses compute resources close to the data
instead of moving the data it requires.

Utilizing Version Control for Data Science Projects – DataOps applies version
control to data science, especially when hundreds of data scientists work together or
separately on many different projects. When data scientists work only on their local
machines, data is saved locally, which slows down productivity. Creating a common
repository solves this problem.

Best Practices of DataOps

o Versioning
o Self-service
o Democratize data
o Platform approach
o Go open source
o Team makeup and organization
o A unified platform for all data, historical and real-time production
o Multi-tenancy and resource utilization
o A single security and access model for governance and self-service access

Key Components of a DataOps Platform

There are four key software components of a DataOps platform: data pipeline
orchestration, testing and production quality, deployment automation, and data science
model deployment / sandbox management. Below is a list of example vendors in
each group.

1. Data Pipeline Orchestration: DataOps needs a directed graph-based
workflow that contains all the data access, integration, model, and visualization
steps in the data analytic production process.

 Airflow — an open-source platform to programmatically author, schedule, and
monitor data pipelines.

 DBT (Data Build Tool) — a command-line tool that enables data analysts
and engineers to transform data in their warehouse more effectively.
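The directed graph-based workflow described above can be sketched in plain Python as a toy scheduler that runs tasks in dependency order. This is illustrative only, not the Airflow or DBT API, and the task names are invented:

```python
# Toy directed-graph pipeline: each task names its upstream dependencies,
# and tasks run in dependency order (a simple depth-first topological sort).

def run_dag(tasks, deps):
    """tasks: name -> callable; deps: name -> list of upstream task names."""
    done, order = set(), []
    def visit(name):
        if name in done:
            return
        for upstream in deps.get(name, []):
            visit(upstream)  # run dependencies first
        tasks[name]()
        done.add(name)
        order.append(name)
    for name in tasks:
        visit(name)
    return order

log = []
tasks = {
    "extract":   lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "visualize": lambda: log.append("visualize"),
}
deps = {"transform": ["extract"], "visualize": ["transform"]}
print(run_dag(tasks, deps))  # ['extract', 'transform', 'visualize']
```

Orchestrators like Airflow provide the same guarantee at production scale: a step never runs before the steps it depends on.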

2. Automated Testing and Production Quality and Alerts: DataOps
automatically tests and monitors the production quality of all data and artifacts in
the data analytic production process, as well as testing code changes during the
deployment process.

 ICEDQ — software used to automate the testing of ETL/data warehouse and
data migration projects.

 Naveego — a simple, cloud-based platform that allows you to deliver
accurate dashboards by taking a bottom-up approach to data quality and exception
management.
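The kind of automated check such tools run can be illustrated with a small, hand-rolled example. The rules and field names below are hypothetical, for illustration only:

```python
# Minimal data-quality checks of the kind a DataOps pipeline runs
# automatically before data reaches production dashboards.

def check_quality(rows, required_fields, min_rows=1):
    """Return a list of human-readable failures (empty list = all checks pass)."""
    failures = []
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                failures.append(f"row {i}: missing required field '{field}'")
    return failures

good = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 5.00}]
bad = [{"id": 1, "amount": None}]
print(check_quality(good, ["id", "amount"]))  # []
print(check_quality(bad, ["id", "amount"]))   # ["row 0: missing required field 'amount'"]
```

In a real deployment, a non-empty failure list would block the pipeline and raise an alert instead of letting bad data reach clients.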

3. Deployment Automation and Development Sandbox Creation: DataOps
continuously moves code and configuration from development environments into
production.

 Amaterasu — a deployment tool for data pipelines that allows developers to
write and easily deploy pipelines while clusters manage their own configuration
and dependencies.

 Harbr — a complete solution for your customers, suppliers, partners, and
employees to exchange, monetize, and collaborate on data and models.

4. Data Science Model Deployment: DataOps-driven data science teams create
reproducible development environments and move models into production. Some
have called this “MLOps” or “ModelOps”.

 Domino — accelerates the development and delivery of models with
infrastructure automation, seamless collaboration, and automated reproducibility.

 Datmo — tools that help you seamlessly deploy and manage models in a
scalable, reliable, and cost-optimized way.

Data Security Management

Learning Outcomes
 Define Data Security Management.
 Identify security threats and how to manage them.
 Know the best practices in data protection.
 Understand the use of security tools.
Time Frame
1 hour

Introduction
Data security has become even more complicated with today’s hybrid
environments. Coordinated security management is essential to a range of critical
tasks, including ensuring that each user has exactly the right access to data and
applications, and that no sensitive data is overexposed.

Analysis
1. In your own understanding define Data Security Management.
2. How to protect your data from data threats?
3. What is your way of securing data?

Abstraction
What is Data Security Management?
Data security management involves a variety of techniques, processes, and
practices for keeping business data safe and inaccessible to unauthorized parties. Data
security management systems focus on protecting sensitive data, like personal
information or business-critical intellectual property. For example, data security
management can involve creating information security policies, identifying security
risks, and spotting and assessing security threats to IT systems. Another critical
practice is sharing knowledge about data security best practices with employees
across the organization — for example, exercising caution when opening email
attachments.
Data security threats and how to manage them

There are many different threats to data security, and they are constantly evolving, so
no list is authoritative. But here are the most common threats you need to keep an eye
on and teach your users about:

Malware — Malware is malicious software developed to gain unauthorized access or
cause damage. Once malware infects one computer, it can spread quickly through the
network. Malware comes in a variety of forms, such as viruses, worms, Trojan horses,
spyware, and crimeware. Malware often spreads using its victim’s access rights, so it’s
vital to limit each user’s permissions to only the data and systems they need to do
their job.

DDoS attack — Distributed denial of service attacks attempt to make your servers
unusable. To mitigate the risk, consider investing in an intrusion detection system
(IDS) or intrusion prevention system (IPS) that inspects network traffic and logs
potentially malicious activity.

Phishing scams — This common social engineering technique attempts to trick users
into opening malicious attachments in phishing emails. Solutions include establishing
a cybersecurity-centric culture and using a tool to automatically block spam and
phishing messages so users never see them.
Hackers — This is an umbrella term for the actors behind the attacks listed above.
Third parties — Partners and contractors who lack sufficient network security can
leave interconnected systems open to attacks, or they can directly misuse the
permissions they’ve been granted in your IT environment.

Malicious insiders — Some employees steal data or damage systems deliberately, for
example, to use the information to set up a competing business, sell it on the black
market or take revenge on the employer for a real or perceived problem.
Mistakes — Users and admins can also make innocent but costly mistakes, such as
copying files to their personal devices, accidentally attaching a file with sensitive
data to an email, or sending confidential information to the wrong recipient.
Data protection best practices

To build a layered defense strategy, it’s critical to understand your cybersecurity risks
and how you intend to reduce them. It’s also important to have a way to measure the
business impact of your efforts, so you can ensure you are making appropriate
security investments.

The following operational and technical best practices can help you mitigate data
security risks:

Operational best practices

 Use compliance requirements as cybersecurity basics. Simply put,
compliance regulations are designed to force companies to defend against major
threats and protect sensitive data. Although meeting compliance requirements is not
sufficient for complete data security, it will help you get started on the right path to
risk management and data protection.
 Have a clear cybersecurity policy. Create a policy that clearly explains how
sensitive data is to be handled and the consequences for violating your data protection
policy. Making sure all employees read and understand the policy will reduce the risk
that critical data will be damaged or lost due to human actions.
 Build and test a backup and recovery plan. Companies must prepare for a
range of breach scenarios, from minor data loss to complete data center destruction.
Ensure that critical data is encrypted, backed up and stored offline. Set up roles and
procedures that will speed recovery, and test every part of the plan on a regular
schedule.
 Have a bring-your-own-device (BYOD) policy. Allowing users to access
your network with their personal devices increases the risk of a cybersecurity
incident. Therefore, create processes and rules that balance security concerns against
convenience and productivity. For instance, you can mandate that users keep their
software up to date. Keep in mind that personal devices are harder to track than
corporate devices.
 Provide regular security training. Help your employees identify and avoid
ransomware attacks, phishing scams and other threats to your data and IT resources.
 Make cybersecurity talent retention a priority. Cybersecurity pros are a
scarce commodity today, so take steps to keep the talent you have. Invest in
automated tools that eliminate mundane daily tasks, so they can focus on
implementing strong data security techniques to combat evolving cyber threats.

Technical best practices

Classify data based on its value and sensitivity. Get a comprehensive inventory of all
the data you have, both on premises and in the cloud, and classify it. Like most data
security methods, data classification is best when it’s automated. Instead of relying on
busy employees and error-prone manual processes, look for a solution that will accurately
and reliably classify sensitive data like credit card numbers or medical records.
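Automated classification can be sketched as a simple pattern scan. The patterns below are deliberately simplified illustrations; real classification tools use far more robust detection (checksums, context, machine learning):

```python
import re

# Simplified sensitive-data classifier: tags text that contains
# credit-card-like or SSN-like patterns.

PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),  # 16 digits, optional separators
    "ssn":         re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN format
}

def classify(text):
    """Return the set of sensitive-data labels found in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}

print(classify("Card: 4111 1111 1111 1111"))  # {'credit_card'}
print(classify("Call me tomorrow"))           # set()
```

A real solution would run checks like these across file shares, databases, and cloud storage, then tag each item so that access controls and monitoring can be applied according to sensitivity.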

Conduct regular entitlement reviews. Access to data and systems should be based on
the least-privilege principle. Since user roles, business needs, and the IT environment
are constantly changing, work with data owners to review permissions on a regular
schedule.
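Conceptually, an entitlement review diffs the permissions a user holds against what their role actually needs. This is a toy sketch with invented roles and permission names:

```python
# Toy entitlement review: flag permissions a user holds beyond what
# their role requires (the least-privilege principle).

ROLE_NEEDS = {
    "analyst": {"read_reports"},
    "admin":   {"read_reports", "manage_users"},
}

def excess_permissions(user_role, granted):
    """Return permissions granted beyond what the role needs."""
    return granted - ROLE_NEEDS.get(user_role, set())

print(excess_permissions("analyst", {"read_reports", "manage_users"}))  # {'manage_users'}
print(excess_permissions("admin", {"read_reports"}))                    # set()
```

Any non-empty result is a finding for the data owner to review: either the grant should be revoked or the role definition updated.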

Run vulnerability assessments. Proactively look for security gaps and take steps to
reduce your exposure to attacks.

Enforce a strong password policy. Require users to change their credentials quarterly
and use multifactor authentication. Since administrative credentials are more powerful,
require them to be changed at least monthly. In addition, do not use shared admin
passwords, since that makes it impossible to hold individuals accountable for their
actions.

Basic data security tools

The following data security tools are necessary for data security management:
 Firewalls — Firewalls prevent undesirable traffic from entering the network.
Depending on the organization’s firewall policy, the firewall might completely
disallow some or all traffic, or it might perform verification on some or all of the
traffic.
 Backup and recovery — As noted earlier, you need reliable backup and
recovery in case data is altered or deleted accidentally or deliberately.
 Antivirus software — This provides a critical first line of defense by
detecting and blocking trojans, rootkits and viruses that can steal, modify or damage
your sensitive data.
 IT auditing — Auditing all changes in your systems and all attempts to access
critical data enables you to proactively spot issues, promptly investigate incidents,
and ensure individual accountability.

References
The DataOps Enterprise Software Industry, 2020. (2019, February 28). Retrieved from DataKitchen:
https://medium.com/data-ops/the-dataops-enterprise-software-industry-2019-a862904857ef
What is Data Operation (DataOps) ? Principles | Benefits | Adoption | Tools. (2018, November 17).
Retrieved from XENONSTACK: https://www.xenonstack.com/insights/data-operations/

Brooks, R. (2020, February 13). Data Security Management: Where to Start. Retrieved from
NETWRIX: https://blog.netwrix.com/2020/02/13/data-security-management-where-to-start/
