AI for Managers – S.

Prof. Soumen Manna


Data Collection:
• Before you can start to curate the data for your machine learning system, you must first identify
and acquire it. Data can come from your own organization, be licensed from a third-party data
collection agency or consumer service, or be created from scratch. In fact, it is not uncommon
for an AI system to depend on data from all of these sources.
Data Governance

• Data being the cornerstone of artificial intelligence, it is important to
understand the ethical and legal ramifications of obtaining and using
any data.

• Governance is a term that has been applied to a number of areas of
technology. Governance is about ensuring that processes follow the
highest standards of ethics while following legal provisions in spirit, as
well as to the letter of the law.
Data Governance
Data Collection Policies:
• Users should be made aware of what data is being collected and for how long it will be
stored.

• Data collection should be “opt-in” rather than “opt-out,” or in other words, data
collection should not be turned on by default and no data should be collected without
specific user approval.

• If data will be sent to third parties for processing/storage, the user should also be
informed of this upfront.

• A smart data collection policy goes a long way in establishing goodwill and customer
satisfaction.
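
The opt-in principle above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `ConsentRecord` class, its field names, and the 90-day retention period are all hypothetical and chosen only for the example. The point it shows is that every collection flag defaults to off and only flips after an explicit user action.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical per-user consent settings; every flag defaults to off."""
    user_id: str
    collect_usage_data: bool = False        # opt-in: nothing is collected by default
    share_with_third_parties: bool = False  # third-party sharing needs its own approval
    retention_days: int = 90                # assumed retention period, shown to the user
    granted_at: Optional[datetime] = None   # when the user approved collection

    def grant(self, share_with_third_parties: bool = False) -> None:
        """Record an explicit approval given by the user."""
        self.collect_usage_data = True
        self.share_with_third_parties = share_with_third_parties
        self.granted_at = datetime.now(timezone.utc)


def may_collect(record: ConsentRecord) -> bool:
    """Data may be collected only after an explicit, recorded approval."""
    return record.collect_usage_data and record.granted_at is not None
```

A new `ConsentRecord("user-1")` fails `may_collect` until `grant()` is called, and third-party sharing stays off unless the user approves it separately.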
Data Governance
Encryption
• Encrypting sensitive information such as credit card information has
become a standard industry practice.

• Encryption alone could be the difference between a bankrupting
event and simply a public relations issue.

• It is obviously necessary to protect the keys and the passwords used
to encrypt the data as well. If the keys are exposed, they should be
revoked and passwords should be changed for good measure.
Data Governance
Access Control Systems
• All data should be classified based on an assessment of factors such as its
importance to the user and the company and whether it contains personal data
of users or company secrets. For example, the data could be classified as “public,”
“internal,” “restricted,” or “top secret.”

• Based on the classification assigned to the data, appropriate security measures
should be established and followed.

• Access to data must be controlled, and only approved users should be granted
access.
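
The classification scheme above maps naturally onto an ordered set of levels. The sketch below is one possible way to express it, assuming a simple model in which each user holds a single clearance level; the `CLEARANCES` table and the user names are hypothetical.

```python
from enum import IntEnum


class Classification(IntEnum):
    """The four example levels from the slide, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2
    TOP_SECRET = 3


# Hypothetical clearance assignments; a real system would load these
# from an identity or directory service.
CLEARANCES = {
    "alice": Classification.TOP_SECRET,
    "bob": Classification.INTERNAL,
}


def can_access(user: str, label: Classification) -> bool:
    """Allow access only if the user's clearance is at least the data's label.

    Users without an assigned clearance can read PUBLIC data only.
    """
    clearance = CLEARANCES.get(user, Classification.PUBLIC)
    return clearance >= label
```

With these assumed assignments, `can_access("bob", Classification.RESTRICTED)` is `False`, while `can_access("alice", Classification.RESTRICTED)` is `True`.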
Data Governance
Anonymizing the Data
• If the data needs to be sent to third parties or even to other, less secure internal
groups, all potentially identifying information, such as names, addresses,
telephone numbers, and IP addresses, should be scrubbed from the data.

• If a unique number is allotted to an individual, it should be randomized and reset
as well.

• No data should be shared with third parties without sufficient consent being
obtained from the users who are featured in the data being shared.
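
The scrubbing and ID-randomizing steps above can be sketched as follows. This is an illustration under stated assumptions, not a complete anonymization method: the field names in `IDENTIFYING_FIELDS` and the `customer_id` column are hypothetical, and removing direct identifiers does not by itself rule out re-identification from the remaining columns.

```python
import secrets

# Hypothetical names of columns that directly identify a person.
IDENTIFYING_FIELDS = {"name", "address", "phone", "ip_address"}


def anonymize(records, id_field="customer_id"):
    """Drop identifying fields and replace each unique ID with a random token.

    The same original ID maps to the same token within one export, so rows
    for one person can still be grouped. The mapping is never returned, so
    the recipient cannot reverse it.
    """
    id_map = {}
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        original_id = row.get(id_field)
        if original_id is not None:
            if original_id not in id_map:
                id_map[original_id] = secrets.token_hex(8)  # 16 random hex characters
            row[id_field] = id_map[original_id]
        cleaned.append(row)
    return cleaned
```

Because the tokens are regenerated on every call, two separate exports cannot be joined on the ID column, which is the "reset" behavior the slide asks for.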
Creating a Data Governance Board
• In order to develop the initial data governance policies, a data
governance board can be constituted.

• The board will develop the organization's data governance policies by
looking at best practices across the globe, such as the General Data Protection
Regulation (GDPR), Health Insurance Portability and Accountability
Act (HIPAA) provisions, and so forth.

• The board should be formed with people who can drive these big
decisions. The necessity may arise for the board to push through
difficult decisions that are at odds with the aims of the organization in
order to protect the rights of the people whose data is at risk.
Initiating Data Governance
• It is always easier to start with an existing set of data governance
rules and then adapt the rules to fit your organization.

• Your data governance board will aid in making key decisions for which
policies may not yet have been established, setting precedents, and
then instating newer policies as the organization evolves and grows to
handle more data.

• Such a process will help to ensure that the costs of governing the data
do not exceed the benefits derived from it.
HIPAA
• The Health Insurance Portability and Accountability Act (HIPAA) dictates the
procedures to be followed and the safeguards to be adopted regarding medical
data. If you are dealing with medical data, it is critical to be compliant with these
laws and regulations from the get-go.

• The Health Information Technology for Economic and Clinical Health Act
(HITECH Act), enacted in 2009 with final implementing rules issued in 2013,
extends these protections. The HITECH Act makes it mandatory to
report breaches that affect 500 or more people to the U.S. Department of Health
and Human Services, the media, and the persons affected. Only authorized
entities are allowed to access patients' medical data.

• With this in mind, an organization should be careful that the data being sourced is
not violating any provisions of HIPAA or HITECH and that it is ethically and legally
sourced.
GDPR
• In 2018, the European Union began enforcing a new set of privacy rules called the
General Data Protection Regulation (GDPR). These rules treat the user as
the data owner, irrespective of where the data is stored. Under GDPR, data
collection must be explicit, and any implicit consent—such as “fine print” stating
that signing up for an account implies that your data can automatically be
collected—is in contravention of GDPR.

• GDPR also mandates that requests for deletion of user data should be as simple
as the form for consent. GDPR mandates that users be made aware of their rights
under the policy, as well as how their data is processed, what data is being
collected, and how long it will be retained, among other things. GDPR is a step in
the right direction for user privacy, aimed at protecting users from data
harvesters and unethical data collection.
GDPR
• The responsibility and accountability have been put squarely on the shoulders of
the data collectors under GDPR. The data controller is responsible for the
nondisclosure of data to unauthorized third parties. The data controller is
required to report any breaches of privacy to the supervisory authority; however,
notifying users is not a mandatory requirement if the data was disclosed in an
encrypted format.

• Although these regulations are only legally applicable to users in the European
Union, adopting GDPR policies for users across the globe will put your
organization at the forefront for compliance and data governance practices.
Data-Responsible AI-Oriented Organization

• Securing your data should be considered a critical mission rather than
an afterthought.

• For an AI-oriented organization, data is the cornerstone of all research
activities. Good data will lead to better decisions.

• Data governance might seem like a daunting task, but with the help of
a solid plan, it can be managed just like everything else.
Pitfall 1: Insufficient Data Licensing

• When it comes to data, having sufficient licensing is critical.

• Using unlicensed data for your use case is the quickest way to derail a
system just as it is about to launch.

• In the worst case, you find the issue when the data owners bring legal
action against your organization.

• To prevent this, it is imperative to have a final audit (or even better,
periodic audits) to review all the data being used to build the system.

• This audit should also include validation of third-party code packages,
because this is another area where licensing tends to be ignored for the
sake of exploration.
Pitfall 2: Not Having Representative Ground Truth
• Every machine learning system has to train using data.

• A selected dataset will serve as the system's ground truth.

• It is important that your ground truth contains the necessary
knowledge to answer all related questions.

• However, no universal technique guarantees that the training data
contains all the information in the population.
Pitfall 3: Insufficient Data Security

• For information to be useful, it has to satisfy three major conditions:
confidentiality, integrity, and availability.

• In order to ensure legal, ethical, and cost-effective compliance,
security should not be an afterthought, especially for your data
storage systems.

• Data stores should be carefully designed from the start of the project.
Data leakage can lead to major trust issues among your customers
and can prove to be very costly.
Pitfall 3: Insufficient Data Security
• Customer data should be stored only in an encrypted format. This will ensure that even if
the entire database is leaked, the data will be meaningless to the hackers.


• It should be confirmed that the selected encryption method has sufficient key
strength and is an industry standard, such as RSA (Rivest–Shamir–Adleman) or
the Advanced Encryption Standard (AES).

• The key size should be sufficiently long to avoid brute-force attempts.

• The keys should not be stored in the same location as the data store. Otherwise, you
could have the most advanced encryption in the world and it would still be useless.
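
To make the point about key size concrete, the arithmetic below estimates how long an exhaustive search of a symmetric key space would take. It is a rough sketch: the rate of 10^12 guesses per second is an assumed figure for illustration, not a measured one, and the reasoning applies to symmetric ciphers such as AES. RSA keys are attacked by factoring rather than by trying every key, so RSA key lengths are not directly comparable.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_exhaust(key_bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key of the given length at a fixed guess rate."""
    total_keys = 2 ** key_bits
    return total_keys / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    ASSUMED_RATE = 1e12  # hypothetical attacker speed, in guesses per second
    for bits in (56, 128, 256):
        print(f"{bits}-bit key: {years_to_exhaust(bits, ASSUMED_RATE):.3g} years")
```

At the assumed rate, a 56-bit key space is exhausted in well under a year (about 20 hours), while a 128-bit key space takes on the order of 10^19 years. Each added bit doubles the search time.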
Pitfall 3: Insufficient Data Security

• Employees also need to be trained in security best practices.

• It is important to include not only employees but also any contract
resources that you are using to ensure that they are trained in the
best security practices.

• Training and hardening your organization's managers, engineers, and
other resources, just like your software, is the best way to avoid
security compromises.
Pitfall 3: Insufficient Data Security
• Computer security is a race between hackers and security researchers.

• Auditing your infrastructure and servers by professional penetration testers will
go a long way in achieving your organization's security goals.

• These specialists think like hackers and use the same tools that are used by
hackers to try to break into your system and give you precise recommendations
to improve your security.

• Although getting security right on the first attempt might not be possible, it is
nonetheless necessary to take the first steps and consider security from the
beginning of the design phase.
Pitfall 4: Ignoring User Privacy
• Dark designs are design choices that trick the user into giving away their privacy.
These designs work in such a way that a user might have given consent for their
data to be analyzed/stored without the user understanding what they have
consented to.

• Dark design should be avoided on an ethical and, depending on your jurisdiction,
legal basis. As the world progresses into an AI era, more data than ever is being
collected and stored, and it is in the interest of everyone involved that the users
understand the purposes for which their consent is being recorded.

• A quick way to judge whether your design choices are ethical is to check whether
answering “no” to data collection and analysis imposes a penalty on the user
beyond the results of analysis.
Pitfall 4: Ignoring User Privacy

• If third-party vendors are used for data analysis, it becomes imperative to ensure
that anonymization of the data has taken place. This is to lessen the likelihood
that the third party will misuse the data.

• With third-party vendors, it becomes necessary to take further measures like
row-level security, tokenization, and similar strategies. Conducting software
checks to ensure that the terms of a contract are upheld is very important if third
parties are going to be allowed to collect data on your behalf.

• Cambridge Analytica was able to abuse Facebook's terms of service because Facebook
merely relied on the good nature and assumed integrity of Cambridge Analytica's practices.
This could have been avoided had Facebook put proper security measures in place beforehand.
Pitfall 5: Backups

• Although most people today understand the importance of backups,
what they often fail to do is implement correct backup procedures.

• At a minimum, a good backup plan should involve the following steps:
backing up the data (raw data, analyzed data, etc.), storing the backup
safely, and routinely testing backup restorations.

• This last step is frequently missed and leads to problems when the
system actually breaks. Untested backups may fail to recover lost data, or
may produce errors and take a long time to restore, costing the
organization time and money to fix the problems.
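
The three backup steps above (back up, store, test the restoration) can be sketched for a single file as follows. This is a minimal illustration using only the Python standard library: the function names and the `.bak` naming are hypothetical, and a real backup plan would also cover encryption, off-site copies, and scheduling.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy a file to the backup location, then test-restore it.

    Returns True only if the restored copy matches the original checksum.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / (source.name + ".bak")
    shutil.copy2(source, backup_path)              # step 1 and 2: back up and store

    with tempfile.TemporaryDirectory() as scratch:  # step 3: restore into a scratch area
        restored = Path(scratch) / source.name
        shutil.copy2(backup_path, restored)
        return sha256_of(restored) == sha256_of(source)
```

Running the restore check on a schedule, rather than only when the backup is created, is what catches backups that have silently gone bad.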
Pitfall 5: Backups
• With cloud storage becoming so commonplace, it is essential to remember that the
cloud is “just another person's computer” and it can go down, too.

• Although cloud solutions are typically more stable than a homegrown solution because
they are able to rely on the economies of scale and the intelligence of industry experts,
they can still have issues.

• Relying only on cloud backups may make your life easier in the short term, but it is a bad
long-term strategy. Cloud providers could turn off their systems. They could have
downtime when you need to do that critical data recovery procedure.

• It is therefore necessary to implement off-site and on-site physical storage media
backups. These physical backups should also be regularly tested and the hardware
regularly upgraded to ensure that everything will work smoothly in the case of a disaster.
Pitfall 5: Backups

• All data backups should be encrypted as well.

• This is especially important to prevent a rogue employee from directly
copying the physical media or grabbing it to take home.

• With encrypted backups, you will have peace of mind and your
customers will sleep soundly, knowing their data is safe.
Questions?
