
Artificial Intelligence and Data Analytics (AIDA) Guidebook

corresponding computation on unencrypted data. Differential privacy is a technique that adds
carefully calibrated statistical noise to a dataset or to query results so that the identity of
individuals cannot be determined, even when the data is combined with other datasets.13
Federated computation, such as federated learning14 or shared analytics15, allows machine
learning or data analytics, respectively, to be performed remotely over decentralized data
without transmitting that data. Google developed federated learning to update predictive-typing
models for Android keyboards, learning from millions of users by tuning the models on the
users’ devices without exposing the users’ data.16 Others have experimented with federated
learning across multiple organizations, such as healthcare systems. Synthetic data is an
approach that takes sensitive or private data and produces mock data with statistical
characteristics similar to the original data.17 Synthetic data lets algorithm developers test and
improve the performance of their algorithms before running them on the actual private data
inside a harder-to-use secure enclave.
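The aggregation step at the heart of federated learning can be illustrated with a small sketch. The following is a minimal, hypothetical example of FedAvg-style aggregation (not Google's actual implementation): each device tunes the model locally and reports only its resulting weights, which a server averages, weighting each device by its number of local samples.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate locally trained model weights, FedAvg-style.

    Each client's contribution is weighted by its local sample count;
    only the weight vectors -- never the raw data -- leave the device.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three hypothetical devices report locally tuned weights for a
# two-parameter model, along with how many samples each device holds.
updates = [[0.2, 1.0], [0.4, 1.2], [0.6, 1.4]]
sizes = [100, 100, 200]
global_weights = federated_average(updates, sizes)
print([round(w, 2) for w in global_weights])  # -> [0.45, 1.25]
```

Only the weight vectors cross the network; the keystrokes or records used to tune each local model stay on the device, which is what makes the approach attractive for settings such as healthcare.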

Data Transparency
Today, firms across the world are incorporating AI-based and data-analytics systems to
automate their business processes and free their workforce to focus on customer and
operational needs. Data transparency is paramount: it makes the data, analysis, methods, and
interpretive choices underlying a researcher’s claims visible in a way that allows others to
evaluate them. However, the incorporation of AI technology is impeded by several challenges,
the biggest of which is the lack of data transparency. You often cannot control or correct a
system if you do not know what goes into it. AI systems are often considered a black box:
regulators can see what goes into the AI and what comes out, but not how the algorithms and
technology actually work, which makes it challenging to pinpoint logical errors in the
underlying algorithms. Tech companies are hesitant to share their algorithms because doing so
can expose their intellectual property to infringement and theft. These challenges in data
transparency are often at odds with efforts to enforce data privacy. Protected data is sometimes
not transparent data, and in these cases cybersecurity can introduce challenges in efforts to
expand, control, and monitor AI as it is applied. Greater security can thus present barriers to
more visible, transparent AI, making it imperative to develop explainable AI. It is easy to see
how this becomes a problem when the technology is applied in government or in the public
domain.

13 https://www.vanderbilt.edu/jetlaw/
14 https://arxiv.org/abs/1511.03575
15 https://sagroups.ieee.org/2795/
16 See https://ai.googleblog.com/2017/04/federated-learning-collaborative.html for how Google uses federated learning on Android devices.
17 https://ieeexplore.ieee.org/document/7796926
