
Federated Learning for AI Analytics

IoT has changed AI for good
Edge, cloud computing, and central data centers are all essential to compete in
today’s high-tech world and the Internet of Things (IoT). Living on the edge or in
the cloud might be precarious at first, but the advantages outweigh the risks by
assisting in everything from traffic enforcement to medical research. None of this
can be efficiently accomplished without robust artificial intelligence (AI) and an
intuitive learning model.

Centralized learning models are risky. Traditional AI approaches present
performance and security issues for many businesses. Data stored on a central
server can lead to user privacy violations and the unauthorized release of data. This
is a huge liability when corporate governance must comply with regulations such
as Sarbanes-Oxley or HIPAA and GDPR. It can also be catastrophic. If an adversary
compromises an algorithm in a medical or military AI ecosystem, the results could
lead to a loss of life. Lost data can also mean lost revenue to companies.

The remote, hybridized nature of today’s high-tech world mandates secure, smart,
efficient, and accurate predictive analytics through AI. Machine learning (ML),
inherent to AI, is an essential part of the process. ML processes high volumes of
data with automation and predictive analysis that no human could accomplish. It
must build vigorous statistical models accurately from massive data sets, turning
big data into smart data to improve outcomes by identifying problems and
patterns. When done correctly, machine learning allows IoT to work flawlessly
both individually and collectively.

AI/ML helps with smart applications like:
• Face recognition
• Speech recognition
• Handwriting transcription
• Medical diagnosis
• Autonomous driving
• Digital assistants
• Banking
• Stock trades
• Gaming

AI Analytics IoT has changed AI for good 2


Centralized learning is no longer sustainable
Centralized learning has been the traditional norm in AI modeling. With
the distributed nature of technology and devices today, it is quickly
losing favor—because it is inefficient and not secure. It cannot be scaled
effectively nor can it safely process the massive volumes of data from
different sources that federated learning can.

To be trusted and effective, edge, cloud, and data center computing must
have the AI modeling and analytics that guarantee data privacy, security,
and accuracy. All algorithms used to process data must have embedded
security protocols to keep data safe and data owners from being
compromised. This is where distributed federated learning excels.

Federated learning was first introduced by Google researchers in 2016 [1] as the
future for AI analytics. In 2019, Google released TensorFlow Federated
(TFF) [2], an open-source, user-friendly implementation of federated learning.
Today’s business is hybrid, with remote and on-premises computing
demanded by users in many business sectors, which means AI must be
collaborative. Federated learning makes edge computing a viable way
for businesses to stay ahead of the curve and the competition. It does so
securely without exposing user data.

A report by the Berryville Institute of Machine Learning [3] identifies 70 risks
associated with machine learning systems. Data sets that train a machine
learning system account for 60 percent of these risks, while source code
accounts for the remaining 40 percent. Federated learning offers access to
far more data points while preserving data integrity and privacy. It brings
machine learning models down to the user-data level.



The differences between centralized and federated learning
Federated learning is a decentralized distributed learning system that relies
on remote execution. Instead of being centralized on a server, it distributes
copies of a machine learning algorithm to various sites or devices (nodes)
where the data is stored. Conversely, the centralized paradigm delivers
machine learning solutions on cloud-based APIs with software deployed
on remote servers of AI providers. Centralized learning identifies a
problem, prepares the data, trains the machine learning algorithm on a
centralized server, and then sends the trained model to the client system,
exposing the API. Federated learning has all training iterations performed
on local devices. It is device centric, so it does not compromise or expose
original data. It returns the computation or analytics to the central server,
which then updates the main algorithm of the learning model.
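
The training loop described above can be sketched roughly as follows. This is a minimal illustration only, assuming a one-feature linear model, three simulated nodes, and weighted averaging of the returned models (one common way for a central server to combine local results). The function names and the synthetic data are hypothetical and are not taken from any specific federated learning library.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Train a copy of the global model on one node's private data.

    Only the updated (w, b) pair leaves the node; the raw records never do.
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(updates, sizes):
    """Central server step: average node models, weighted by local data size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_node_data(n, seed):
    """Hypothetical private data set for one node, following y = 2x + 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]


def train(rounds=50):
    # Three simulated nodes (for example, three hospitals) of different sizes.
    nodes = [make_node_data(n, seed) for n, seed in [(20, 1), (40, 2), (30, 3)]]
    global_model = (0.0, 0.0)
    for _ in range(rounds):
        # 1. The server distributes the current global model to every node.
        # 2. Each node trains locally on data that stays where it is.
        updates = [local_update(global_model, data) for data in nodes]
        # 3. The server combines the returned models into a new global model.
        global_model = federated_average(updates, [len(d) for d in nodes])
    return global_model


if __name__ == "__main__":
    print(train())
```

In this sketch the server only ever sees model parameters. A production system would typically add protections such as encryption or secure aggregation on top of that exchange.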

Robust encryption inherent in federated learning will be especially
important in the coming years, with more distributed cloud applications.
Gartner predicts in its Top Strategic Technology Trends for 2021 [4] that
half of large organizations will implement privacy-enhancing technologies
(PET) for processing data in untrusted environments by 2025. PET rollouts
will be prioritized in areas where there is data monetization, fraud
analytics, and transfers of highly sensitive data.



Federated learning architecture

[Figure: Hospital A, Hospital B, and Hospital C each keep private and secure data
alongside a local AI model. KEY: 1 = local model sharing; 2 = global model sharing updates.]


Federated workflow
Instead of data moving to a central place,
machine learning models move to the data for training,
then recombine to create a global model.



Federated learning offers:

Retention of sovereignty
Data remains with the owner without impacting the training of algorithms on the data.

Data that can be leveraged without being shared
When data privacy must be preserved, the federated model effectively utilizes
available data anonymously. Model training can be distributed among data owners
and results aggregated without compromising privacy.

Flexible topology
Model sharing or aggregation among the nodes can be done later or combined.

Federated learning that is secured through PET
PETs such as homomorphic encryption (HE) help ensure security. This is especially
important in the medical and financial sectors that must comply with regulatory
requirements. HE maintains the integrity of encrypted operations like searches
and analytics without exposing the operation itself or the resulting data. It never
reveals personal data to servers.

Collaboration
Data sets from many sources across a wide geography can be shared. Complex
patterns can be accurately identified across large and diverse groups of data sets
with better results.

A study indexed in NCBI’s PubMed [5] found federated learning used by 10 medical
institutions resulted in models achieving 99 percent of the model quality achieved
by centralized data. This was attributed to federated learning’s ability to evaluate
generalizability on data from institutions outside the federation.
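
To make the idea of computing on encrypted data more concrete, here is a toy sketch of additively homomorphic encryption in the style of the Paillier cryptosystem. It is an illustration under loud assumptions: the primes are tiny and hard-coded, so the scheme is not secure; model updates are assumed to be small non-negative integers; and the function names are hypothetical. A real deployment would use a vetted cryptographic library and would keep the private key away from the aggregating server.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier-style key pair. Tiny fixed primes: for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                 # common simple choice of generator
    mu = pow(lam, -1, n)      # modular inverse (requires Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(public_key, m, rng=random):
    """Encrypt an integer 0 <= m < n. `random` is not a secure RNG."""
    n, g = public_key
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(public_key, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


def decrypt(public_key, private_key, c):
    n, _ = public_key
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


# Three data owners encrypt their (integer-quantized) contributions.
public_key, private_key = keygen()
updates = [12, 30, 7]
ciphertexts = [encrypt(public_key, u) for u in updates]

# The aggregator combines ciphertexts without seeing any individual value.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(public_key, encrypted_total, c)

# Only the key holder can recover the aggregate.
total = decrypt(public_key, private_key, encrypted_total)
```

Here the aggregator works only with ciphertexts, and decryption reveals the sum of the contributions rather than any single owner's value.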



The next wave of AI: Performance and security
Although AI is widely embraced, it is often not widely understood. A 2019
McKinsey report, Confronting the Risks of Artificial Intelligence [6], says few
business and government leaders have honed their knowledge on the full
scope of risks. Many businesses do not fully understand how data is fed
into AI systems to operate algorithmic models or how humans interact
with machines. That can be costly.

A report by Intel on data security [7] says US companies pay an average of
USD 8.64 million per data breach, so it is essential to maintain the integrity
of data to stay in compliance with regulatory and legal standards.

“US companies pay an average of USD 8.64 million per data breach, including
the cost of higher customer turnover and lost business due to downtime.”



The case for federated learning
The distributed, decentralized approach of federated learning can benefit a
variety of different business sectors and applications. Here are two growth
areas where this distributed approach can assist:

Financial services – According to cybersecurityguide.org [8], the global
financial services market totaled USD 22 trillion in 2019. This makes it
a very lucrative area for cybercriminals. Much of the market’s growth was seen
in noncash payments through internet and mobile devices, with users
demanding immediate payment schemas for real-time payments. As
mobile device access has increased, so has the industry’s attack surface,
with new vulnerabilities. Securing financial data is mission critical and
nonnegotiable. Attackers are very sophisticated, so it is
imperative that the industry can build robust security models from billions of
transaction patterns across multiple institutions worldwide.

Centralized AI analytics platforms are vulnerable to exposing customer
information through data breaches, privacy failures, and money laundering.
As more mobile devices are used, more vulnerabilities are exposed.
According to the IBM Security Cost of a Data Breach report [9], the average
cost per breach within financial services was USD 5.86 million in 2019.
Between 2009 and 2019, American Express and SunTrust Bank were
breached five times, while Capital One and Discover were hit four times.
Those incursions resulted in pressure from both customers and regulatory
agencies to heighten cybersecurity preparedness, protection of customer
data, and predictive analytics.



Healthcare – A Frost & Sullivan report predicts the AI market
in healthcare will increase 40%, to USD 6.6 billion in 2021. AI
applications in healthcare may save up to USD 150 billion annually
by 2026, with the potential ability to reduce the cost of medical
treatment by 50 percent [10]. Lower costs are not the only benefit.
According to a study indexed by the National Institutes of Health [11], machine
learning models trained with federated learning can improve mortality prediction
for patients hospitalized with COVID-19 without centrally aggregating clinical
data across multiple facilities. Federated learning showed promise
in COVID-19 electronic health records (EHRs) in developing robust
predictive models without compromising privacy.

Research, clinical trials, and diagnostics are critical to healthcare.
Collaboration between research facilities is essential to identify
new treatment modalities. In clinical settings, patient data and
archives comprise thousands of records and images. Due to privacy
requirements, data sources are siloed and cannot be used without
patient permission. This restricts data access, limiting medical
professionals in diagnosis, treatment, and patient outcomes. Digital
medical data, which reached 44 zettabytes in 2020, is expected to
double every year thereafter. This requires efficient, scalable, secure AI.
Federated learning helps circumvent this problem by keeping patient
data confidential, and it assists in collaborative learning for medical
studies and predictive analysis at remote locations or facilities.

AI medical test models currently used include:
• Emotional intelligence indicators to detect subtle cues in a person’s mood and feelings
• Help in tuberculosis detection
• Treatment of PTSD
• AI chatbots
• Virtual assistants for patients and clinicians
• Insurance verification
• Smart robots explaining lab reports
• Clinical documentation



Federated learning architecture is more secure with Intel® SGX
SGX stands for Software Guard Extensions. These are security
instruction codes that run on Intel® CPUs. Intel® SGX is a hardware-
based trusted execution environment (TEE) that helps protect
against code and data snooping and code and data modifications by
malware on the system. It minimizes the trusted computing base to
reduce the surface from where an attack can be launched.

Advantages and results include:
• Protection against software attacks
• Prevention of attacks against memory content
• Option for hardware-based attestation

Utilizing a federated learning architecture secured by technologies
such as Intel SGX preserves owner data and enables training of
algorithms on data with a flexible topology. Online availability is
continuous since training is done offline with results returned later.
Federated learning applications are becoming the most widely used
and accepted privacy preservation technique in industry and medical
AI applications [12].

SGX is not required for federated learning. It makes
it easier and more secure. Intel SGX adds another
layer of defense by reducing the attack surface.

• This protects code and data from attack
by malicious software.
• It protects against privilege escalation while data
is being processed so developers can create
trusted execution environments.
• The risk of side-channel attacks by hackers
is also reduced. SGX helps isolate code and
data from outside incursions.
• Intel’s founding role in the Confidential
Computing Consortium helps to quickly
identify and mitigate areas where attacks can
occur. Security updates for vulnerabilities are
issued regularly.



Advantages of Intel® SGX

01 Protection against software attacks
Incorporation of Intel® SGX [13] helps protect against software
attacks even if OS/driver/BIOS/VMM/SMM are compromised.
This increases protections for secrets even when an attacker
has full control of the platform.

02 Helps prevent attacks against memory content
Intel SGX helps prevent attacks, including memory bus
snooping, memory tampering, and cold boot attacks against
memory contents in RAM. This can reduce the risk that data
in memory is tampered with or stolen.

03 Option for hardware-based attestation
Intel SGX offers an opportunity for hardware-based
attestation capabilities to measure and verify valid code
and data signatures. These mechanisms increase the
confidence level across the participants in the AML/CFT
system about the integrity of the model and data.
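
As a rough conceptual analogy for the "measure and verify" idea behind attestation, the sketch below hashes code and data into a measurement, attaches an authentication tag, and lets a verifier compare the result against a known-good value. This is not the Intel SGX API or its actual attestation protocol: real hardware attestation relies on keys rooted in the processor and on signed quotes, whereas this example assumes a shared secret key and uses only Python's standard `hashlib` and `hmac` modules. All names are hypothetical.

```python
import hashlib
import hmac


def measure(code: bytes, data: bytes) -> str:
    """Hash the code and data into a single measurement (hex string)."""
    h = hashlib.sha256()
    h.update(len(code).to_bytes(8, "big"))  # length prefix avoids ambiguity
    h.update(code)
    h.update(data)
    return h.hexdigest()


def make_report(code: bytes, data: bytes, key: bytes) -> dict:
    """Stand-in for a signed attestation report (assumption: shared key)."""
    measurement = measure(code, data)
    tag = hmac.new(key, measurement.encode(), hashlib.sha256).hexdigest()
    return {"measurement": measurement, "tag": tag}


def verify_report(report: dict, expected_measurement: str, key: bytes) -> bool:
    """Accept only an authentic report whose measurement matches expectations."""
    expected_tag = hmac.new(
        key, report["measurement"].encode(), hashlib.sha256
    ).hexdigest()
    tag_ok = hmac.compare_digest(report["tag"], expected_tag)
    measurement_ok = hmac.compare_digest(
        report["measurement"], expected_measurement
    )
    return tag_ok and measurement_ok


# Example: a participant checks that a peer runs the agreed model code.
key = b"demo-key"                      # hypothetical shared secret
code = b"def score(x): return x"       # hypothetical model code
expected = measure(code, b"weights-v1")
report = make_report(code, b"weights-v1", key)
is_trusted = verify_report(report, expected, key)
```

A participant that receives a report with an unexpected measurement, or with a tag that does not verify, would decline to share model updates with that peer.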

Learn more about the Consilient/Intel project.



Additional AI security tips
To stay compliant and competitive, every organization should employ these essential data security strategies:

Data encryption ‒ Use algorithms to encode data in an unreadable format that
requires an authorized key for decryption. Remember that cryptographic
processing can be vulnerable to side-channel attacks. Use the latest technologies
to speed encryption and boost security without impacting performance.

User authentication and authorization ‒ The most-secure user authentication
includes biometrics, built-in two-factor authentication, and secure enclave
technology built into the processor itself.

Hardware-based security ‒ Protect data at every layer of the IT infrastructure,
not just the software. Intel® hardware-enabled security capabilities include
protections built right into the silicon, creating trusted infrastructure that
helps to secure hardware, firmware, operating systems, applications, networks,
and the cloud.

Data backup – Create an exact copy of data and store it in a secure location
where it can only be accessed by authorized administrators. Protect the backup
and maintain a documented backup policy.



Intel medical use case
A recent U.S. Food and Drug Administration artificial intelligence and
machine learning discussion paper [14] reports AI and machine learning–
based technologies “have the potential to transform healthcare by
deriving new and important insights from the vast amount of data
generated during the delivery of healthcare.” Officials also said AI offers
the benefits of earlier disease detection, more-accurate diagnosis,
identification of new observations or patterns on human physiology,
and development of personalized diagnostics and therapeutics. FDA
officials said one of the greatest benefits of AI machine learning is its
ability to learn from real-world use and experience and improve its
performance: “This real-world feedback and performance adaptation
makes these technologies uniquely situated among software as a
medical device (SaMD). Our vision is that with appropriately tailored
regulatory oversight, AI machine learning‒based SaMD will deliver safe
and effective software functionality that improves the quality of care
that patients receive.”

Medical devices with embedded AI capabilities are already being
certified at the University of California San Francisco’s (UCSF) Center for
Digital Health Innovation (CDHI), with the help of Intel [15]. UCSF is using
Intel® Software Guard Extensions (Intel® SGX) featured in the Intel®
Xeon® E processor family and Fortanix Confidential Computing Enclave
Manager to streamline the process. Intel SGX helps protect the privacy
of patient data in the BeeKeeperAI project.



Intel® SGX helps UCSF
Intel® SGX enables the AI platform to create a trusted computing
environment that offers hardware-based memory encryption to isolate
specific application code and data in memory. This means the BeeKeeperAI
project can use these private regions of memory, called enclaves (or TEEs),
to increase the security of application code and data (to run signed
applications in enclaves).

The platform provides a zero-trust environment designed to protect
both the intellectual property of an algorithm and the privacy of
healthcare data, while CDHI’s proprietary BeeKeeperAI provides
the workflows to enable more-efficient data access, transformation,
and orchestration.

Obtaining regulatory approval for clinical AI algorithms requires diverse and detailed clinical
data to develop, validate, and optimize unbiased algorithm models. These algorithms should be able to
perform consistently across wide-ranging patient populations, socioeconomic groups, and geographic locations.
They also need to be equipment agnostic. Because of these complicated parameters and limited data access,
few research groups or healthcare organizations have the high-quality data needed to accomplish this.
Federated learning expands their reach.



Intel use case – Healthcare medical imaging
Intel and the University of Pennsylvania (UPenn) are training artificial
intelligence models to facilitate the early detection of brain tumors while
still maintaining privacy [16]. The Perelman School of Medicine at Penn has
partnered with Intel Labs to codevelop the technology based on federated
learning. The alliance enables a federation of 29 international healthcare
and research institutions to train AI models that identify brain tumors, using
an approach that trains algorithms across multiple devices without
compromising data samples. The model is trained on the largest brain tumor
data set to date without sensitive data leaving individual collaborators. The
project is funded by a three-year, USD 1.2 million grant awarded to the Center
for Biomedical Image Computing and Analytics (CBICA) at UPenn.

Research and healthcare institutions from the US, the UK, Germany, the
Netherlands, Switzerland, and India are participating in the study, which
uses a distributed learning approach to enable them to collaborate on
deep learning projects without sharing patient data. Penn Medicine and
Intel Labs were the first to publish a paper on federated learning in the
medical imaging domain. They demonstrated that federated learning could
train a model to over 99 percent of the accuracy of a model trained in the
traditional, nonprivate method. The new work at Penn will leverage Intel®
software and hardware to implement federated learning that provides
additional privacy protection for both the model and the data.

“AI shows great promise for the early detection of brain tumors, but it will
require more data than any single medical center holds to reach its full
potential.”
Jason Martin, principal engineer at Intel Labs



Intel use case – Financial sector
A twenty-first century solution to combat money laundering

Money laundering is a nagging problem for financial institutions, with over 95 percent of anti‒money laundering
(AML) alerts turning out to be false positives. Illicit actors profit by laundering trillions of dollars annually, despite massive
efforts to track and stop financial crime. The problem is very complicated:

• Financial institutions have information-sharing constraints because they work within an existing AML/CFT
financial governance system that operates as islands. This means they work in isolation to identify and report
a suspicious customer or transaction to a financial intelligence unit. This type of system does not encourage
information sharing, collective learning, or dynamic feedback among enterprises.

• Concerns over data privacy on a global basis have further compounded these barriers, with no way to
facilitate interbank information sharing.

• Regulatory pressures instituted by the Currency and Foreign Transactions Reporting Act of 1970 (the Bank
Secrecy Act) and expanded regulations in Title III of the USA Patriot Act demand that financial institutions
understand and manage their crime risk.

• Compliance costs to financial institutions are over a hundred times greater than recovered criminal funds,
and banks, taxpayers, and depositors are penalized more than criminals who concoct successful laundering schemes.



Federated learning fights financial fraud
Consilient’s federated machine learning technology, backed by Intel® SGX, is fighting financial fraud and money
laundering [13] by moving beyond traditional rules-based transaction monitoring to
real-time sharing and collective learning. Intel launched the pilot project with Consilient in 2020 to redesign
how financial institutions and authorities discover and prevent financial crime by lowering false positive rates while
still protecting sensitive customer information. The new model for the AML/CFT system provides a more effective
and efficient means of stopping financial crime. The federated learning approach has some big advantages:

• It goes beyond traditional rules-based monitoring to an approach that facilitates information
sharing among various authorities and institutions.

• It enables collective learning on complex threats.

• It distributes and shares risk.

• It simultaneously safeguards customer privacy and data.



Federated learning architecture shares insight on financial crime
The model below uses a federated learning architecture with DOZER technology from Consilient and Intel® SGX technology to
share insights into financial crime risks in a utility-like fashion. At scale, this new model helps to securely and effectively discover
systemically relevant financial crime risk across institutions and borders. It can also reduce the burden of false positives and
dependence on rules-based models and protect privacy and security by moving the analytics and not the data.

[Figure: BANK 1 through BANK 5, each running Dozer/Intel SGX, are arranged around a central algo factory.
The algorithm is updated incrementally as it moves from bank to bank: Algo 1.1 = Algo 1 + ∆,
Algo 1.2 = Algo 1.1 + ∆, Algo 1.3 = Algo 1.2 + ∆, Algo 1.4 = Algo 1.3 + ∆.]
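
The version arithmetic in the figure (each new algorithm version equals the previous version plus a delta) can be illustrated with a small sketch. The source does not describe how DOZER computes its deltas, so everything below is an assumption made for illustration: a tiny linear scoring model, one gradient step per bank as the delta, and hypothetical function names and records.

```python
def local_delta(weights, records, lr=0.1):
    """Compute one bank's update (∆) from its private records.

    Assumption: ∆ is a single averaged gradient step for a linear score.
    Only ∆ leaves the bank; the records do not.
    """
    grad = [0.0] * len(weights)
    for features, label in records:
        pred = sum(w * x for w, x in zip(weights, features))
        err = pred - label
        for i, x in enumerate(features):
            grad[i] += err * x
    n = len(records)
    return [-lr * g / n for g in grad]


def run_cycle(weights, banks, major=1):
    """Pass the model from bank to bank: Algo 1.k = Algo 1.(k-1) + ∆."""
    history = [(f"{major}.0", list(weights))]
    for minor, records in enumerate(banks, start=1):
        delta = local_delta(weights, records)
        weights = [w + d for w, d in zip(weights, delta)]
        history.append((f"{major}.{minor}", list(weights)))
    return weights, history


# Three hypothetical banks, each holding one private (features, label) record.
banks = [
    [((1.0, 0.0), 1.0)],
    [((0.0, 1.0), 2.0)],
    [((1.0, 1.0), 3.0)],
]
final_weights, history = run_cycle([0.0, 0.0], banks)
versions = [version for version, _ in history]
```

Each entry in `history` pairs a version label with the weights at that point, mirroring how the figure labels successive algorithm versions as the model circulates.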



Federated learning heralds a new era for AI analytics
AI analytics has forever changed with federated learning at the helm.
This decentralized distributed system relies on remote execution as it
distributes copies of machine learning algorithms to sites or devices while
preserving data.

• Training iterations are done locally while computation
results are sent to the central server, which updates the
main algorithm.

• This maintains data integrity at the source.

Federated learning with Intel® SGX offers a level of security and flexibility
in AI analytics that centralized learning does not provide. Key advantages
include retention of sovereignty, flexible topology, collaboration, data privacy
retention, and security through homomorphic encryption. As our world
changes and technology advances, federated learning will be the logical choice.

Many companies and organizations have been hesitant to jump on the AI
bandwagon, waiting for more “capable” technologies. The problem is that when
it comes to AI, time is not on your side. Those who delay will fall behind. It is
essential to be an early adopter to remain competitive.

What are you waiting for? Good things come to those who are first in line.
Find more information at intel.com/ai.



AI everywhere computing
requires smarter solutions
Visit intel.com/ai to learn why scalable AI applications start with
Intel-optimized AI software.
References
1. https://research.googleblog.com/2017/04/federated-learning-collaborative.html
2. TensorFlow Federated.
3. https://berryvilleiml.com/docs/ara.pdf
4. https://www.gartner.com/en/newsroom/press-releases/2020-10-19-gartner-identifies-the-top-strategic-technology-trends-for-2021
5. https://pubmed.ncbi.nlm.nih.gov/32724046/
6. https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/confronting-the-risks-of-artificial-intelligence
7. https://www.intel.com/content/www/us/en/analytics/data-security.html
8. https://cybersecurityguide.org/industries/financial/
9. https://www.ibm.com/account/reg/us-en/signup?formid=urx-42215
10. https://software.intel.com/content/www/us/en/develop/articles/artificial-intelligence-and-healthcare-data.html?wapkw=AI%20risks
11. Vaid A., Jaladanki S.K., Xu J., Teng S., Kumar A., Lee S., Somani S., Paranjpe I., De Freitas J.K., Wanyan T., Johnson K.W., Bicak M., Klang E., Kwon Y.J., Costa A., Zhao S., Miotto R., Charney A.W., Böttinger E., Fayad Z.A., Nadkarni G.N., Wang F., Glicksberg B.S. “Federated Learning of Electronic Health Records Improves Mortality Prediction in Patients Hospitalized with COVID-19.” medRxiv [Preprint]. 2020 Aug 14:2020.08.11.20172809. doi: 10.1101/2020.08.11.20172809. Update in: JMIR Med Inform. 2020 Dec 14; PMID: 32817979; PMCID: PMC7430624.
12. https://arxiv.org/abs/1610.05492
13. https://www.intel.com/content/www/us/en/financial-services-it/federated-learning-solution.html
14. https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf
15. https://www.intel.com/content/www/us/en/newsroom/news/ucsf-propel-medical-device-innovations.html#gs.25v2g8
16. https://newsroom.intel.com/news/intel-works-university-pennsylvania-using-privacy-preserving-ai-identify-brain-tumors/#gs.5rq2yc

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
US/07/2021/PDF/JH/CMD
