
Practical and Robust Privacy Preserving Machine Learning

By Dr. Rania Talbi

NLP Workshop hosted by Wikit, TeamWork, and Cygogn

February 22nd, 2022, Lyon, France.


Outline

 Context and Problem Statement
 Privacy in Distributed Machine Learning:
  PrivML: Practical Privacy Preserving Machine Learning
 Robustness in Federated Learning:
  ARMOR: Mitigating Poisoning Attacks in Federated Learning
 Conclusion and Perspectives


Context & Problem Statement

Privacy Issues in Distributed ML

[Figure: data owners DO1 ... DO5, each holding local training data, collaborate with an ML Service Provider (MLSP) that builds a global ML model over their joint data.]
Cryptography-based vs. Non-cryptographic PPML

[Figure: radar charts comparing both families of techniques along three axes: Privacy, Utility, and Runtime.]

 Cryptography-based techniques (HE-based and MPC-based PPML methods): strong privacy and utility, at the cost of runtime.
 Non-cryptographic techniques (Data & Output Perturbation, Data Anonymization): low runtime overhead, but weaker privacy and utility.
Privacy in Distributed Machine Learning
PrivML: Practical Privacy Preserving Machine Learning

Background on Homomorphic Encryption

Partially Homomorphic Encryption (PHE) → Somewhat Homomorphic Encryption (SWHE) → Fully Homomorphic Encryption (FHE): increasing expressiveness comes with increasing performance overhead.
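PrivML's cryptographic layer (next slides) builds on a PHE scheme in the Paillier family. As a reminder of what a PHE scheme offers, here is a toy Paillier sketch in Python; the demo primes and helper names are ours, for illustration only, not PrivML's actual DT-PKC implementation.

```python
# Toy Paillier cryptosystem: an additively homomorphic PHE scheme.
# Demo-sized Mersenne primes, no hardening: for illustration only.
import random
from math import gcd

p, q = (1 << 31) - 1, (1 << 61) - 1     # real deployments use ~1024-bit primes
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # Carmichael's lambda(n)
mu = pow(lam, -1, n)                    # valid since g = n + 1 (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)          # fresh randomness per ciphertext
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

a, b = 17, 25
ca, cb = encrypt(a), encrypt(b)
# (1) Homomorphic addition: E(a) * E(b) decrypts to a + b.
assert decrypt(ca * cb % n2) == a + b
# (2) Homomorphic scalar multiplication: E(a)^k decrypts to k * a.
assert decrypt(pow(ca, 3, n2)) == 3 * a
```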
PrivML's Design Objectives

 Propose mechanisms to reduce the overhead of HE
 Make sure that PPML model utility is not degraded
 Ensure end-to-end privacy preservation
Overview of PrivML's Architecture

Learning phase:
 Data owners DO1 ... DON send their training data to the ML Service Provider (MLSP), encrypted via multiple keys.
 A Key Management Unit (KMU) manages the cryptographic keys.
 Two computation units, MU and SU, run the HE-based computation protocols on the MLSP's side.
 The outcome is an encrypted global model trained over the joint data.

Prediction phase:
 A querier sends an encrypted classification query to the MLSP and receives an encrypted classification response.
Threat Model

 The Key Management Unit (KMU) is trusted.
 All the other parties are honest-but-curious.
 Data owners and queriers are mutually untrusted.
 The computation units (MU and SU) are non-colluding.
Cryptographic Primitives Underlying PrivML

PrivML relies on the Distributed Two-Trapdoor Public-Key Cryptosystem (DT-PKC) [Liu 2016], which provides:

 (1) Homomorphic addition over ciphertexts [x]_pkλ.
 (2) Homomorphic scalar multiplication.
 (3) Two-step partial decryption: the secret key SK is split into two shares, SK1 and SK2. A ciphertext [CT] is decrypted by applying PSdec1(SK1, ...) at the MLSP's MU and then PSdec2(SK2, ...) at the SU, so that neither unit alone can decrypt.
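A minimal sketch of the two-step partial decryption flow, using an additive split of a Paillier-style decryption exponent. This mirrors the PSdec1/PSdec2 roles above but is our simplification, not Liu et al.'s exact DT-PKC construction.

```python
# Two-step partial decryption with an additively split key: a toy analogue
# of PSdec1/PSdec2 (simplified; not Liu et al.'s exact DT-PKC scheme).
import random
from math import gcd

p, q = (1 << 31) - 1, (1 << 61) - 1
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)

def encrypt(m):
    r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

# sigma = 1 mod n and 0 mod lam, so c^sigma = 1 + m*n (mod n^2).
sigma = lam * pow(lam, -1, n) % (n * lam)

# Split sigma into two additive shares; neither share decrypts alone.
sk1 = random.randrange(n * lam)
sk2 = (sigma - sk1) % (n * lam)

def psdec1(ct):                # at the MU, holding share sk1
    return pow(ct, sk1, n2)

def psdec2(ct, partial):       # at the SU, holding share sk2
    full = partial * pow(ct, sk2, n2) % n2
    return (full - 1) // n     # recover m from 1 + m*n

ct = encrypt(42)
assert psdec2(ct, psdec1(ct)) == 42
```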
Outsourced Privacy Preserving Computations in PrivML

Each privacy preserving computation protocol in PrivML follows a three-step blinding pattern between the two computation units (see the sketch below):

 @MU: blind the operands with random values, homomorphically, and run PSdec1(SK1, ...).
 @SU: run PSdec2(SK2, ...), compute over the blinded operands, and re-encrypt the result with Enc(pk_w, ...).
 @MU: remove the random blinding values from the output, homomorphically.
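To make the pattern concrete, here is a toy secure multiplication over Paillier ciphertexts following the same blind / compute-over-blinded / unblind steps. We simplify to a single key and let the SU decrypt directly, whereas PrivML interleaves the PSdec1/PSdec2 partial decryptions; the helper names are ours.

```python
# Toy secure multiplication via operand blinding. Single-key simplification:
# the SU decrypts directly here, while PrivML would use PSdec1/PSdec2.
import random
from math import gcd

p, q = (1 << 31) - 1, (1 << 61) - 1
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
mu_inv = pow(lam, -1, n)

def enc(m):
    r = random.randrange(1, n)
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def dec(c):
    return (pow(c, lam, n2) - 1) // n * mu_inv % n

def mu_blind(cx, cy):
    """@MU: blind both operands with random values, homomorphically."""
    rx, ry = random.randrange(n), random.randrange(n)
    return cx * enc(rx) % n2, cy * enc(ry) % n2, rx, ry

def su_multiply(ca, cb):
    """@SU: decrypt the blinded operands, multiply, re-encrypt."""
    return enc(dec(ca) * dec(cb))

def mu_unblind(cab, cx, cy, rx, ry):
    """@MU: strip the blinding terms from E((x+rx)(y+ry)), homomorphically."""
    t = cab * pow(cx, n - ry, n2) % n2      # remove the ry * x term
    t = t * pow(cy, n - rx, n2) % n2        # remove the rx * y term
    return t * enc(-rx * ry % n) % n2       # remove the rx * ry term

cx, cy = enc(6), enc(7)
ca, cb, rx, ry = mu_blind(cx, cy)
assert dec(mu_unblind(su_multiply(ca, cb), cx, cy, rx, ry)) == 42
```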
PPML Design Process in PrivML

 Decompose the ML algorithm into elementary operations (x·y, √x, <X, Y>, ...).
 Design HE-based privacy preserving protocols for these elementary operations.
 Reassemble the resulting privacy preserving elementary operations into a PPML algorithm whose output is close or identical to the original ML algorithm's.
Overhead Reduction Strategies in PrivML

 Round Complexity Minimization
 Pre-computation of Random Powers (pre-computing values that do not depend on the online operands)
 Optimized Large Number Arithmetic
 Parallel Computing
 Analytical Approximations
 Ciphertext Packing
Overhead Reduction Strategies: Pre-computation of Random Powers

 Pre-compute a significant amount of random blinding values & random encryption powers ahead of time, without impacting the security level.
 Intuition: a product of random powers is a random power.
 Following [Jost 2015], we pre-compute random powers and multiply subsets of them, targeting 80-bit security (see the sketch below).
 Tradeoff between storage and pre-computation cost.

Christine Jost, Ha Lam, Alexander Maximov, and Ben J. M. Smeets. Encryption performance improvements of the Paillier cryptosystem. IACR Cryptol. ePrint Arch., 2015:864, 2015.
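A minimal sketch of the random-power pool idea for Paillier-style encryption, where each ciphertext needs a factor r^n mod n^2. Pool and subset sizes are illustrative placeholders, not the 80-bit-security parameters of [Jost 2015].

```python
# Pre-computation of random encryption powers: build a pool of r_i^n mod n^2
# offline; online, the product of a random subset of pool entries is itself
# a random power, so encryption costs a few multiplications, not a powmod.
import random
from math import gcd

p, q = (1 << 31) - 1, (1 << 61) - 1
n = p * q
n2 = n * n

POOL_SIZE, SUBSET_SIZE = 64, 8     # illustrative, not [Jost 2015]'s parameters
pool = [pow(random.randrange(1, n), n, n2) for _ in range(POOL_SIZE)]  # offline

def fresh_random_power():
    acc = 1
    for rp in random.sample(pool, SUBSET_SIZE):   # online: cheap multiplies
        acc = acc * rp % n2
    return acc

def encrypt_fast(m):
    # With g = n + 1, g^m mod n^2 = 1 + m*n: no exponentiation needed there.
    return (1 + m * n) * fresh_random_power() % n2

lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
assert (pow(encrypt_fast(7), lam, n2) - 1) // n * pow(lam, -1, n) % n == 7
```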
Overhead Reduction Strategies: Optimized Large Number Arithmetic

 Use of Schönhage-Strassen FFT multiplication [Gaudry 2007] to implement the DT-PKC cryptosystem primitives.
 We use an assembly-based subroutine provided by the GNU Multiple Precision Arithmetic Library (GMP) [Granlund 2012].

• Pierrick Gaudry, Alexander Kruppa, and Paul Zimmermann. A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm. In Proceedings of the International Symposium on Symbolic and Algebraic Computation, pages 167–174, 2007.
• Torbjörn Granlund and the GMP development team. GNU MP: The GNU Multiple Precision Arithmetic Library, 5.0.5 edition, 2012. http://gmplib.org/.
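For quick prototyping against the same arithmetic, GMP's routines are also reachable from Python through the gmpy2 binding; a small sketch (gmpy2 is our suggestion for experimentation, whereas PrivML is a C++ library calling GMP directly):

```python
# GMP-backed big-integer arithmetic via the gmpy2 binding: mpz multiplication
# uses GMP's algorithms (FFT-based for very large operands), and powmod is
# the modular exponentiation that dominates HE primitive cost.
from gmpy2 import mpz, powmod

n = mpz((1 << 31) - 1) * ((1 << 61) - 1)   # demo modulus n = p * q
n2 = n * n
ct_factor = powmod(mpz(123456789), n, n2)  # r^n mod n^2, one ciphertext factor
print(ct_factor.bit_length())              # ~184 bits for this demo modulus
```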
Privacy in Distributed Machine Learning
PrivML: III. Experimental Evaluation of PrivML

Implementation & Experimental Setup

Implementation & Evaluation Scenarios:
 C++ library available at: https://gitlab.liris.cnrs.fr/rtalbi/privml
 Evaluation scenarios:
  Performance of PrivML's PPML methods
  Performance of PrivML's privacy preserving building blocks
  Performance of PrivML's cryptographic primitives

Evaluation Datasets:
 Real-world datasets from UCI are used: Adult, Bank, Nursery, Iris & Edinburgh.
 For all datasets, a subset of 2000 records is extracted (except for Iris & Edinburgh).
 80% / 20% split between training and testing data.

Dheeru Dua and Casey Graff. UCI Machine Learning Repository, https://archive.ics.uci.edu/ml, 2017.
Performance of PrivML's Cryptographic Primitives
[Results figure omitted.]

Performance of PrivML's Cryptographic Building Blocks
[Results figure omitted.]

Performance of PrivML's PPML Methods
[Results figure omitted.]

Comparison with Related Works
[Results figures omitted.]
Robustness in Federated Learning
I. Background & Related Work on Federated Learning (FL) & Attacks Targeting FL
Generalities on Federated Learning

[Figure: an FL server holding the global model, connected to a set of workers.]

Training loop (see the FedAvg sketch below):
 (1) The global model is sent to selected workers.
 (2) Workers carry out local training and send model updates to the FL server.
 (3) The FL server aggregates the model updates and sends the new model to the workers.

Key properties:
 FL ensures privacy by design (data never leaves clients' devices).
 FL is massively distributed.
 Data is distributed in a non-IID way.
 The distribution of training data between workers is unbalanced.
 Privacy is further improved via Secure Aggregation [Bonawitz 2017].
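A minimal sketch of aggregation step (3), assuming the standard weighted FedAvg rule where each worker's model is weighted by its local dataset size; client sampling and secure aggregation are left out.

```python
# Weighted federated averaging (FedAvg): the server combines workers'
# model updates, weighting each by its local dataset size.
from typing import List
import numpy as np

def fedavg(updates: List[np.ndarray], num_samples: List[int]) -> np.ndarray:
    """Aggregate one round of worker updates into the new global model."""
    total = sum(num_samples)
    return sum(k / total * u for k, u in zip(num_samples, updates))

# Example round: three workers with unbalanced data (step 3 above).
global_model = np.zeros(4)
local_models = [global_model + np.random.randn(4) * 0.1 for _ in range(3)]
global_model = fedavg(local_models, num_samples=[100, 400, 500])
```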
Targeted Data Poisoning Attacks in FL

Starting from clean training data for traffic sign classification, a malicious worker:
 (1) Mislabels part of its training images as stop signs.
 (2) Carries out local training, generating a faulty model update.
 (3) Sends the faulty model update to the FL server for aggregation.
 (4) Repeats until the backdoor is injected into the model.

Single-shot model poisoning: [Bagdasaryan 2019] (see the sketch below).
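The single-shot variant exploits the aggregation rule: if the server averages n models, one attacker can boost its update so the average lands on its backdoored model. A numpy sketch of this model-replacement idea (plain averaging and the scaling factor gamma = n are our simplifying assumptions):

```python
# Single-shot model replacement: one malicious worker boosts its update so
# that, after averaging, the new global model lands on the backdoored model.
import numpy as np

n = 10                                    # workers averaged this round
G = np.random.randn(4)                    # current global model
L = G + np.array([1.0, -2.0, 0.5, 3.0])   # attacker's backdoored model

# Benign workers send models close to G (training has nearly converged).
benign = [G + np.random.randn(4) * 1e-3 for _ in range(n - 1)]
malicious = n * (L - G) + G               # boosted update, gamma = n

new_global = sum(benign + [malicious]) / n
assert np.allclose(new_global, L, atol=1e-2)   # backdoor survives averaging
```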
Robustness in Federated Learning
ARMOR: II. Design Principles of ARMOR

Overview of ARMOR

 Use the FL model to generate a class representatives' dataset.
 GAN-based approach for synthetic data generation (ARgan).
 Use the class representatives to monitor the FL model's evolution through multiple rounds (MORpheus); a minimal monitoring sketch follows below.
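A minimal sketch of the monitoring idea, assuming per-class accuracy of each round's global model is measured on the generated class representatives; the drop threshold and flagging rule are our illustrative placeholders, not ARMOR's actual detection criterion.

```python
# Monitoring the FL model across rounds on class representatives:
# a sharp per-class accuracy drop hints at a targeted poisoning attempt.
from typing import Dict, List

def flag_suspicious_rounds(
    per_class_acc: List[Dict[int, float]],   # per round: class -> accuracy
    drop_threshold: float = 0.2,              # illustrative threshold
) -> List[int]:
    """Return rounds where some class's accuracy fell sharply vs. last round."""
    flagged = []
    for t in range(1, len(per_class_acc)):
        prev, cur = per_class_acc[t - 1], per_class_acc[t]
        if any(prev[c] - cur.get(c, 0.0) > drop_threshold for c in prev):
            flagged.append(t)
    return flagged

# Example: accuracy on class 3 collapses at round 2 (a targeted backdoor).
history = [
    {0: 0.91, 3: 0.88},
    {0: 0.92, 3: 0.87},
    {0: 0.93, 3: 0.41},
]
assert flag_suspicious_rounds(history) == [2]
```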
ARMOR's Components: (1) ARgan
[Architecture figures omitted.]

ARMOR's Components: (2) MORpheus
[Architecture figures omitted.]
Experimental Results

Robustness Evaluation
[Results figures omitted: (1) Data Poisoning, (2) Model Poisoning.]

Robustness & Utility Tradeoff Evaluation
[Results figures omitted: mitigation success rate (%) under (1) Data Poisoning and (2) Model Poisoning.]
Bibliography

 Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, volume 29, pages 439–450. ACM, 2000.
 Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (S&P), pages 19–38. IEEE, 2017.
 Andrey Kim, Yongsoo Song, Miran Kim, Keewoo Lee, and Jung Hee Cheon. Logistic regression model training based on the approximate homomorphic encryption. BMC Medical Genomics, 11(4):83, 2018.
 Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. Machine learning classification over encrypted data. In NDSS, volume 4324, page 4325, 2015.
 Sangwook Kim, Masahiro Omori, Takuya Hayashi, Toshiaki Omori, Lihua Wang, and Seiichi Ozawa. Privacy-preserving naive Bayes classification using fully homomorphic encryption. In International Conference on Neural Information Processing, pages 349–358. Springer, 2018.
 Thore Graepel, Kristin Lauter, and Michael Naehrig. ML Confidential: Machine learning on encrypted data. In International Conference on Information Security and Cryptology, pages 1–21. Springer, 2012.
 Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pages 201–210, 2016.
 Ximeng Liu, Robert H. Deng, Kim-Kwang Raymond Choo, and Jian Weng. An efficient privacy-preserving outsourced calculation toolkit with multiple keys. IEEE Transactions on Information Forensics and Security, 11(11):2401–2414, 2016.
 K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, ... and T. Van Overveldt. Towards federated learning at scale: System design. arXiv preprint arXiv:1902.01046, 2019.