
Practical and Robust Privacy Preserving Machine Learning

By Dr. Rania Talbi

NLP Workshop hosted by Wikit, TeamWork, and Cygogn

February 22nd, 2022, Lyon, France.


Outline

 Context and Problem Statement
 Privacy in Distributed Machine Learning:
  PrivML: Practical Privacy Preserving Machine Learning
 Robustness in Federated Learning:
  ARMOR: Mitigating Poisoning Attacks in Federated Learning
 Conclusion and Perspectives


Context & Problem Statement

Privacy Issues in Distributed ML

[Figure: data owners DO1 ... DO5, each holding local training data, collaborate with an ML Service Provider (MLSP) that builds a global ML model over their joint data.]
Cryptography-based vs. Non-cryptographic PPML

[Figure: radar charts comparing both families of techniques along three axes: Privacy, Utility, and Runtime.]

 Cryptography-based techniques (HE-based and MPC-based PPML methods): strong privacy and utility, at the cost of runtime.
 Non-cryptographic techniques (Data & Output Perturbation, Data Anonymization): low runtime overhead, but weaker privacy and utility.
Privacy in Distributed Machine Learning
PrivML: Practical Privacy Preserving Machine Learning

Background on Homomorphic Encryption

Partially Homomorphic Encryption (PHE) → Somewhat Homomorphic Encryption (SWHE) → Fully Homomorphic Encryption (FHE): increasing expressiveness comes with increasing performance overhead.
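PrivML's cryptographic layer (next slides) builds on a PHE scheme in the Paillier family. As a reminder of what a PHE scheme offers, here is a toy Paillier sketch in Python; the demo primes and helper names are ours, for illustration only, not PrivML's actual DT-PKC implementation.

```python
# Toy Paillier cryptosystem: an additively homomorphic PHE scheme.
# Demo-sized Mersenne primes, no hardening: for illustration only.
import random
from math import gcd

p, q = (1 << 31) - 1, (1 << 61) - 1     # real deployments use ~1024-bit primes
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # Carmichael's lambda(n)
mu = pow(lam, -1, n)                    # valid since g = n + 1 (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)          # fresh randomness per ciphertext
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

a, b = 17, 25
ca, cb = encrypt(a), encrypt(b)
# (1) Homomorphic addition: E(a) * E(b) decrypts to a + b.
assert decrypt(ca * cb % n2) == a + b
# (2) Homomorphic scalar multiplication: E(a)^k decrypts to k * a.
assert decrypt(pow(ca, 3, n2)) == 3 * a
```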
PrivML's Design Objectives

 Propose mechanisms to reduce the overhead of HE
 Make sure that PPML model utility is not degraded
 Ensure end-to-end privacy preservation
Overview of PrivML's Architecture

Learning phase:
 Data owners DO1 ... DON send their training data to the ML Service Provider (MLSP), encrypted via multiple keys.
 A Key Management Unit (KMU) manages the cryptographic keys.
 Two computation units, MU and SU, run the HE-based computation protocols on the MLSP's side.
 The outcome is an encrypted global model trained over the joint data.

Prediction phase:
 A querier sends an encrypted classification query to the MLSP and receives an encrypted classification response.
Threat Model

 The Key Management Unit (KMU) is trusted.
 All the other parties are honest-but-curious.
 Data owners and queriers are mutually untrusted.
 The computation units (MU and SU) are non-colluding.
Cryptographic Primitives Underlying PrivML

PrivML relies on the Distributed Two-Trapdoor Public-Key Cryptosystem (DT-PKC) [Liu 2016], which provides:

 (1) Homomorphic addition over ciphertexts [x]_pkλ.
 (2) Homomorphic scalar multiplication.
 (3) Two-step partial decryption: the secret key SK is split into two shares, SK1 and SK2. A ciphertext [CT] is decrypted by applying PSdec1(SK1, ...) at the MLSP's MU and then PSdec2(SK2, ...) at the SU, so that neither unit alone can decrypt.
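A minimal sketch of the two-step partial decryption flow, using an additive split of a Paillier-style decryption exponent. This mirrors the PSdec1/PSdec2 roles above but is our simplification, not Liu et al.'s exact DT-PKC construction.

```python
# Two-step partial decryption with an additively split key: a toy analogue
# of PSdec1/PSdec2 (simplified; not Liu et al.'s exact DT-PKC scheme).
import random
from math import gcd

p, q = (1 << 31) - 1, (1 << 61) - 1
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)

def encrypt(m):
    r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

# sigma = 1 mod n and 0 mod lam, so c^sigma = 1 + m*n (mod n^2).
sigma = lam * pow(lam, -1, n) % (n * lam)

# Split sigma into two additive shares; neither share decrypts alone.
sk1 = random.randrange(n * lam)
sk2 = (sigma - sk1) % (n * lam)

def psdec1(ct):                # at the MU, holding share sk1
    return pow(ct, sk1, n2)

def psdec2(ct, partial):       # at the SU, holding share sk2
    full = partial * pow(ct, sk2, n2) % n2
    return (full - 1) // n     # recover m from 1 + m*n

ct = encrypt(42)
assert psdec2(ct, psdec1(ct)) == 42
```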
Outsourced Privacy Preserving Computations in PrivML

Each privacy preserving computation protocol in PrivML follows a three-step blinding pattern between the two computation units (see the sketch below):

 @MU: blind the operands with random values, homomorphically, and run PSdec1(SK1, ...).
 @SU: run PSdec2(SK2, ...), compute over the blinded operands, and re-encrypt the result with Enc(pk_w, ...).
 @MU: remove the random blinding values from the output, homomorphically.
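To make the pattern concrete, here is a toy secure multiplication over Paillier ciphertexts following the same blind / compute-over-blinded / unblind steps. We simplify to a single key and let the SU decrypt directly, whereas PrivML interleaves the PSdec1/PSdec2 partial decryptions; the helper names are ours.

```python
# Toy secure multiplication via operand blinding. Single-key simplification:
# the SU decrypts directly here, while PrivML would use PSdec1/PSdec2.
import random
from math import gcd

p, q = (1 << 31) - 1, (1 << 61) - 1
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
mu_inv = pow(lam, -1, n)

def enc(m):
    r = random.randrange(1, n)
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def dec(c):
    return (pow(c, lam, n2) - 1) // n * mu_inv % n

def mu_blind(cx, cy):
    """@MU: blind both operands with random values, homomorphically."""
    rx, ry = random.randrange(n), random.randrange(n)
    return cx * enc(rx) % n2, cy * enc(ry) % n2, rx, ry

def su_multiply(ca, cb):
    """@SU: decrypt the blinded operands, multiply, re-encrypt."""
    return enc(dec(ca) * dec(cb))

def mu_unblind(cab, cx, cy, rx, ry):
    """@MU: strip the blinding terms from E((x+rx)(y+ry)), homomorphically."""
    t = cab * pow(cx, n - ry, n2) % n2      # remove the ry * x term
    t = t * pow(cy, n - rx, n2) % n2        # remove the rx * y term
    return t * enc(-rx * ry % n) % n2       # remove the rx * ry term

cx, cy = enc(6), enc(7)
ca, cb, rx, ry = mu_blind(cx, cy)
assert dec(mu_unblind(su_multiply(ca, cb), cx, cy, rx, ry)) == 42
```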
PPML Design Process in PrivML

 Decompose the ML algorithm into elementary operations (x·y, √x, <X, Y>, ...).
 Design HE-based privacy preserving protocols for these elementary operations.
 Reassemble the resulting privacy preserving elementary operations into a PPML algorithm whose output is close or identical to the original ML algorithm's.
Overhead Reduction Strategies in PrivML

 Round Complexity Minimization
 Pre-computation of Random Powers (pre-computing values that do not depend on the online operands)
 Optimized Large Number Arithmetic
 Parallel Computing
 Analytical Approximations
 Ciphertext Packing
Overhead Reduction Strategies: Pre-computation of Random Powers

 Pre-compute a significant amount of random blinding values & random encryption powers ahead of time, without impacting the security level.
 Intuition: a product of random powers is a random power.
 Following [Jost 2015], we pre-compute random powers and multiply subsets of them, targeting 80-bit security (see the sketch below).
 Tradeoff between storage and pre-computation cost.

Christine Jost, Ha Lam, Alexander Maximov, and Ben J. M. Smeets. Encryption performance improvements of the Paillier cryptosystem. IACR Cryptol. ePrint Arch., 2015:864, 2015.
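A minimal sketch of the random-power pool idea for Paillier-style encryption, where each ciphertext needs a factor r^n mod n^2. Pool and subset sizes are illustrative placeholders, not the 80-bit-security parameters of [Jost 2015].

```python
# Pre-computation of random encryption powers: build a pool of r_i^n mod n^2
# offline; online, the product of a random subset of pool entries is itself
# a random power, so encryption costs a few multiplications, not a powmod.
import random
from math import gcd

p, q = (1 << 31) - 1, (1 << 61) - 1
n = p * q
n2 = n * n

POOL_SIZE, SUBSET_SIZE = 64, 8     # illustrative, not [Jost 2015]'s parameters
pool = [pow(random.randrange(1, n), n, n2) for _ in range(POOL_SIZE)]  # offline

def fresh_random_power():
    acc = 1
    for rp in random.sample(pool, SUBSET_SIZE):   # online: cheap multiplies
        acc = acc * rp % n2
    return acc

def encrypt_fast(m):
    # With g = n + 1, g^m mod n^2 = 1 + m*n: no exponentiation needed there.
    return (1 + m * n) * fresh_random_power() % n2

lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
assert (pow(encrypt_fast(7), lam, n2) - 1) // n * pow(lam, -1, n) % n == 7
```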
Overhead Reduction Strategies: Optimized Large Number Arithmetic

 Use of Schönhage-Strassen FFT multiplication [Gaudry 2007] to implement the DT-PKC cryptosystem primitives.
 We use an assembly-based subroutine provided by the GNU Multiple Precision Arithmetic Library (GMP) [Granlund 2012].

• Pierrick Gaudry, Alexander Kruppa, and Paul Zimmermann. A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm. In Proceedings of the International Symposium on Symbolic and Algebraic Computation, pages 167–174, 2007.
• Torbjörn Granlund and the GMP development team. GNU MP: The GNU Multiple Precision Arithmetic Library, 5.0.5 edition, 2012. http://gmplib.org/.
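For quick prototyping against the same arithmetic, GMP's routines are also reachable from Python through the gmpy2 binding; a small sketch (gmpy2 is our suggestion for experimentation, whereas PrivML is a C++ library calling GMP directly):

```python
# GMP-backed big-integer arithmetic via the gmpy2 binding: mpz multiplication
# uses GMP's algorithms (FFT-based for very large operands), and powmod is
# the modular exponentiation that dominates HE primitive cost.
from gmpy2 import mpz, powmod

n = mpz((1 << 31) - 1) * ((1 << 61) - 1)   # demo modulus n = p * q
n2 = n * n
ct_factor = powmod(mpz(123456789), n, n2)  # r^n mod n^2, one ciphertext factor
print(ct_factor.bit_length())              # ~184 bits for this demo modulus
```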
Privacy in Distributed Machine Learning
PrivML: III. Experimental Evaluation of PrivML

Implementation & Experimental Setup

Implementation & Evaluation Scenarios:
 C++ library available at: https://gitlab.liris.cnrs.fr/rtalbi/privml
 Evaluation scenarios:
  Performance of PrivML's PPML methods
  Performance of PrivML's privacy preserving building blocks
  Performance of PrivML's cryptographic primitives

Evaluation Datasets:
 Real-world datasets from UCI are used: Adult, Bank, Nursery, Iris & Edinburgh.
 For all datasets, a subset of 2000 records is extracted (except for Iris & Edinburgh).
 80% / 20% split between training and testing data.

Dheeru Dua and Casey Graff. UCI Machine Learning Repository, https://archive.ics.uci.edu/ml, 2017.
Performance of PrivML's Cryptographic Primitives
[Results figure omitted.]

Performance of PrivML's Cryptographic Building Blocks
[Results figure omitted.]

Performance of PrivML's PPML Methods
[Results figure omitted.]

Comparison with Related Works
[Results figures omitted.]
Robustness in Federated Learning
I. Background & Related Work on Federated Learning (FL) & Attacks Targeting FL
Generalities on Federated Learning

[Figure: an FL server holding the global model, connected to a set of workers.]

Training loop (see the FedAvg sketch below):
 (1) The global model is sent to selected workers.
 (2) Workers carry out local training and send model updates to the FL server.
 (3) The FL server aggregates the model updates and sends the new model to the workers.

Key properties:
 FL ensures privacy by design (data never leaves clients' devices).
 FL is massively distributed.
 Data is distributed in a non-IID way.
 The distribution of training data between workers is unbalanced.
 Privacy is further improved via Secure Aggregation [Bonawitz 2017].
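A minimal sketch of aggregation step (3), assuming the standard weighted FedAvg rule where each worker's model is weighted by its local dataset size; client sampling and secure aggregation are left out.

```python
# Weighted federated averaging (FedAvg): the server combines workers'
# model updates, weighting each by its local dataset size.
from typing import List
import numpy as np

def fedavg(updates: List[np.ndarray], num_samples: List[int]) -> np.ndarray:
    """Aggregate one round of worker updates into the new global model."""
    total = sum(num_samples)
    return sum(k / total * u for k, u in zip(num_samples, updates))

# Example round: three workers with unbalanced data (step 3 above).
global_model = np.zeros(4)
local_models = [global_model + np.random.randn(4) * 0.1 for _ in range(3)]
global_model = fedavg(local_models, num_samples=[100, 400, 500])
```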
Targeted Data Poisoning Attacks in FL

Starting from clean training data for traffic sign classification, a malicious worker:
 (1) Mislabels part of its training images as stop signs.
 (2) Carries out local training, generating a faulty model update.
 (3) Sends the faulty model update to the FL server for aggregation.
 (4) Repeats until the backdoor is injected into the model.

Single-shot model poisoning: [Bagdasaryan 2019] (see the sketch below).
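The single-shot variant exploits the aggregation rule: if the server averages n models, one attacker can boost its update so the average lands on its backdoored model. A numpy sketch of this model-replacement idea (plain averaging and the scaling factor gamma = n are our simplifying assumptions):

```python
# Single-shot model replacement: one malicious worker boosts its update so
# that, after averaging, the new global model lands on the backdoored model.
import numpy as np

n = 10                                    # workers averaged this round
G = np.random.randn(4)                    # current global model
L = G + np.array([1.0, -2.0, 0.5, 3.0])   # attacker's backdoored model

# Benign workers send models close to G (training has nearly converged).
benign = [G + np.random.randn(4) * 1e-3 for _ in range(n - 1)]
malicious = n * (L - G) + G               # boosted update, gamma = n

new_global = sum(benign + [malicious]) / n
assert np.allclose(new_global, L, atol=1e-2)   # backdoor survives averaging
```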
Robustness in Federated Learning
ARMOR: II. Design Principles of ARMOR

Overview of ARMOR

 Use the FL model to generate a class representatives' dataset.
 GAN-based approach for synthetic data generation (ARgan).
 Use the class representatives to monitor the FL model's evolution through multiple rounds (MORpheus); a minimal monitoring sketch follows below.
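A minimal sketch of the monitoring idea, assuming per-class accuracy of each round's global model is measured on the generated class representatives; the drop threshold and flagging rule are our illustrative placeholders, not ARMOR's actual detection criterion.

```python
# Monitoring the FL model across rounds on class representatives:
# a sharp per-class accuracy drop hints at a targeted poisoning attempt.
from typing import Dict, List

def flag_suspicious_rounds(
    per_class_acc: List[Dict[int, float]],   # per round: class -> accuracy
    drop_threshold: float = 0.2,              # illustrative threshold
) -> List[int]:
    """Return rounds where some class's accuracy fell sharply vs. last round."""
    flagged = []
    for t in range(1, len(per_class_acc)):
        prev, cur = per_class_acc[t - 1], per_class_acc[t]
        if any(prev[c] - cur.get(c, 0.0) > drop_threshold for c in prev):
            flagged.append(t)
    return flagged

# Example: accuracy on class 3 collapses at round 2 (a targeted backdoor).
history = [
    {0: 0.91, 3: 0.88},
    {0: 0.92, 3: 0.87},
    {0: 0.93, 3: 0.41},
]
assert flag_suspicious_rounds(history) == [2]
```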
ARMOR's Components: (1) ARgan
[Architecture figures omitted.]

ARMOR's Components: (2) MORpheus
[Architecture figures omitted.]
Experimental Results

Robustness Evaluation
[Results figures omitted: (1) Data Poisoning, (2) Model Poisoning.]

Robustness & Utility Tradeoff Evaluation
[Results figures omitted: mitigation success rate (%) under (1) Data Poisoning and (2) Model Poisoning.]
Bibliography

 Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, volume 29, pages 439–450. ACM, 2000.
 Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (S&P), pages 19–38. IEEE, 2017.
 Andrey Kim, Yongsoo Song, Miran Kim, Keewoo Lee, and Jung Hee Cheon. Logistic regression model training based on the approximate homomorphic encryption. BMC Medical Genomics, 11(4):83, 2018.
 Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. Machine learning classification over encrypted data. In NDSS, volume 4324, page 4325, 2015.
 Sangwook Kim, Masahiro Omori, Takuya Hayashi, Toshiaki Omori, Lihua Wang, and Seiichi Ozawa. Privacy-preserving naive Bayes classification using fully homomorphic encryption. In International Conference on Neural Information Processing, pages 349–358. Springer, 2018.
 Thore Graepel, Kristin Lauter, and Michael Naehrig. ML Confidential: Machine learning on encrypted data. In International Conference on Information Security and Cryptology, pages 1–21. Springer, 2012.
 Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pages 201–210, 2016.
 Ximeng Liu, Robert H. Deng, Kim-Kwang Raymond Choo, and Jian Weng. An efficient privacy-preserving outsourced calculation toolkit with multiple keys. IEEE Transactions on Information Forensics and Security, 11(11):2401–2414, 2016.
 K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, ... and T. Van Overveldt. Towards federated learning at scale: System design. arXiv preprint arXiv:1902.01046, 2019.