Technische Universität München

TMF Workshop: Anonymization tools and their practical relevance (for biomedical research)

ARX
A Comprehensive Tool for
Anonymizing Biomedical Data
Fabian Prasser, Florian Kohlmayer, Klaus A. Kuhn

Lehrstuhl für Medizinische Informatik
Institut für Medizinische Statistik und Epidemiologie
Klinikum rechts der Isar der TU München

ARX - A Comprehensive Tool for Anonymizing Biomedical Data Technische Universität München

ARX: Towards useful data anonymization
• Usability has many dimensions
  • Ability to balance data utility with privacy requirements
  • Need to support a broad spectrum of methods
    • Privacy models
    • Transformation models
    • Methods for analyzing data utility
    • Methods for analyzing risks
  • Further “non-functional” requirements
    • Integrated and harmonized: ARX is not a “tool box”
    • Compatibility (syntactic and semantic)
    • Performance and scalability
    • Intuitive visualization and parameterization
    • Provide methods to end-users as well as programmers
19.03.2015 Workshop: Anonymization tools and their practical relevance - TMF e.V. 2


ARX: Highlights
■ Compatibility
  • Built-in data import facilities
    • Relational databases (MS SQL, DB2, SQLite, MySQL)
    • MS Excel
    • CSV (all common formats, auto-detection)
  • Support for different data types and scales of measure
    • Strings (with nominal and ordinal scale)
    • Dates (interval scale)
    • Numbers (ratio scale)
  • Automatic detection of data types and formats
  • Methods for handling and cleaning low-quality data
    • Handles missing and invalid values correctly
      • In privacy models, transformation methods, visualizations
    • Manual removal of tuples, query interface, find & replace
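
The slide lists automatic detection of data types and formats. As a rough illustration of how such a detector can work, the following Java sketch classifies a column as numeric if every non-missing value parses as a number, as a date if every non-missing value matches one of a few candidate patterns, and as a string otherwise. The class name, the missing-value markers and the candidate date patterns are assumptions made for this example; this is not the ARX implementation or API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical column type detector (illustration only, not the ARX implementation). */
final class DataTypeDetector {

    enum Type { NUMBER, DATE, STRING }

    // Assumed markers for missing values; they are ignored during detection.
    private static final Set<String> MISSING = Set.of("", "NULL", "NA", "?");

    // Assumed candidate date formats, tried in this order.
    private static final List<String> DATE_PATTERNS =
            List.of("yyyy-MM-dd", "dd.MM.yyyy", "MM/dd/yyyy");

    /** Classifies a column by checking whether all non-missing values fit a type. */
    static Type detect(List<String> column) {
        List<String> values = column.stream()
                .filter(v -> v != null)
                .map(String::trim)
                .filter(v -> !MISSING.contains(v))
                .collect(Collectors.toList());
        if (values.isEmpty()) {
            return Type.STRING; // nothing to infer from: fall back to the most general type
        }
        if (values.stream().allMatch(DataTypeDetector::isNumber)) {
            return Type.NUMBER;
        }
        if (detectDatePattern(values).isPresent()) {
            return Type.DATE;
        }
        return Type.STRING;
    }

    /** Returns the first candidate pattern that parses every value, if any. */
    static Optional<String> detectDatePattern(List<String> values) {
        for (String pattern : DATE_PATTERNS) {
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
            if (values.stream().allMatch(v -> parsesAsDate(v, formatter))) {
                return Optional.of(pattern);
            }
        }
        return Optional.empty();
    }

    private static boolean isNumber(String value) {
        try {
            Double.parseDouble(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    private static boolean parsesAsDate(String value, DateTimeFormatter formatter) {
        try {
            LocalDate.parse(value, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

Missing values are excluded before detection, so a column such as {"34", "NA"} is still recognized as numeric.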

ARX: Highlights
■ Flexible transformation methods
  • Global recoding
  • Full-domain generalization
  • Top & bottom coding
  • Local recoding
  • Tuple suppression
  • Fully integrated and parameterizable
■ Functional representations of transformation rules
  • Especially functional representations of hierarchies
  • Support for categorical and continuous variables (categorization)
■ Multiple methods for measuring data utility
  • Parameterizable, e.g. with different aggregate functions
  • Importance of attributes
  • Use functional representations of transformation rules
  • Suitability of methods
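
A functional representation of a hierarchy computes generalized values from a rule instead of storing them in a table. The hypothetical Java sketch below does this for an integer attribute such as age: each generalization level is defined only by an interval width, values outside [bottom, top) are bottom- or top-coded, and the highest level suppresses the value. The class name, the interval notation and the parameters are illustrative assumptions, not ARX's API.

```java
/**
 * Hypothetical functional hierarchy for an integer attribute (e.g. age): generalized values
 * are computed on demand from interval widths instead of being listed in a table.
 */
final class IntervalHierarchy {

    private final int bottom;   // values below this bound are bottom-coded
    private final int top;      // values at or above this bound are top-coded
    private final int[] widths; // interval width for generalization levels 1..widths.length

    IntervalHierarchy(int bottom, int top, int... widths) {
        if (top <= bottom) {
            throw new IllegalArgumentException("top must be greater than bottom");
        }
        for (int width : widths) {
            if (width <= 0) {
                throw new IllegalArgumentException("Interval widths must be positive");
            }
        }
        this.bottom = bottom;
        this.top = top;
        this.widths = widths.clone();
    }

    /** Level 0 keeps the original value, the highest level suppresses it ("*"). */
    int levels() {
        return widths.length + 2;
    }

    String generalize(int value, int level) {
        if (level < 0 || level >= levels()) {
            throw new IllegalArgumentException("Invalid level: " + level);
        }
        if (level == 0) {
            return Integer.toString(value);
        }
        if (level == levels() - 1) {
            return "*";
        }
        // Top and bottom coding: extreme values are merged into open-ended categories
        if (value < bottom) {
            return "<" + bottom;
        }
        if (value >= top) {
            return ">=" + top;
        }
        // Categorization: map the value to a half-open interval [lower, upper)
        int width = widths[level - 1];
        int lower = bottom + Math.floorDiv(value - bottom, width) * width;
        int upper = Math.min(lower + width, top);
        return "[" + lower + ", " + upper + ")";
    }
}
```

For the intervals to form a proper hierarchy, each width should be a multiple of the previous one (for example 5 and 20), so that every interval on one level lies inside exactly one interval on the next.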

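Utility measures can be parameterized in the two ways named on the slide: with an aggregate function and with attribute weights. The following sketch, loosely modeled on generalization-height ("precision") style loss measures, scores a full-domain generalization by the relative generalization height of each attribute, scales it by a weight, and aggregates with a sum, maximum or weighted mean. The class, the enum and the exact formula are assumptions for illustration only.

```java
/**
 * Hypothetical information loss measure for a full-domain generalization:
 * per-attribute loss is the relative generalization height, weighted by attribute importance.
 */
final class WeightedLoss {

    enum Aggregate { SUM, MAX, WEIGHTED_MEAN }

    /**
     * @param levels    generalization level applied to each quasi-identifier
     * @param heights   highest level of each attribute's hierarchy
     * @param weights   importance of each attribute (larger = generalizing it costs more)
     * @param aggregate how per-attribute losses are combined into one value
     */
    static double loss(int[] levels, int[] heights, double[] weights, Aggregate aggregate) {
        if (levels.length != heights.length || levels.length != weights.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        double sum = 0;
        double max = 0;
        double weightSum = 0;
        for (int i = 0; i < levels.length; i++) {
            // Relative height in [0, 1]: 0 = original values, 1 = fully generalized
            double relative = heights[i] == 0 ? 0 : (double) levels[i] / heights[i];
            double weighted = weights[i] * relative;
            sum += weighted;
            max = Math.max(max, weighted);
            weightSum += weights[i];
        }
        switch (aggregate) {
            case SUM:
                return sum;
            case MAX:
                return max;
            case WEIGHTED_MEAN:
                return weightSum == 0 ? 0 : sum / weightSum;
            default:
                throw new IllegalArgumentException("Unknown aggregate: " + aggregate);
        }
    }
}
```

Doubling the weight of an attribute makes the search prefer transformations that generalize it less, which is one way to express the "importance of attributes".
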
ARX: Highlights
■ Multiple a priori privacy models
  • k-Anonymity
  • ℓ-Diversity (distinct, entropy, recursive-(c,ℓ))
  • t-Closeness (equal and hierarchical ground distance)
  • δ-Presence
■ Multiple methods for risk-based anonymization
  • Sample characteristics
    • Average cell size
    • Sample uniqueness
  • Super-population models
    • Decision rule by Dankar et al.
    • Based on models by Pitman, Zayatz and the SNB model
■ Support for arbitrary combinations of these models
  • Optimal solution within our coding model
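
To make the a priori privacy models concrete, here is a hypothetical Java sketch that checks them on data which has already been transformed: records are grouped into equivalence classes by their quasi-identifiers, and each model is tested per class using the standard definitions from the literature. t-Closeness is shown only with equal ground distance, where the Earth Mover's Distance reduces to half the L1 distance between a class's distribution of sensitive values and the overall distribution; δ-presence is omitted because it requires a population table. None of the names below are ARX API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical checks of common privacy models on already transformed data (not the ARX API). */
final class PrivacyModels {

    /** Groups records into equivalence classes by quasi-identifiers; keeps each class's sensitive values. */
    static Collection<List<String>> classes(List<String[]> records, int[] quasiIdentifiers, int sensitive) {
        Map<List<String>, List<String>> groups = new HashMap<>();
        for (String[] record : records) {
            List<String> key = new ArrayList<>();
            for (int column : quasiIdentifiers) {
                key.add(record[column]);
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(record[sensitive]);
        }
        return groups.values();
    }

    /** k-Anonymity: every equivalence class contains at least k records. */
    static boolean kAnonymous(Collection<List<String>> classes, int k) {
        return classes.stream().allMatch(c -> c.size() >= k);
    }

    /** Distinct l-diversity: every class contains at least l distinct sensitive values. */
    static boolean distinctLDiverse(Collection<List<String>> classes, int l) {
        return classes.stream().allMatch(c -> frequencies(c).size() >= l);
    }

    /** Entropy l-diversity: the entropy of the sensitive values in every class is at least log(l). */
    static boolean entropyLDiverse(Collection<List<String>> classes, double l) {
        for (List<String> c : classes) {
            double entropy = 0;
            for (int count : frequencies(c).values()) {
                double p = (double) count / c.size();
                entropy -= p * Math.log(p);
            }
            if (entropy < Math.log(l)) {
                return false;
            }
        }
        return true;
    }

    /** Recursive (c,l)-diversity: r1 < c * (r_l + ... + r_m), frequencies sorted in descending order. */
    static boolean recursiveCLDiverse(Collection<List<String>> classes, double c, int l) {
        for (List<String> values : classes) {
            List<Integer> r = new ArrayList<>(frequencies(values).values());
            r.sort(Collections.reverseOrder());
            if (r.size() < l) {
                return false; // fewer than l distinct values can never satisfy the condition
            }
            int tail = 0;
            for (int i = l - 1; i < r.size(); i++) {
                tail += r.get(i);
            }
            if (r.get(0) >= c * tail) {
                return false;
            }
        }
        return true;
    }

    /** t-Closeness, equal ground distance: EMD = half the L1 distance to the overall distribution. */
    static boolean tClose(Collection<List<String>> classes, double t) {
        List<String> all = new ArrayList<>();
        classes.forEach(all::addAll);
        Map<String, Integer> global = frequencies(all);
        for (List<String> values : classes) {
            Map<String, Integer> local = frequencies(values);
            double distance = 0;
            for (Map.Entry<String, Integer> entry : global.entrySet()) {
                double p = (double) local.getOrDefault(entry.getKey(), 0) / values.size();
                double q = (double) entry.getValue() / all.size();
                distance += Math.abs(p - q);
            }
            if (distance / 2 > t) {
                return false;
            }
        }
        return true;
    }

    private static Map<String, Integer> frequencies(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }
}
```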

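The sample characteristics listed above can be computed from equivalence class sizes alone. The sketch below, assuming a prosecutor attacker model, derives the average cell size, the fraction of sample uniques, the average risk (the mean of 1/classSize over all records) and the highest risk. It is a hypothetical illustration; the super-population models (Pitman, Zayatz, SNB) and the decision rule by Dankar et al. require statistical estimation and are not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sample-based re-identification risk estimates derived from equivalence class
 * sizes (prosecutor scenario). Super-population models are not covered by this sketch.
 */
final class SampleRisk {

    /** Counts how many records share each combination of quasi-identifier values. */
    static List<Integer> classSizes(List<List<String>> quasiIdentifierTuples) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (List<String> tuple : quasiIdentifierTuples) {
            counts.merge(tuple, 1, Integer::sum);
        }
        return new ArrayList<>(counts.values());
    }

    /** Average cell size: mean number of records per equivalence class. */
    static double averageClassSize(List<Integer> sizes) {
        return sizes.stream().mapToInt(Integer::intValue).average().orElse(0);
    }

    /** Sample uniqueness: fraction of records that are alone in their equivalence class. */
    static double sampleUniqueness(List<Integer> sizes) {
        int records = total(sizes);
        long uniques = sizes.stream().filter(size -> size == 1).count();
        return records == 0 ? 0 : (double) uniques / records;
    }

    /** Average risk: mean of 1/classSize over all records, which equals #classes / #records. */
    static double averageRisk(List<Integer> sizes) {
        int records = total(sizes);
        return records == 0 ? 0 : (double) sizes.size() / records;
    }

    /** Highest risk: 1 / size of the smallest equivalence class. */
    static double highestRisk(List<Integer> sizes) {
        int smallest = sizes.stream().mapToInt(Integer::intValue).min().orElse(0);
        return smallest == 0 ? 0 : 1.0 / smallest;
    }

    private static int total(List<Integer> sizes) {
        return sizes.stream().mapToInt(Integer::intValue).sum();
    }
}
```
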
ARX: Highlights
■ Scalability: ARX can handle large datasets (several million data entries) on commodity hardware
■ Efficient in-memory data management engine
  • Works with compressed data representations
  • Tight coupling between transformation operators and the “database kernel”
  • Provides a space-time trade-off
■ Optimized search strategy: based on multiple pruning strategies
■ Efficient implementations of further complex tasks
  • Evaluations of privacy criteria (e.g., t-closeness, δ-presence)
  • Methods for solving non-linear equation systems
  • Background jobs in the user interface
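
One common way to work with compressed data, shown here only as an illustration of the general idea, is dictionary encoding: every distinct value is replaced by an integer code, and every generalization level of a hierarchy becomes an int[] lookup table, so transforming a record is a single array access. The class and its hierarchy format (one row per original value, followed by its generalizations) are assumptions; this sketch does not reproduce ARX's data management engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical dictionary-encoded attribute: values are stored as int codes and each
 * generalization level is an int[] lookup table (illustration only, not ARX's engine).
 */
final class EncodedAttribute {

    private final Map<String, Integer> dictionary = new HashMap<>(); // value -> code
    private final List<String> values = new ArrayList<>();           // code -> value
    private final int[][] levelMap; // levelMap[level][leaf code] = code of the generalized value
    private final int[] rows;       // one int per record instead of one String

    /**
     * @param column    original values, one per record
     * @param hierarchy one row per distinct original value: {value, level 1, level 2, ...}
     */
    EncodedAttribute(List<String> column, String[][] hierarchy) {
        // Encode the leaves first so that codes 0..hierarchy.length-1 denote original values
        for (String[] path : hierarchy) {
            encode(path[0]);
        }
        if (values.size() != hierarchy.length) {
            throw new IllegalArgumentException("Hierarchy contains duplicate leaves");
        }
        int height = hierarchy[0].length;
        levelMap = new int[height][hierarchy.length];
        for (int leaf = 0; leaf < hierarchy.length; leaf++) {
            for (int level = 0; level < height; level++) {
                levelMap[level][leaf] = encode(hierarchy[leaf][level]);
            }
        }
        rows = new int[column.size()];
        for (int i = 0; i < rows.length; i++) {
            Integer code = dictionary.get(column.get(i));
            if (code == null || code >= hierarchy.length) {
                throw new IllegalArgumentException("Value not in hierarchy: " + column.get(i));
            }
            rows[i] = code;
        }
    }

    /** Transforms every record to the given level with a single array access per record. */
    int[] generalize(int level) {
        int[] map = levelMap[level];
        int[] result = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            result[i] = map[rows[i]];
        }
        return result;
    }

    String decode(int code) {
        return values.get(code);
    }

    int dictionarySize() {
        return values.size();
    }

    private int encode(String value) {
        return dictionary.computeIfAbsent(value, v -> {
            values.add(v);
            return values.size() - 1;
        });
    }
}
```

The lookup tables cost some memory per hierarchy level but avoid string handling while transformations are evaluated.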

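Pruning exploits monotonicity in the search space of full-domain generalizations (the generalization lattice). The hypothetical sketch below uses predictive tagging: assuming a monotonic privacy model (for example k-anonymity without record suppression), a node that satisfies the model lets all of its generalizations be tagged as anonymous, and a node that violates it lets all of its specializations be tagged as not anonymous, without further checks. It is a simplified illustration, not ARX's actual search algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Hypothetical bottom-up lattice search with predictive tagging. Assumes a monotonic
 * privacy model: if a transformation is anonymous, so are all of its generalizations.
 */
final class LatticeSearch {

    private final int[] maxLevels;              // highest generalization level per attribute
    private final Predicate<int[]> isAnonymous; // the (expensive) privacy check
    private final Map<List<Integer>, Boolean> tags = new HashMap<>();
    private int evaluations = 0;

    LatticeSearch(int[] maxLevels, Predicate<int[]> isAnonymous) {
        this.maxLevels = maxLevels.clone();
        this.isAnonymous = isAnonymous;
    }

    /** Returns all anonymous transformations, each given as a vector of generalization levels. */
    Set<List<Integer>> run() {
        List<int[]> nodes = allNodes();
        nodes.sort(Comparator.comparingInt(LatticeSearch::height)); // bottom-up
        Set<List<Integer>> anonymous = new HashSet<>();
        for (int[] node : nodes) {
            Boolean tag = tags.get(key(node));
            if (tag == null) {
                evaluations++;
                tag = isAnonymous.test(node);
                for (int[] other : nodes) {
                    if (tag && dominates(other, node)) {
                        tags.put(key(other), true);  // generalizations inherit "anonymous"
                    } else if (!tag && dominates(node, other)) {
                        tags.put(key(other), false); // specializations inherit "not anonymous"
                    }
                }
            }
            if (tag) {
                anonymous.add(key(node));
            }
        }
        return anonymous;
    }

    int evaluations() {
        return evaluations;
    }

    /** Enumerates every combination of levels 0..maxLevels[i] (odometer style). */
    private List<int[]> allNodes() {
        List<int[]> nodes = new ArrayList<>();
        int[] current = new int[maxLevels.length];
        while (true) {
            nodes.add(current.clone());
            int i = 0;
            while (i < current.length && current[i] == maxLevels[i]) {
                current[i++] = 0;
            }
            if (i == current.length) {
                return nodes;
            }
            current[i]++;
        }
    }

    /** True if a is at least as generalized as b in every attribute. */
    private static boolean dominates(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) {
                return false;
            }
        }
        return true;
    }

    private static int height(int[] node) {
        return Arrays.stream(node).sum();
    }

    private static List<Integer> key(int[] node) {
        return Arrays.stream(node).boxed().collect(Collectors.toList());
    }
}
```

For two attributes with three levels each, the example predicate node[0] + node[1] >= 2 classifies all nine transformations while evaluating only six of them.
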
ARX: Highlights
■ Comprehensive graphical interface
  • Scalability comparable to the ARX API
  • Supports all methods provided by the ARX API
  • Cross-platform (Windows, Linux/GTK, OSX) with native interfaces
  • Available as binary distributions with installers
■ Independent API
  • User interface sits on top of the API
  • Java library
  • All methods provided by ARX are first-class citizens in both worlds

ARX: Anonymization workflow
■ Iterative process to successively refine transformations
■ Supported by the scalability of our framework
■ Three (repeating) steps, mapped to four perspectives
  • Define coding model: define transformation model, define privacy model
  • Organize transformations (solution space)
  • Compare and analyze: filter and analyze the input and output regarding risks and utility

ARX: Wizard for data import

ARX: Configuration (1)

ARX: Configuration (2)

ARX: Configuration (3)

ARX: Configuration (4)

ARX: Wizard for transformation rules

ARX: Further dialogs

ARX: Exploration (1)

ARX: Exploration (2)

ARX: Exploration (3)

ARX: Utility analysis (1)

ARX: Utility analysis (2)

ARX: Utility analysis (3)

ARX: Utility analysis (4)

ARX: Utility analysis (5)

ARX: Risk analysis (1)

ARX: Risk analysis (2)

ARX: Risk analysis (3)

ARX: Risk analysis (4)

ARX: Context-sensitive help

ARX: Further developments
● Current projects
  ● Non-interactive Differential Privacy
  ● Support for high-dimensional data: heuristic algorithms
  ● Further analyses and visualizations: utility and risks
  ● Support for transactional attributes: (k, kᵐ)-anonymity
  ● Integrated data masking methods
  ● More flexible definition of quasi-identifiers
  ● More risk models
● Planned projects
  ● Auto-detection of HIPAA identifiers
  ● Implement more flexible privacy criteria
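
For transactional attributes, the slide names (k, kᵐ)-anonymity. Its transactional part, kᵐ-anonymity as commonly defined, requires that every combination of at most m items that occurs in the data occurs in at least k transactions. The following hypothetical Java sketch checks only this condition by counting the support of all such itemsets; it does not cover the relational part or any anonymization algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical k^m-anonymity check for a transactional (set-valued) attribute: every
 * combination of at most m items that occurs in the data must occur in at least k transactions.
 */
final class KmAnonymity {

    static boolean isKmAnonymous(List<Set<String>> transactions, int k, int m) {
        if (m < 1) {
            throw new IllegalArgumentException("m must be at least 1");
        }
        Map<Set<String>, Integer> support = new HashMap<>();
        for (Set<String> transaction : transactions) {
            // Each distinct itemset is generated once per transaction, so counts equal supports
            countSubsets(new ArrayList<>(transaction), 0, m, new ArrayList<>(), support);
        }
        return support.values().stream().allMatch(count -> count >= k);
    }

    /** Counts every non-empty subset of items[start..] with at most m elements, extending prefix. */
    private static void countSubsets(List<String> items, int start, int m,
                                     List<String> prefix, Map<Set<String>, Integer> support) {
        for (int i = start; i < items.size(); i++) {
            prefix.add(items.get(i));
            support.merge(new HashSet<>(prefix), 1, Integer::sum);
            if (prefix.size() < m) {
                countSubsets(items, i + 1, m, prefix, support);
            }
            prefix.remove(prefix.size() - 1);
        }
    }
}
```

Enumerating all itemsets grows quickly with m and transaction length, which is one reason why practical algorithms for such models typically rely on heuristics.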

Thank you for your attention
• ARX is open source software
• Contribute: feature requests, enhancements, code reviews, criticism, questions
• Repository: https://github.com/arx-deidentifier/arx
• Further information & download: http://arx.deidentifier.org
• Get in touch
  • Fabian Prasser (prasser@in.tum.de)
  • Florian Kohlmayer (florian.kohlmayer@tum.de)
• Any questions?