Decision support for releasing anonymised data

 Magnus Jändel
DOI: 10.1016/j.cose.2014.07.001

Highlights
Anonymised data bases are vulnerable to de-anonymisation attacks.

The adversarial mutual information is the key factor controlling privacy risk.

An information-theoretic de-anonymisation feasibility limit is derived.

A process for making decisions about the release of anonymised data is described.

Scenarios related to demographic data and to air travel data are discussed.

Abstract
For legal and privacy reasons it is often prescribed that data bases containing sensitive personal data
can be published only in anonymised form. History shows, however, that the privacy of anonymised
data in many cases is easily broken by de-anonymisation attacks. This paper defines guiding
principles for decisions about releasing anonymised data and provides a simple process for
analysing de-anonymisation risk and for making decisions about publishing anonymised personal
data. At the heart of this process is an information-theoretic de-anonymisation feasibility limit that
is independent of the details of both the anonymisation procedure and the adversarial
de-anonymisation algorithms. This feasibility limit relates the adversarial mutual information of the
anonymised data and the attacker's background information to the number of records in the
anonymised data base and the acceptable risk of privacy violations. Based on this result, we explain,
discuss and exemplify the process for making decisions about releasing anonymised data.
Keywords
 Anonymisation;
 De-anonymisation;
 Privacy-preserving data mining;
 Privacy-preserving data publishing;
 Decision support
Copyright © 2014 Elsevier Ltd. All rights reserved.

How to choose a good thesis topic in Data Mining?
Posted on March 15, 2013 by Philippe Fournier-Viger

I have seen many people asking for help in data mining forums and on other websites about how to
choose a good thesis topic in data mining. Therefore, in this post, I will address this
question.
The first thing to consider is whether you want to design/improve data mining
techniques, apply data mining techniques or do both. Personally, I think that designing or
improving data mining techniques is more challenging than using already existing techniques.
Moreover, you can make a more fundamental contribution if you work on improving data mining
techniques instead of applying them. However, you need to be aware that improving data mining
techniques may require stronger algorithmic and/or mathematical skills.
The second thing to consider is what kind of techniques you want to apply or
design/improve. Data mining is a broad field consisting of many techniques such as neural
networks, association rule mining algorithms, clustering and outlier detection. You should try to get
some overview of the different techniques to see which ones interest you most. To get a rough
overview of the field, you could read some introductory books on data mining such as the book by
Tan, Steinbach & Kumar (Introduction to data mining) or read websites and articles related to data
mining. If your goal is just to apply data mining techniques to achieve some other purpose (e.g.
analysing cancer data) but you don’t know which one yet, you could skip this question.
The third thing to consider is which problems you want to solve or what you want to
improve. This requires more thought. A good way is to look at recent good data mining
conferences (KDD, ICDM, PKDD, PAKDD, ADMA, DAWAK, etc.) and journals (TKDE, TKDD,
KAIS, etc.), or to attend conferences, if possible, and talk with other researchers. This helps to see
which topics are currently popular and what kind of problems researchers are currently trying to
solve. It does not mean that you need to work on the most popular topic. Working on a popular
topic (e.g. social network mining) has several advantages: it is easier to get grants or, in some
cases, to get your papers accepted in special issues, workshops, etc. However, there are also some
“older” topics that are interesting even if they are not the current flavor of the day. Actually, the
most important thing is that you find a topic that you like and will enjoy working on for perhaps a
few years of your life. Finding a good problem to work on can require reading several articles to
understand the limitations of current techniques and to decide what can be improved. So don’t worry.
It is normal that it takes time to find a more specific topic.
Fourth, one should not forget that helping to choose a thesis topic is also the job of the
professor who supervises the Master’s or Ph.D. student. Therefore, if you are looking for
a thesis topic, it is good to talk with your supervisor and ask for suggestions. He or she should
help you. If you don’t have a supervisor yet, then try to get a rough idea of what you like, and try
to meet and talk with professors who could become your supervisor. Some of them will perhaps have
some research projects and ideas that they could give you if you work with them. Choosing a
supervisor is a very important and strategic decision that every graduate student has to make. For
more information about choosing a supervisor, you can read this post: How to choose a research
advisor for M.Sc. / Ph.D?
Lastly, I would like to discuss the common question “please give me a Ph.D. topic in data
mining”, which I read on websites and sometimes receive by e-mail. There are two
problems with this question. The first problem is that it is too general. As mentioned, data
mining is a very broad field. For example, I could suggest some very specific topics such as
detecting outliers in imbalanced stock market data or optimizing the memory efficiency of
subgraph mining algorithms for community detection in social networks. But will you like them? It is
best to choose something by yourself that you like. The second problem with the above question is
that choosing a topic is work that a researcher should do or learn to do. In fact, in research, it
is as important to be able to find a good research problem as it is to find a good
solution. Therefore, I highly recommend trying to find a research topic by yourself, as it is
important to develop this skill to become a successful researcher. If you are a student, when
searching for a topic, you can ask your research advisor to guide you.