
Statement of purpose

Name: Seong-Hwan Jun

Objective

My research interest is in machine learning. I am particularly interested in developing computationally
efficient statistical inference methods to solve problems arising in large data settings. My objective in
pursuing a PhD is to become an expert researcher in the field of machine learning, able to
identify problems suitable for machine learning methods and to develop new methods (or extend existing
methods) to solve them. By the end of my PhD, I will have become an ideal candidate for a research
position in industry as well as in academia.

Why pursue a PhD in statistics

The current trend in machine learning appears to be toward applying probabilistic and statistical modeling
techniques for accurate inference. Efficient computation is also a challenge, as many existing methods
do not scale easily to larger data. I realize that to become an expert in machine learning, I will have to
build a strong foundation in mathematics (specifically, probability theory) and computer science as well as
statistics. Although it is more popular to approach machine learning from computer science, I have decided to
approach it from statistics for two reasons. First, statisticians use probabilistic models
to capture uncertainty for accurate inference; hence, to become a statistician, one needs to be familiar with
modern advances in probability theory. Second, in order to deal with the abundance of data, there is a growing
emphasis on computation in modern statistics. The relatively new field of computational statistics is at the
frontier of statistics research, enabling statistics to be applied in large data settings. Therefore, I concluded
that the best way to approach machine learning is from statistics, as it is the only discipline that specifically
focuses on the three components (probability theory, computation, and inference) necessary for becoming an
expert researcher in machine learning.

Research experience

– As part of the research curriculum during my Master’s degree, I participated in a weekly machine learning
reading group in which the members took turns reading a paper and presenting its main results to the group.
The topics included computational methods for large datasets, Bayesian computational methods, and
non-parametric Bayesian modeling techniques, to name a few. We also read many applied research papers
in which statistical methods are applied to problems in computational linguistics and phylogenetics. The
reading group trained me to read research papers and extract their main points efficiently.
– I have one publication, titled “Entangled Monte Carlo,” which was published in the proceedings of the
25th conference of Neural Information Processing Systems (NIPS). I attended the conference and gave
a spotlight presentation as well as a poster presentation. The paper proposes a method for efficiently
distributing the computation of the popular Sequential Monte Carlo method over multiple computing nodes.
– The reading group helped me shape my research interests. Currently, I am interested in the problem of
inferring evolutionary relationships between (natural) languages using statistical modeling and computational
techniques. I intend to tackle many problems arising in this field of computational linguistics by applying
statistical models and machine learning methods.
– Another interest of mine is non-parametric statistical methods. I gained an appreciation for this class of
methods while attending the NIPS conference, where I noticed that many machine learning researchers apply
non-parametric statistical methods, both Bayesian and frequentist, to solve a variety of problems. I was
introduced to Bayesian non-parametric methods through the aforementioned reading group; however, I have
had little contact with recent developments in non-parametric (frequentist) statistics. Recently, I
have often found myself wanting to learn about these non-parametric methods and to extend them so that I can
apply them to my research. This is one of my newer interests, which I intend to explore in the future.
– I gave a presentation on Sequential Monte Carlo and Entangled Monte Carlo methods at the SFU-UBC joint
seminar in September 2012.
– I gave two presentations at the UBC Department of Statistics student seminar. In the first presentation,
I introduced the basics of C/C++ programming and the GNU Scientific Library (GSL) to my fellow students. In the second
presentation, I gave a walk-through on how to use Amazon EC2 servers for free and provided tips for
running computations on the department servers.
