
STATEMENT OF PURPOSE

Department of Electrical and Computer Engineering, UC San Diego

Growing up in a household with two generations of Indian Army officers, I had, perhaps unsurprisingly,
never encountered the words ‘Data’ and ‘Science’ in conjunction. However, when I did come
across Data Science in college, I think it was almost poetic how an innocuous conversation made me
fall in love with the field. During my sophomore year, a friend and I were conjecturing about how
Google Maps was using Machine Learning to predict traffic on a given route and subsequently use
these predictions to determine the best possible route for a journey. For someone whose only
exposure to the term Machine Learning thus far was through a few newspaper articles and trivia
questions, this conversation served as a means to explore previously uncharted territory and as a
catalyst to dive headfirst into Data Science.

I have always found the idea of “predicting” something prophetic, mystical and something that
shouldn’t exist outside of a work of fiction. However, Data Science made me challenge that notion, and
perhaps that is where the allure of the field lies for me. While my major in Physics dealt with
questions in the physical realm, Data Science utilizes something as concrete as numbers to get an
insight into the way human minds work; that despite our perceived independence in thought, we
aren’t so different after all.

One of the most significant contributing factors for my natural gravitation towards Data Science was
the fact that the subject is essentially built on the backbone of Statistics. I always had a penchant for
statistics, solely attributable to my high school mathematics teacher, who presented me with one of
my biggest life lessons in the form of a probability lesson: When I was struggling to come to grips with
the idea of zero probability corresponding to an event which would at least be theoretically possible,
he told me that the probability of an event is a measure of our personal beliefs: just because we
believe that an event is impossible does not mean that it is. This realization (among many others) has
helped me overcome various challenges that life has thrown at me. Since then, my love for statistics as
a subject has only grown, and it is incredibly gratifying to see statistics in action through data
science.

As an undergraduate, I earned my Bachelor of Technology in Engineering Physics from the Indian Institute
of Technology, Roorkee, one of the premier engineering colleges in India. Courses like Introduction to
Machine Learning, Data Mining for Business Intelligence, Data Structures, Design and Analysis of
Algorithms, Signals and Systems, and Weather Forecasting have all played a role in my transition from
a data science dilettante to someone who can see himself making lasting contributions to the field. In
addition, as part of the course curriculum, I worked on a multi-class classification problem involving the
Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC) dataset. I helped formulate the approach to
encoding time-dependent flux information of celestial bodies into multichannel images for a
Convolutional Neural Network (CNN).

Driven by my desire to solve complex real-world problems, I worked on devising a deep-learning-based
algorithm to improve the reconstruction of CT Scan data, a harbinger of crucial developments in
medicine, as my Bachelor's Thesis Project, a university requirement. Most of the earlier work done in
this field had focused on solving the inverse problem by direct inversion, which in theory, produces
reconstructions reasonably quickly, albeit at the cost of introducing noise in the reconstruction. My
project involved solving the inverse problem by iterative reconstruction algorithms, which produce
accurate reconstructions but are painfully slow in the process. Moreover, iterative algorithms
depend heavily on an optimal choice of initial parameters to avoid getting stuck in local optima and
producing substandard reconstructions (at the risk of oversimplification, similar to the issues that plague
the Gradient Descent algorithm). My contribution, therefore, addressed these concerns and was
twofold: firstly, I implemented an iterative algorithm called Multiplicative Algebraic Reconstruction
Technique (MART) on the GPU, which ensured a drastic reduction in the time overhead and secondly, I
used this efficient implementation to train a Convolutional Neural Network (CNN) to predict the
optimal parameters corresponding to certain features identified from the projection data.

My tryst with academia in Data Science began in December 2019, when I had the opportunity to work
under the tutelage of Professor Sobhan Babu at the Indian Institute of Technology, Hyderabad, where
he was working in the domain of fraud analytics and predicting tax return defaulters for the Goods and
Services Tax (GST) system in India. I was particularly inclined to work at his lab because I was certain
that working with real-life data would expose me to cutting-edge data science research and
would allow me to learn much more than what any textbook could teach me. Perhaps I was also
driven by the romanticized prospect of attempting to solve a problem that would directly impact the
governance system in my country and, as a corollary, improve the lives of its people.

My initial involvement in this project was mainly limited to conducting an extensive literature survey
and implementing several conventional classification algorithms from scratch in Python, including
Logistic Regression, K-Nearest Neighbours and Random Forest classifiers, among others.
During the subsequent testing of these algorithms on real-life GST Returns data provided by the
Government of Telangana, India, we realized that conventional classifiers tend to produce a lot of
False Negatives and consequently have low recall, which makes them unsuitable for the task
because they tend to misclassify historical return defaulters as genuine taxpayers. The unsuitability of
conventional classifiers encouraged us to experiment with Cost-Sensitive Classifiers. While these
classifiers minimize the number of False Negative predictions, they leave a lot to be desired in terms of
overall accuracy. This presented us with a dilemma: Does one sacrifice overall model accuracy in favor
of better recall? To find a middle ground, I proposed a framework for an example-dependent cost-
sensitive stacking classifier that uses traditional classifiers as base generalizers to make predictions on
the input space. These predictions were used to train an example-dependent cost-sensitive
meta-generalizer. Based on the choice of meta-generalizer, four variant models were proposed to predict
potential return defaulters for the upcoming tax filing period. This work resulted in a research paper,
which was published and which I presented at the 24th International Conference on Business
Information Systems, hosted remotely at Hannover, Germany, owing to the pandemic. I continue to
remain associated remotely with the research group, where I am now working on a related problem:
developing a Deep Learning algorithm to detect circular trading among taxpayers in the GST
system.

To make myself privy to the intricacies of how things work in the industry, I interned remotely with
DecisionTree Analytics and Services in the summer of 2020. I worked almost exclusively on their
AutoML library. Specifically, I worked on automating the pipelines for data preprocessing so that the
steps for imputation, outlier detection, skewness treatment and dimensionality reduction were
automated regardless of the data provided by the user.

Currently, I am employed with the Asia-Pacific office of DataChannel Technologies (a sister company of
DecisionTree Analytics and Services), where my work thus far has involved developing a framework to
automatically fetch data from various cloud-based data sources like Google Analytics and aggregate
said data into warehouses like Amazon Redshift, Google BigQuery, Snowflake and the like. Apart from
my work as a Developer, I am also working with the data science team to tackle pressing problems like
Marketing Mix Modelling (MMM) and Campaign Budget Optimization (CBO).

For large parts of my undergraduate study, I had to manage taking care of my ailing mother along with
the rigors of studying such a diverse set of subjects. This usually meant that I would miss a lot of
classes, which predictably had an adverse effect on my grades. Once my mother's health recovered, my
academic performance improved significantly. Notably, I averaged a CGPA of 9 on a 10-point scale
during my final year of undergraduate study, and I imbibed an essential lesson: to strive harder
in the face of adversity.

Despite these challenges, I was involved in several co-curricular activities during my time at
IIT Roorkee. As the Executive Editor of WatchOut!, the official campus media body, I led a
team of 80+ members. Specifically, I was instrumental in curating the Freshman Guide, a manuscript
aimed at acquainting freshers with the campus and culture of IIT Roorkee. I have also co-authored
editorials that explore the origins and predicaments behind several philosophical paradigms such as
free will and determinism. I believe that my stint with the campus media body helped me develop my
communication skills, which are proving to be invaluable in my professional experience with data
science.

I hope to continue in the same vein and harness Data Science, along with various Machine Learning
algorithms, to solve compelling problems in the domains of finance and medicine. Further, owing to
my research experience so far, and having worked on a few Graph Algorithms as well as certain flavors
of Graph Neural Networks, I believe that I have a lot left to explore in these applications as well. I hope
to bridge the gap by gaining exposure to varied applications of social network analysis in fields
including, but not limited to, textual analysis and data mining. I find the idea of generative models,
specifically their applications in solving inverse problems, very intriguing, and I would like to gain a
better understanding of these models during my Master’s study.

During my time as an undergraduate student at IIT Roorkee, I was privileged enough to be surrounded
by people who initiate conversations that revolve around disrupting the industry. A common
undercurrent throughout these conversations was the fact that an environment that not only
encourages disruption but also celebrates it, is key to ensuring that disruptive ideas do not remain
rooted in the hypothetical. This is what draws me towards the Master’s Program in Machine Learning
and Data Science at the Jacobs School of Engineering, University of California, San Diego. While my
nascent exposure to the field so far has done enough to give me a glimpse of what one can accomplish
in this field, I am optimistic that my time at your esteemed institution will equip me with the right tools
and the right platform to see my ideas come to fruition. Although I have worked in industry and academia
concurrently, these assignments have been rather far removed from each other; therefore, during my
time at UC San Diego and beyond, I hope to work on real-world problems in conjunction with the
industry. I feel that the various research clusters within the Halicioglu Data Science Institute at UC San
Diego would be the perfect means for me to realize this aspiration. In particular, I am keen on
interacting with and learning from the members of the Databases and Data Processing Principles and
Systems group and the Learning and Reasoning with Large Data Sets group.

In my fledgling career so far, it has become increasingly apparent that an interdisciplinary field such as
data science demands that one learns continuously. As such, I believe that the well-rounded
curriculum of the MS program in Machine Learning and Data Science would allow me to build a solid
theoretical foundation that propels me to the forefront of the innovations in this burgeoning field.

As I take this plunge into the domain of Data Science, I hope to learn a great deal along the
way. With this assurance, I present my candidature to the University of California, San Diego
and look forward to becoming a part of your upcoming Master’s cohort.
