Machine Learning Research Paper
Machine learning is a discipline focused on how systems can automatically improve some measure of performance when executing some task, through some type of training experience.
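The notion of improving performance at a task through training experience can be made concrete with a minimal supervised-learning sketch (not from the article; the data, function names, and numbers are invented for illustration). The "task" is predicting y from x, the "training experience" is a set of (x, y) examples, and "performance" is squared error on held-out points:

```python
# Minimal supervised-learning sketch: fit y = a*x + b by ordinary least squares.

def fit_least_squares(xs, ys):
    """Closed-form OLS for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training experience: examples drawn from the true relation y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [2 * x + 1 for x in train_x]

model = fit_least_squares(train_x, train_y)
print(model)  # recovers (2.0, 1.0) on this noise-free data
print(mse(model, [5.0, 6.0], [11.0, 13.0]))  # 0.0 on held-out points
```

With noisy data the recovered parameters would only approximate the true relation, and performance would improve as more training examples are provided.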
[Figure 2: input image, convolutional feature extraction (14 × 14 feature map), RNN with attention over the image (LSTM), and word-by-word generation of the caption "A bird flying over a body of water."]
Fig. 2. Automatic generation of text captions for images with deep networks. A convolutional neural network is trained to interpret images, and its output is then used by a recurrent neural network trained to generate a text caption (top). The sequence at the bottom shows the word-by-word focus of the network on different parts of the input image as it generates the caption. [Adapted with permission from (30)]
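The attention mechanism behind Fig. 2 can be sketched in a few lines: at each word-generation step, the network turns per-location scores into softmax weights and forms a weighted average of the spatial image features. The feature values and scores below are made-up toy numbers, not outputs of a trained network:

```python
import math

def softmax(scores):
    """Convert raw attention scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(feature_vectors, scores):
    """Weighted average of per-location feature vectors (the 'context' vector)."""
    weights = softmax(scores)
    dim = len(feature_vectors[0])
    return [sum(w * fv[i] for w, fv in zip(weights, feature_vectors))
            for i in range(dim)]

# Toy stand-in for a 2 x 2 feature map: 4 locations, 3-dimensional features.
features = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0],
            [1.0, 1.0, 1.0]]

# In the real model the scores come from the RNN state; here location 3 dominates.
context = attend(features, [0.0, 0.0, 0.0, 5.0])
print(context)  # close to [1, 1, 1]: the attended location dominates the average
```

In the captioning network, this context vector is fed to the LSTM at each step, which is what produces the shifting spatial focus shown at the bottom of the figure.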
[Figure 3: topics as probability distributions over words (e.g., gene 0.04, dna 0.02, genetic 0.01, …; brain 0.04, neuron 0.02, nerve 0.01, …), a document with per-word topic assignments, and the document's topic proportions.]
Fig. 3. Topic models. Topic modeling is a methodology for analyzing documents, where a document is viewed as a collection of words, and the words in the document are viewed as being generated by an underlying set of topics (denoted by the colors in the figure). Topics are probability distributions across words (leftmost column), and each document is characterized by a probability distribution across topics (histogram). These distributions are inferred from the analysis of a collection of documents and can be used to classify, index, and summarize the content of documents. [From (31). Copyright 2012, Association for Computing Machinery, Inc. Reprinted with permission]
… (see Fig. 2). Deep network methods are being actively pursued in a variety of additional applications from natural language translation to collaborative filtering.

The internal layers of deep networks can be viewed as providing learned representations of the input data. While much of the practical success in deep learning has come from supervised learning methods for discovering such representations, efforts have also been made to develop deep learning algorithms that discover useful representations of the input without the need for labeled training data (13). The general problem is referred to as unsupervised learning, a second paradigm in machine-learning research (2).

Broadly, unsupervised learning generally involves the analysis of unlabeled data under assumptions about structural properties of the data (e.g., algebraic, combinatorial, or probabilistic). For example, one can assume that data lie on a low-dimensional manifold and aim to identify that manifold explicitly from data. Dimension reduction methods—including principal components analysis, manifold learning, factor analysis, random projections, and autoencoders (1, 2)—make different specific assumptions regarding the underlying manifold (e.g., that it is a linear subspace, a smooth nonlinear manifold, or a collection of submanifolds). Another example of dimension reduction is the topic modeling framework depicted in Fig. 3. A criterion function is defined that embodies these assumptions—often making use of general statistical principles such as maximum likelihood, the method of moments, or Bayesian integration—and optimization or sampling algorithms are developed to optimize the criterion. As another example, clustering is the problem of finding a partition of the observed data (and a rule for predicting future data) in the absence of explicit labels indicating a desired partition. A wide range of clustering procedures has been developed, all based on specific assumptions regarding the nature of a "cluster." In both clustering and dimension reduction, the concern with computational complexity is paramount, given that the goal is to exploit the particularly large data sets that are available if one dispenses with supervised labels.

A third major machine-learning paradigm is reinforcement learning (14, 15). Here, the information available in the training data is intermediate between supervised and unsupervised learning. Instead of training examples that indicate the correct output for a given input, the training data in reinforcement learning are assumed to provide only an indication as to whether an action is correct or not; if an action is incorrect, there remains the problem of finding the correct action. More generally, in the setting of sequences of inputs, it is assumed that reward signals refer to the entire sequence; the assignment of credit or blame to individual actions in the sequence is not directly provided. Indeed, although simplified versions of reinforcement learning known as bandit problems are studied, where it is assumed that rewards are provided after each action, reinforcement learning problems typically involve a general control-theoretic setting in which the learning task is to learn a control strategy (a "policy") for an agent acting in an unknown dynamical environment, where that learned strategy is trained to choose actions for any given state, with the objective of maximizing its expected reward over time. The ties to research in control theory and operations research have increased over the years, with formulations such as Markov decision processes and partially observed Markov decision processes providing points of contact (15, 16). Reinforcement-learning algorithms generally make use of ideas that are familiar from the control-theory literature, such as policy iteration, value iteration, rollouts, and variance reduction, with innovations arising to address the specific needs of machine learning (e.g., large-scale problems, few assumptions about the unknown dynamical environment, and the use of supervised learning architectures to represent policies). It is also worth noting the strong ties between reinforcement learning and many decades of work on learning in psychology and neuroscience, one notable example being the use of reinforcement learning algorithms to predict the response of dopaminergic neurons in monkeys learning to associate a stimulus light with subsequent sugar reward (17).

Although these three learning paradigms help to organize ideas, much current research involves blends across these categories. For example, semi-supervised learning makes use of unlabeled data to augment labeled data in a supervised learning context, and discriminative training blends architectures developed for unsupervised learning with optimization formulations that make use of labels. Model selection is the broad activity of using training data not only to fit a model but also to select from a family of models, and the fact that training data do not directly indicate …
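The bandit setting mentioned in this passage admits a compact sketch: an epsilon-greedy agent estimates each action's mean reward from its own experience and increasingly chooses the action with the highest estimate. Everything below (the arm probabilities, step count, and function name) is invented for illustration:

```python
import random

def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit; return per-arm estimates and pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)     # pulls per arm
    values = [0.0] * len(arm_probs)   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:    # explore: pick a random arm
            arm = rng.randrange(len(arm_probs))
        else:                         # exploit: pick the current best estimate
            arm = max(range(len(arm_probs)), key=lambda a: values[a])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values, counts

values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(values, counts)
```

With these probabilities the 0.8 arm typically accumulates both the most pulls and the highest estimate; the epsilon parameter is the explicit trade-off between exploring uncertain actions and exploiting the best-known one. The full reinforcement-learning problem described above adds states, dynamics, and delayed credit assignment on top of this skeleton.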
…ments and explicitly allow users to express and control trade-offs among resources. As an example of resource constraints, let us suppose that the data are provided by a set of individuals who wish to retain a degree of privacy …

… the product in some fashion (perhaps by purchasing that item in the past). The machine-learning problem is to suggest other items to a given user that he or she may also be interested in, based on the data across all users.

Machine learning remains a young field with many underexplored research opportunities. Some of these opportunities can be seen by contrasting current machine-learning approaches to the types of learning we observe in naturally …
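The recommendation problem sketched in this passage can be illustrated with a simple user-based collaborative filter (all user names and "liked" items below are invented): items a target user has not seen are ranked by the overlap-weighted preferences of similar users.

```python
# Toy user-based collaborative filtering over binary "liked" data (invented).
likes = {
    "ann":   {"book_a", "book_b", "book_c"},
    "ben":   {"book_a", "book_b", "book_d"},
    "carol": {"book_e"},
}

def jaccard(s, t):
    """Similarity between two users' liked-item sets (overlap / union)."""
    return len(s & t) / len(s | t) if s | t else 0.0

def recommend(user, k=1):
    """Rank items the user has not seen by similarity-weighted votes."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = jaccard(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann"))  # → ['book_d']: ben is similar to ann and liked book_d
```

Production systems replace the Jaccard vote with learned latent-factor models, but the core idea, predicting one user's preferences from data across all users, is the same.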
… us. Considerations such as these suggest that machine learning is likely to be one of the most transformative technologies of the 21st century. Although it is impossible to predict the future, it appears essential that society begin now to consider how to maximize its benefits.

[Figure 5: a layered stack. Access and interfaces: GraphX, SparkR, Splash, Spark, Velox, ML Pipelines. Processing engine: SparkSQL, MLlib, Spark Core. Storage: Succinct, Tachyon, HDFS, S3, Ceph, …. Resource virtualization: Mesos, Hadoop YARN. Legend: AMPLab developed, Spark Community, 3rd party.]

Fig. 5. Data analytics stack. Scalable machine-learning systems are layered architectures that are built on parallel and distributed computing platforms. The architecture depicted here—an open-source data analysis stack developed in the Algorithms, Machines and People (AMP) Laboratory at …

REFERENCES
1. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, New York, 2011).
2. K. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, MA, 2012).
3. L. Valiant, Commun. ACM 27, 1134–1142 (1984).
4. V. Chandrasekaran, M. I. Jordan, Proc. Natl. Acad. Sci. U.S.A. 110, E1181–E1190 (2013).
5. S. Decatur, O. Goldreich, D. Ron, SIAM J. Comput. 29, 854–879 (2000).
6. S. Shalev-Shwartz, O. Shamir, E. Tromer, Using more data to speed up training time, Proceedings of the Fifteenth Conference …
… hypotheses. Many theoretical results in machine learning apply to all learning systems, whether they are computer algorithms, animals, organizations, or natural evolution. As the field progresses, we may see machine-learning theory and algorithms increasingly providing models for understanding learning in neural systems, …

… take. Here, there is clearly a tension and trade-off between personal privacy and public health, and society at large needs to make the decision on how to make this trade-off. The larger point of this example, however, is that, although the data are already online, we do not currently have the laws, customs, culture, or mechanisms to enable …

… Boston, 1998).
29. L. Wehbe et al., PLOS ONE 9, e112575 (2014).
30. K. Xu et al., Proceedings of the 32nd International Conference on Machine Learning, vol. 37, Lille, France, 6 to 11 July 2015, pp. 2048–2057.
31. D. Blei, Commun. ACM 55, 77–84 (2012).

10.1126/science.aaa8415
Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of
Science, 1200 New York Avenue NW, Washington, DC 20005. © 2017 The Authors, some rights reserved; exclusive
licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. The title
Science is a registered trademark of AAAS.