© 2019 Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Draft chapter (February 13, 2019) from "Mathematics for Machine Learning", to be published by Cambridge University Press. Report errata and feedback to https://mml-book.com. Please do not post or distribute this file; please link to https://mml-book.com.
Since machine learning is inherently data driven, data is at the core of machine learning. The goal of machine learning is to design general-purpose methodologies to extract valuable patterns from data, ideally without much domain-specific expertise. For example, given a large corpus of documents (e.g., books in many libraries), machine learning methods can be used to automatically find relevant topics that are shared across documents (Hoffman et al., 2010). To achieve this goal, we design models that are typically related to the process that generates data, similar to the dataset we are given. For example, in a regression setting, the model would describe a function that maps inputs to real-valued outputs. To paraphrase Mitchell (1997): A model is said to learn from data if its performance on a given task improves after the data is taken into account. The goal is to find good models that generalize well to yet unseen data, which we may care about in the future. Learning can be understood as a way to automatically find patterns and structure in data by optimizing the parameters of the model.

While machine learning has seen many success stories, and software is readily available to design and train rich and flexible machine learning systems, we believe that the mathematical foundations of machine learning are important in order to understand the fundamental principles upon which more complicated machine learning systems are built. Understanding these principles can facilitate creating new machine learning solutions, understanding and debugging existing approaches, and learning about the inherent assumptions and limitations of the methodologies we are working with.
Introduction and Motivation
1.2 Two Ways to Read this Book
the model to perform well on unseen data. Performing well on data that we have already seen (training data) may only mean that we found a good way to memorize the data. However, this may not generalize well to unseen data, and, in practical applications, we often need to expose our machine learning system to situations that it has not encountered before.

Let us summarize the main concepts of machine learning that we cover in this book:

• We represent data as vectors.
• We choose an appropriate model, either using the probabilistic or the optimization view.
• We learn from available data by using numerical optimization methods, with the aim that the model performs well on data not used for training.
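The three concepts above can be sketched in a few lines of code. The sketch below is purely illustrative and not from the book: it assumes a least-squares linear model and plain gradient descent, with all data, names, and hyperparameters invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Represent data as vectors: inputs x_n (rows of X) and targets y_n.
X = rng.normal(size=(100, 2))                 # 100 examples, 2 features
true_w = np.array([1.5, -0.5])                # "ground truth" for this toy problem
y = X @ true_w + 0.1 * rng.normal(size=100)   # noisy linear targets

# Hold out data to check generalization rather than memorization.
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

# 2. Choose a model: here f(x) = w^T x with parameters w.
w = np.zeros(2)

# 3. Learn from data: minimize the mean squared training error
#    by numerical optimization (gradient descent).
for _ in range(500):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.1 * grad

# Performance on data not used for training.
test_error = np.mean((X_test @ w - y_test) ** 2)
```

The held-out test error, rather than the training error, is what indicates whether the learned parameters generalize to unseen data.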
[Figure: The foundations and four pillars of machine learning. The pillars Regression, Dimensionality Reduction, Density Estimation, and Classification rest on Vector Calculus, Probability & Distributions, and Optimization, which in turn build on Linear Algebra, Analytic Geometry, and Matrix Decomposition.]
Of course, there are more than two ways to read this book. Most readers learn using a combination of top-down and bottom-up approaches, sometimes building up basic mathematical skills before attempting more complex concepts, but also choosing topics based on applications of machine learning.
point. Quantification of uncertainty is the realm of probability theory, which is covered in Chapter 6.

To train machine learning models, we typically find parameters that maximize some performance measure. Many optimization techniques require the concept of a gradient, which tells us the direction in which to search for a solution. Chapter 5 is about vector calculus and details the concept of gradients, which we subsequently use in Chapter 7, where we talk about optimization to find maxima/minima of functions.
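The role of the gradient can be made concrete with a tiny example (the function and step size below are illustrative choices, not taken from the book): the gradient points in the direction of steepest ascent, so repeatedly stepping against it moves toward a minimum.

```python
def f(x):
    """Toy objective with a unique minimum at x = 3."""
    return (x - 3.0) ** 2

def grad_f(x):
    """Derivative df/dx = 2(x - 3); tells us which direction to step."""
    return 2.0 * (x - 3.0)

x = 0.0                     # arbitrary starting point
for _ in range(100):
    x -= 0.1 * grad_f(x)    # step opposite to the gradient

# x has converged close to the minimizer x* = 3.
```

The same idea carries over to models with many parameters, where the gradient is a vector computed with the tools of vector calculus from Chapter 5.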