CS178: Machine Learning and Data Mining

Introduction

Prof. Erik Sudderth

Some materials courtesy Alex Ihler & Sameer Singh


Artificial Intelligence (AI)
• Building “intelligent systems”
• Lots of parts to intelligent behavior

DARPA Grand Challenge (Stanley)

RoboCup

Chess (Deep Blue v. Kasparov)


Machine learning (ML)
• One (important) part of AI
• Making predictions (or decisions)
• Getting better with experience (data)
• Problems whose solutions are “hard to describe”
CS178: Machine Learning & Data Mining

• Math: Statistics, Probability, Linear Algebra, Optimization
• Programming: Data Structures, Algorithms, Computational Complexity, Data Management
Types of prediction problems
• Supervised learning
– “Labeled” training data
– Every example has a desired target value (a “best answer”)
– Reward prediction being close to target
– Classification: a discrete-valued prediction (often: action / decision)
– Regression: a continuous-valued prediction
Digit & Hand Gesture Recognition

Athitsos et al., CVPR 2004 & PAMI 2008


Microsoft Kinect Pose Estimation
Shotton et al., PAMI 2012

[Figures from Shotton et al.: (i) synthetic vs. real data: pairs of depth
images and corresponding color-coded ground-truth body part labels, with wide
variety in pose, shape, clothing, and crop; the synthetic images look
remarkably similar to the real images, lacking primarily just the
high-frequency texture. (ii) Randomized decision forests: a forest is an
ensemble of T decision trees, each consisting of split nodes and leaf nodes;
different trees may take different paths for a particular input.
(iii) Empirical offset distributions for offset joint regression, and example
base character models.]
Climate Modeling
• Satellites measure sea-surface
temperature at sparse locations
Ø Noisy (atmosphere & sensors)
Ø Partial coverage of ocean surface
(satellite tracks)
Ø Sometimes hidden by clouds
• Would like to infer a dense
temperature field, and track
its temporal evolution
NASA Seasonal to Interannual Prediction Project
http://ct.gsfc.nasa.gov/annual.reports/ess98/nsipp.html
Types of prediction problems
• Supervised learning
• Unsupervised learning
– No known target values
– No targets = nothing to predict?
– Reward “patterns” or “explaining features”
– Often, data mining
[Figure: movies embedded in a 2D latent space, suggesting clusters. One axis
runs from "serious" (Braveheart, The Color Purple, Amadeus) to "escapist"
(The Lion King, Dumb and Dumber, The Princess Diaries, Independence Day); the
other from "romance" (Sense and Sensibility) to "action" (Lethal Weapon,
Ocean's 11).]
Human Tumor Microarray Data
• 6830×64 matrix of real numbers
• Rows correspond to genes, columns to tissue samples
• Cluster rows (genes) to deduce function of unknown genes from
experimentally known genes with similar profiles
• Cluster columns (samples) to hypothesize disease profiles

Hastie, Tibshirani, & Friedman 2009

[Figure 1.3: DNA microarray data: expression matrix of 6830 genes (rows) and
64 samples (columns) for the human tumor data. Only a random sample of 100
rows is shown. The display is a heat map, ranging from bright green (negative,
under-expressed) to bright red (positive, over-expressed). Missing values are
gray. The rows and columns are displayed in a randomly chosen order.]
Social Networks & Relationships
[Figure: a social network among financial and political figures (e.g.,
Lloyd_C_Blankfein, Jamie_Dimon, Robert_E_Rubin, Larry_Summers, Tim_Geithner,
Madeleine_K_Albright), with edges indicating relationships.]

Kim et al., NIPS 2013


Types of prediction problems
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
– Similar to supervised learning
– But some examples have unknown target values

• Ex: medical data


– Lots of patient data, few known outcomes
• Ex: image tagging
– Lots of images on Flickr, but only some of them tagged
Types of prediction problems
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning

• “Indirect” feedback on quality


– No answers, just “better” or “worse”
– Feedback may be delayed
Issues to Understand
Ø Given two candidate models, which is better?
§ Accuracy at predicting training data?
§ Complexity of classification or regression function?
§ Are all mistakes equally bad?
Ø Given a family of classifiers with free parameters,
which member of that family is best?
§ Are there general design principles?
§ What happens as I get more data?
§ Can I test all possible classifiers?
§ What if there are lots of parameters?
(Tools: Probability & Statistics; Algorithms & Linear Algebra)
Machine Learning

Introduction to Machine Learning

Course Logistics

Data and Visualization

Supervised Learning
CS178 Course Staff
Ø Instructor: Prof. Erik Sudderth
Research Interests: Statistical machine learning, computer vision, AI, …
Ø Teaching Assistants and Readers: Twaha Ibrahim, Daokun Jiang,
Zhenhan Li, Debora Sujono, Tamanna Hossain, Haoyu Ma
Course Information
Canvas page: https://canvas.eee.uci.edu/courses/24330
Course information, lecture slides, homework handouts.
Gradescope page: https://www.gradescope.com/courses/109546
Homework submission and grading.

Piazza page: http://piazza.com/uci/spring2020/cs178


For all questions and discussion of course material!
No required textbook:
Ø All necessary information covered in lectures.
Ø Recommended references on Canvas.
Video Lectures
Official Lecture Time: Tuesdays and Thursdays at 12:30pm Pacific

Ø Before Lecture: Watch linked YouTube videos reviewing new
technical concepts and methods.
Ø During Lecture: Prof. Sudderth will highlight key points,
discuss examples, and take questions.
Ø Viewing Live Lectures: Use a UCI zoom account (and login
with your UCInetID) to connect to the CS178 webinar:
https://uci.zoom.us/j/297577761
Ø After Lecture: Video recordings will be posted to the
CS178 Yuja channel (linked from Canvas)
Discussions

Python Notebooks
• Present demos
• Questions about coding
• Hints for homeworks
• Led by TAs on Mondays starting April 6
• Video streaming and recording information
will be announced via Canvas
Homework Assignments
Homework 1 due April 16, released early next week.

5 Programming Assignments
• We will drop lowest grade

Objective
• Learn to apply ML techniques
• Submission is a “report”

Source Code (Python)


• Submit relevant code snippets
• We will not run it, but will read it
• Statement of collaboration, if any
• Only limited discussions allowed
Exams
Dates: April 21, May 5, May 19, June 2

4 Short Online Exams

• Taken online, mixture of multiple choice and free response


• Short length (30-45 minutes), but you will have a longer
(several hour) time window to start exam
• We will drop lowest grade, average other 3 equally
• More details about format and timing to be announced later
Content

• Test concepts covered in lecture


• Will also require computations related to
homework assignments and examples from lecture
Project

Groups for the Project


• Team size should be 3
• Larger teams not allowed
• Smaller in special cases,
with instructor permission
• More details coming later
(most work in second half of quarter)
• Report (two pages) due during
finals week (June 10)
Grading

Ø Homeworks (40%): Best 4 scores out of 5 homeworks.
Ø Exams (40%): Best 3 scores out of 4 online exams. Must take on
announced dates: April 21, May 5, May 19, June 2.
Ø Project (20%): Enter a Kaggle ML competition in groups of 3,
report due during finals week.
Ø No midterm or final exam!
Machine Learning

Introduction to Machine Learning

Course Logistics

Data and Visualization

Supervised Learning
Data exploration
• Machine learning is a data science
– Look at the data; get a “feel” for what might work

• What types of data do we have?


– Binary values? (spam; gender; …)
– Categories? (home state; labels; …)
– Integer values? (1..5 stars; age brackets; …)
– (nearly) real values? (pixel intensity; prices; …)

• Are there missing data?

• “Shape” of the data? Outliers?


Scientific software
• Python
– Numpy, MatPlotLib, SciPy…

• Matlab
– Octave (free)

• R
– Used mainly in statistics

• C++
– For performance, not prototyping

• And other, more specialized languages for modeling…


Representing data
• Example: Fisher’s “Iris” data
http://en.wikipedia.org/wiki/Iris_flower_data_set

• Three different types of iris


– “Class”, y

• Four “features”, x1,…,x4


– Length & width of
sepals & petals

• 150 examples (data points)


Representing the data in Python
• Have m observations (data points)

• Each observation is a vector consisting of n features

• Often, represent this as a “data matrix”

import numpy as np                      # import numpy

iris = np.genfromtxt("data/iris.txt", delimiter=None)
X = iris[:,0:4]                         # load data and split into features ...
Y = iris[:,4]                           # ... and targets (class labels)
print(X.shape)                          # 150 data points; 4 features each
(150, 4)
Basic statistics
• Look at basic information about features
– Average value? (mean, median, etc.)
– “Spread”? (standard deviation, etc.)
– Maximum / Minimum values?

print(np.mean(X, axis=0))               # compute mean of each feature
[ 5.8433 3.0573 3.7580 1.1993 ]
print(np.std(X, axis=0))                # compute standard deviation of each feature
[ 0.8281 0.4359 1.7653 0.7622 ]
print(np.max(X, axis=0))                # largest value per feature
[ 7.9411 4.3632 6.8606 2.5236 ]
print(np.min(X, axis=0))                # smallest value per feature
[ 4.2985 1.9708 1.0331 0.0536 ]
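
The bullets above also mention the median; it can be computed the same way
(a small addition, assuming the X loaded earlier; output not on the slide):

print(np.median(X, axis=0))             # median of each feature; robust to outliers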
Histograms
• Count the data falling in each of K bins
– “Summarize” data as a length-K vector of counts (& plot)
– Value of K determines “summarization”; depends on # of data
• K too big: every data point falls in its own bin; just “memorizes”
• K too small: all data in one or two bins; oversimplifies

# Histograms in MatPlotLib
import matplotlib.pyplot as plt
X1 = X[:,0]                             # extract first feature
Bins = np.linspace(4,8,17)              # use explicit bin locations
plt.hist( X1, bins=Bins )               # generate the plot
plt.show()
Scatterplots
• Illustrate the relationship between two features

# Plotting in MatPlotLib
plt.plot(X[:,0], X[:,1], 'b.')          # plot data points as blue dots
Scatterplots
• For more than two features we can use a pair plot:
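
A minimal pair-plot sketch using only numpy and matplotlib (the slide shows a
figure; dedicated helpers exist in other libraries, but this assumes just the
X loaded above):

n = X.shape[1]                                  # number of features
fig, ax = plt.subplots(n, n, figsize=(8,8))
for i in range(n):
    for j in range(n):
        if i == j:
            ax[i,j].hist(X[:,i], bins=20)       # histogram on the diagonal
        else:
            ax[i,j].plot(X[:,j], X[:,i], '.')   # scatter for each feature pair
plt.show()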
Supervised learning and targets
• Supervised learning: predict target values
• For discrete targets, often visualize with color

plt.hist( [X[Y==c,1] for c in np.unique(Y)],
          bins=20, histtype='barstacked' )      # stacked histogram, split by class

colors = ['b','g','r']
for c in np.unique(Y):
    plt.plot( X[Y==c,0], X[Y==c,1], 'o',
              color=colors[int(c)] )            # scatter plot, one color per class

ml.histy(X[:,1], Y, bins=20)                    # same plot via the course's mltools helper (ml)
Machine Learning

Introduction to Machine Learning

Course Logistics

Data and Visualization

Supervised Learning
How does machine learning work?
• Meta-programming
– Predict – apply rules to examples
– Score – get feedback on performance
– Learn – change predictor to do better
[Diagram: a learning algorithm. Training data (examples) supply features to a
program (the "learner"), characterized by some parameters θ: a procedure that
uses θ to output a prediction ("predict"). A cost function scores performance
against the target values, and this feedback is used to change θ and improve
performance ("train").]
Supervised learning
• Notation
– Features x
– Targets y
– Predictions ŷ = f(x ; θ)
– Parameters θ

[Same learner diagram as on the previous slide.]
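
To make the predict / score / learn loop concrete, here is a minimal,
hypothetical sketch (not the course's code): gradient descent fitting the 1D
linear predictor ŷ = θ0 + θ1·x to toy data:

import numpy as np

x = np.array([1., 2., 3., 4.])          # toy feature values
y = np.array([2.1, 3.9, 6.2, 8.1])      # toy targets, roughly y = 2x
theta = np.zeros(2)                     # parameters θ = (θ0, θ1)
for step in range(1000):
    yhat = theta[0] + theta[1]*x        # predict: apply rules to examples
    err = yhat - y                      # score: residuals under squared error
    grad = np.array([err.mean(), (err*x).mean()])
    theta -= 0.05 * grad                # learn: change θ to do better
print(theta)                            # roughly [0.0, 2.03]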
Regression; Scatter plots
[Scatter plot: Feature x (horizontal axis) vs. Target y (vertical axis).
A new input x(new) is marked; its target y(new) = ? is unknown.]

• Suggests a relationship between x and y


• Prediction: new x, what is y?
Nearest neighbor regression
[Same scatter plot, with the training point x(i) nearest to x(new) highlighted.]

• Find training datum x(i) closest to x(new); predict its value y(i)
Nearest neighbor regression
Predictor: given new features, find the nearest training example and return
its value.

[Plot: the resulting prediction function f(x) overlaid on the training data.]

• Defines a function f(x) implicitly


• Form is piecewise constant
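
A minimal numpy sketch of this predictor on a single feature (the toy arrays
xtr and ytr are hypothetical stand-ins for training data):

import numpy as np

def nn_predict(xtr, ytr, xnew):
    i = np.argmin(np.abs(xtr - xnew))   # index of nearest training datum x(i)
    return ytr[i]                       # predict its value y(i)

xtr = np.array([2., 5., 9., 14.])       # toy training features
ytr = np.array([4., 11., 20., 31.])     # toy training targets
print(nn_predict(xtr, ytr, 6.0))        # nearest x is 5.0, so prediction is 11.0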
Linear regression
Predictor: evaluate the line r = θ0 + θ1·x ; return r

[Plot: a fitted line overlaid on the training data.]

• Define form of function f(x) explicitly


• Find a good f(x) within that family
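
One simple way to find such an f(x) is numpy's least-squares polynomial fit
(a sketch, reusing the toy arrays from the nearest-neighbor example above):

import numpy as np

xtr = np.array([2., 5., 9., 14.])          # toy training features (as above)
ytr = np.array([4., 11., 20., 31.])        # toy training targets
theta1, theta0 = np.polyfit(xtr, ytr, 1)   # least-squares line fit (degree 1)
r = theta0 + theta1*6.0                    # evaluate the line at x(new) = 6.0
print(r)                                   # prediction for the new input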
Measuring error

[Plot: a fitted line with a vertical bar from each observation to its
prediction; the length of that bar is the error, or residual.]
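
A common way to aggregate these residuals into a single score is the mean
squared error (a standard choice of cost function, though the slide does not
commit to one; Y and Yhat here are hypothetical arrays of targets and
predictions):

mse = np.mean((Y - Yhat)**2)            # average squared residual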
Regression vs. Classification

Regression:
• Features x; real-valued target y
• Predict a continuous function ŷ(x)

Classification:
• Features x; discrete class c (usually 0/1 or +1/-1)
• Predict a discrete function ŷ(x)

[Figure: the same data shown two ways: y plotted against x (regression), and
y flattened to a class label, coloring points along x (classification).]
Classification
[Scatter plot: training data in two features x1, x2, classes shown by
color/symbol.]

Classification
[Same plot with a decision boundary: all points on one side are predicted +1,
all points on the other side are predicted -1.]
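
A minimal sketch of visualizing a decision boundary, assuming the iris X, Y
loaded earlier; the classifier here is a simple nearest-class-mean rule,
chosen only for illustration:

import numpy as np
import matplotlib.pyplot as plt

X2 = X[:, 0:2]                                    # first two iris features
means = np.array([X2[Y==c].mean(axis=0) for c in np.unique(Y)])

def predict(P):
    # assign each point in P to the class with the closest mean
    d = ((P[:, None, :] - means[None, :, :])**2).sum(axis=2)
    return d.argmin(axis=1)

# Evaluate the classifier on a grid and shade the decision regions:
xs = np.linspace(X2[:,0].min(), X2[:,0].max(), 200)
ys = np.linspace(X2[:,1].min(), X2[:,1].max(), 200)
GX, GY = np.meshgrid(xs, ys)
Z = predict(np.column_stack([GX.ravel(), GY.ravel()])).reshape(GX.shape)
plt.contourf(GX, GY, Z, alpha=0.3)                # decision regions
plt.plot(X2[:,0], X2[:,1], 'k.')                  # training data on top
plt.show()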
Summary
• What is machine learning?
– Types of machine learning
– How machine learning works
• Supervised learning
– Training data: features x, targets y
• Regression
– (x,y) scatterplots; predictor outputs f(x)
• Classification
– (x1,x2) scatterplots
– Decision boundaries, colors & symbols
