
Machine Learning and Motion Planning

Dave Millman October 17, 2007

Machine Learning intro


Machine Learning (ML)
The study of algorithms which improve automatically through experience. - Mitchell

General description
Data driven: extract some information from data
Mathematically based: probability, statistics, information theory, computational learning theory, optimization

A very small set of uses of ML


Text
Document labeling, Part of speech tagging, Summarization

Vision
Object recognition, Handwriting recognition, Emotion labeling, Surveillance

Sound
Speech recognition, music genre classification

Finance
Algorithmic trading

Medical, Biological, Chemical, on and on and on

A few types of ML
Supervised
Given: labeled data
Usual goal: learn a function
Ex: SVM, Neural Networks, Boosting, etc.

Unsupervised
Given: unlabeled data
Usual goal: cluster data, learn conditional probabilities
Ex: Nearest Neighbors, Decision trees
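As a toy illustration of the two settings (hypothetical data, plain Python): supervised learning fits a function to labeled pairs, while unsupervised learning only groups the unlabeled points.

# Supervised: fit y ~ a*x + b to labeled pairs via closed-form least squares.
labeled = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (x, label y)
n = len(labeled)
sx = sum(x for x, _ in labeled); sy = sum(y for _, y in labeled)
sxx = sum(x * x for x, _ in labeled); sxy = sum(x * y for x, y in labeled)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n
print(f"learned function: y = {a:.2f}*x + {b:.2f}")

# Unsupervised: split unlabeled 1-D points into 2 clusters (a few k-means iterations).
unlabeled = [0.1, 0.3, 0.2, 5.1, 4.9, 5.3]
c1, c2 = min(unlabeled), max(unlabeled)
for _ in range(10):  # alternate assignment and centroid update
    g1 = [x for x in unlabeled if abs(x - c1) <= abs(x - c2)]
    g2 = [x for x in unlabeled if abs(x - c1) > abs(x - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
print("clusters:", g1, g2)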

A few types of ML (cont.)


Semi-Supervised
Given: labeled and unlabeled data
Usual goal: use the unlabeled data to augment the labeled data
Ex: cluster, then label the unlabeled data from the clusters

Reinforcement
Given: reward function and set of actions
Goal: learn a function which optimizes the reward function
Ex: Q-learning, Ant-Q

General Idea Supervised

General Idea Unsupervised

General Idea SemiSupervised

General Idea Reinforcement


Markov Decision Process (MDP)
State space (fully or partially observable)
Action space (static or time dependent)
Transition function: produces the next state (based on the present state and action, not on the past)
Reward function (based on the state and action)
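A minimal sketch of these pieces as Python code; the corridor world and all names are illustrative, not from the slides.

class GridMDP:
    """Tiny 1-D corridor MDP: states 0..4, move left/right, reward at the right end."""

    states = list(range(5))
    actions = ["left", "right"]

    def transition(self, state, action):
        # Next state depends only on the present state and action (Markov property).
        step = -1 if action == "left" else 1
        return max(0, min(4, state + step))

    def reward(self, state, action, next_state):
        # Reward only for reaching the rightmost state.
        return 1.0 if next_state == 4 else 0.0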

Why regression is not enough: the XOR problem
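A small illustrative check of the claim (plain Python, brute force over a coarse weight grid): no linear classifier gets all four XOR points right, but adding the product feature x1*x2 makes them linearly separable.

import itertools

# XOR truth table: inputs and labels (+1 / -1)
points = [((0, 0), -1), ((0, 1), +1), ((1, 0), +1), ((1, 1), -1)]

def best_linear_accuracy(features):
    """Best accuracy of sign(w . f + b) over a coarse brute-force grid of weights."""
    grid = [x / 2.0 for x in range(-6, 7)]
    best = 0.0
    for ws in itertools.product(grid, repeat=len(features((0, 0))) + 1):
        *w, b = ws
        correct = 0
        for x, y in points:
            f = features(x)
            score = sum(wi * fi for wi, fi in zip(w, f)) + b
            if (1 if score > 0 else -1) == y:
                correct += 1
        best = max(best, correct / len(points))
    return best

print(best_linear_accuracy(lambda x: x))                  # 0.75: linear features cannot solve XOR
print(best_linear_accuracy(lambda x: (*x, x[0] * x[1])))  # 1.0: adding x1*x2 separates XOR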

Textbook Q-Learning [MI06]
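The textbook tabular update is Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]. A sketch that can be run against the toy MDP above; the learning rate, discount, and exploration settings are made up.

import random
from collections import defaultdict

def q_learning(mdp, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on an MDP with .states, .actions, .transition, .reward."""
    Q = defaultdict(float)  # Q[(state, action)] -> value
    for _ in range(episodes):
        s = random.choice(mdp.states)
        for _ in range(20):  # bounded episode length
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(mdp.actions)
            else:
                a = max(mdp.actions, key=lambda act: Q[(s, act)])
            s2 = mdp.transition(s, a)
            r = mdp.reward(s, a, s2)
            # textbook update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = max(Q[(s2, act)] for act in mdp.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q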


Learning flocking behavior
N agents, discrete time steps
Agent i, partner j
Define the Q-values Q(s_t, a_t): s_t - state, a_t - action

Our textbook example


State of i
[R] = floor(|i - j|), the integer part of the distance between agents i and j

Actions for i
a1 - Attraction to j
a2 - Parallel positive orientation to j
a3 - Parallel negative orientation to j
a4 - Repulsion from j

Reward Function - no predator


Distances R1, R2, R3 s.t. R1 < R2 < R3
s_t                 a_t                  r
0 < [R] <= R1       a4                  +1
0 < [R] <= R1       a1, a2, a3          -1
R1 < [R] <= R2      a2                  +1
R1 < [R] <= R2      a1, a3, a4          -1
R2 < [R] <= R3      a1                  +1
R2 < [R] <= R3      a2, a3, a4          -1
R3 < [R]            a1, a2, a3, a4       0
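A minimal Python sketch of the table above; the action names and thresholds follow the slides, but the encoding itself is just illustrative.

def reward_no_predator(R, action, R1, R2, R3):
    """Reward for agent i given discretized distance [R] to partner j (table above)."""
    if R <= R1:        # too close: repulsion (a4) is rewarded
        return 1 if action == "a4" else -1
    elif R <= R2:      # mid range: parallel positive orientation (a2) is rewarded
        return 1 if action == "a2" else -1
    elif R <= R3:      # farther out: attraction (a1) is rewarded
        return 1 if action == "a1" else -1
    else:              # out of range: no reward signal
        return 0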

Reward Function - predator


Distances R1, R2, R3 s.t. R1 < R2 < R3
s_t                 a_t                  r
0 < [R] <= R3       a4                  +1
0 < [R] <= R3       a1, a2, a3          -1
R3 < [R]            a1, a2, a3, a4       0
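And a matching sketch for the predator case, where [R] is the discretized distance to the predator.

def reward_predator(R, action, R3):
    """Reward when a predator is present: repulsion (a4) is rewarded while it is within R3."""
    if R <= R3:
        return 1 if action == "a4" else -1
    return 0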

Don't repeat work!!


Basic planners work from scratch
Ex: path planning for parking; no difference between the first time and the hundredth time
Ideal: learn some general, higher-level strategies that can be reused
General solution patterns in the problem space

Viability Filtering [KP07]


Agent can see: perceptual information
Range-finder-like virtual sensors

Database of successful, perceptually parameterized motions


From the agent's own experimentation or from an external source

Database exploited for future queries


Search is based on what has previously been successful in similar situations.

Sensors in Viability Filtering: some definitions


X - set of agent states, E - set of environment states
Augmented states: x^+ \in X^+, where X^+ = {x^+ = (x, e) | x \in X, e \in E}
Sensor functions: \sigma_i(x^+): X^+ \to R
At a specific augmented state x^+ \in X^+, define the sensor state s = (\sigma_1(x^+), ..., \sigma_n(x^+))
Sensor space: s \in S, where S is the set of all sensor-state values

Finally
Def: the locally situated state of the agent is \xi = (s, x) \in \Xi, where x is some state information independent of the agent's sensors.
Now we want to collect data to train a function \Phi(\xi): \Xi \to {viable, nonviable}
Note: errors in \Phi(\xi) could cause problems

Check Viability not Collision


Function IS_NONVIABLE(x^+):
    if is_collision(x^+) then return True
    s := (\sigma_1(x^+), ..., \sigma_n(x^+))
    x := extract_internal_state(x^+)
    \xi := (s, x)
    return \Phi(\xi) = nonviable
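A runnable Python version of the check; sensor_funcs, is_collision, extract_internal_state, and the classifier phi are hypothetical stand-ins for the paper's sigma_i, collision test, and Phi.

def is_nonviable(x_plus, sensor_funcs, phi, is_collision, extract_internal_state):
    """Reject a state if it collides, or if the learned model predicts it is nonviable."""
    if is_collision(x_plus):
        return True
    s = tuple(sigma(x_plus) for sigma in sensor_funcs)  # sensor state s = (sigma_1, ..., sigma_n)
    x = extract_internal_state(x_plus)                  # sensor-independent state info
    xi = (s, x)                                         # locally situated state
    return phi(xi) == "nonviable"                       # learned viability classifier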

Results and Further work

Bootstrapping
Use of history to create macroscopic plans
Model transfer

Training a Dog [B02]


MIT Media Lab system in which the user interactively trains a virtual dog using clicker training
Uses acoustic patterns as cues for actions
Can be taught cues for different acoustic patterns
Can create new actions from state-space search
Simplified Q-learning based on animal-training techniques

Training a Dog (cont.)


Predictable regularities
Animals tend toward previously successful states, within a small time window

Maximize use of supervisor feedback


Limit the state space by only looking at states that matter, e.g., if utterance u followed by action a produces a reward, then utterance u is important.

Easy to train
Credit accumulation, and allowing a state-action pair to delegate credit to another state-action pair.

Alternatives to Q-Learning
Q-decomp [RZ03]
Complex agent modeled as a set of simpler subagents
Each subagent has its own reward function
An arbitrator decides the best action based on advice from the subagents
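A minimal sketch of the arbitration rule, assuming the arbitrator simply picks the action with the largest summed subagent Q-values; the subagents and values below are made up.

def arbitrate(state, actions, subagent_q_functions):
    """Pick the action whose summed subagent Q-values is largest."""
    def total_q(action):
        return sum(q(state, action) for q in subagent_q_functions)
    return max(actions, key=total_q)

# usage sketch: two hypothetical subagents, one caring about dollars, one about euros
dollar_q = lambda s, a: {"go_left": 1.0, "go_up": 0.2, "go_right": 0.0}[a]
euro_q   = lambda s, a: {"go_left": 0.0, "go_up": 0.2, "go_right": 0.9}[a]
best = arbitrate("S0", ["go_left", "go_up", "go_right"], [dollar_q, euro_q])
print(best)  # "go_left": summed value 1.0 beats 0.4 and 0.9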

A simple world with initial state S0 and three terminal states SL, SU, SR, each with an associated reward of dollars and/or euros. The discount factor is \gamma \in (0, 1).

[fig. from RZ03]

Learning Behavior with Q-Decomp [CT06]


Q-decomposition as the learning technique
Reward function obtained via Inverse Reinforcement Learning (IRL) [NR00]
Mimicking behavior from an expert

Support Vector Path Planning


An idea that uses the SVM algorithm to generate a smooth path. Not really machine learning, but a neat application of an ML algorithm. Here is the idea:
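A rough sketch of that idea, assuming scikit-learn is available and using made-up obstacle coordinates: label the obstacles on one side of the corridor as one class and those on the other side as the second class, fit a kernel SVM, and trace the smooth maximum-margin path along the zero level of the decision function.

import numpy as np
from sklearn.svm import SVC

# hypothetical obstacle points: class -1 below the corridor, class +1 above it
lower = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, -0.1], [3.0, 0.3], [4.0, 0.0]])
upper = np.array([[0.0, 2.0], [1.0, 1.8], [2.0, 2.2], [3.0, 1.9], [4.0, 2.1]])
X = np.vstack([lower, upper])
y = np.array([-1] * len(lower) + [1] * len(upper))

clf = SVC(kernel="rbf", C=1000.0, gamma=1.0).fit(X, y)  # large C: stay clear of obstacles

def boundary_y(x, lo=0.5, hi=1.5, iters=40):
    """Bisection on the decision function to find its zero crossing (the path) at x."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if clf.decision_function([[x, mid]])[0] < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

path = [(x, boundary_y(x)) for x in np.linspace(0.0, 4.0, 9)]
print(path)  # waypoints along the smooth max-margin path between the obstacle rows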

Support Vector Path Planning (cont.)

Videos
Robot learning to pick up objects
http://www.cs.ou.edu/~fagg/movies/index.html#torso_2004

Training a Dog
http://characters.media.mit.edu/projects/dobie.html

References
[NR00] A. Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In Proc. 17th International Conf. on Machine Learning, pages 663-670. Morgan Kaufmann, San Francisco, CA, 2000.
[B02] B. Blumberg et al. Integrated learning for interactive synthetic characters. In SIGGRAPH '02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pages 417-426, New York, NY, USA, 2002. ACM Press.
[RZ03] S. J. Russell and A. Zimdars. Q-decomposition for reinforcement learning agents. In ICML, pages 656-663, 2003.
[MI06] K. Morihiro, T. Isokawa, H. Nishimura, and N. Matsui. Emergence of flocking behavior based on reinforcement learning. In Knowledge-Based Intelligent Information and Engineering Systems, pages 699-706, 2006.
[CT06] T. Conde and D. Thalmann. Learnable behavioural model for autonomous virtual agents: low-level learning. In AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 89-96, New York, NY, USA, 2006. ACM Press.
[M06] J. Miura. Support vector path planning. In Intelligent Robots and Systems (IROS), 2006 IEEE/RSJ International Conference on, pages 2894-2899, 2006.
[KP07] M. Kalisiak and M. van de Panne. Faster motion planning using learned local viability models. In ICRA, pages 2700-2705, 2007.

Machine Learning Ref


[M07] Mehryar Mohri. Foundations of Machine Learning, course notes. http://www.cs.nyu.edu/~mohri/ml07.html
[M97] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.
[RN05] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, Englewood Cliffs, New Jersey, 1995.
[CV95] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20, 1995.
[V98] Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.
[KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.