This lecture draws on:
• "Data-Driven Grasp Synthesis – A Survey" by Bohg et al. IEEE Transactions on Robotics, 2014.
• "Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection" by Levine et al. IJRR, 2017.
• "Learning Deep Policies for Robot Bin Picking by Simulating Robust Grasping Sequences" by Mahler and Goldberg. CoRL, 2017.
https://berkeleyautomation.github.io/dex-net
Controlling through contact
[Figure: a finger making contact with an object to move it toward a goal]
Two routes to better control through contact: better models and better feedback. A predictive model takes sensory observations and predicts how the world will evolve; to reach the goal, the robot keeps predicting and re-planning as new observations arrive.
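To make the keep-predicting loop concrete, here is a minimal sketch of receding-horizon control with a generic learned predictive model; the `predict` function and the quadratic cost are illustrative placeholders, not a specific method from the lecture.

```python
import numpy as np

def predict(state, action):
    """Placeholder predictive model: next state given state and action."""
    return state + 0.1 * action  # stand-in dynamics

def cost(state, goal):
    return np.sum((state - goal) ** 2)

def mpc_step(state, goal, horizon=5, n_samples=64, rng=np.random.default_rng(0)):
    """Random-shooting MPC: sample action sequences, roll out the model,
    and return only the first action of the best sequence."""
    best_cost, best_first_action = np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        s, total = state, 0.0
        for a in actions:
            s = predict(s, a)
            total += cost(s, goal)
        if total < best_cost:
            best_cost, best_first_action = total, actions[0]
    return best_first_action

# Keep predicting: replan at every step as new observations arrive.
state, goal = np.zeros(2), np.array([1.0, 0.5])
for _ in range(20):
    action = mpc_step(state, goal)
    state = predict(state, action)  # in reality: execute, then observe the new state
```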
Bias-Variance Tradeoff
[Plot: error vs. model complexity; bias falls and variance rises with complexity, and the total error has a minimum in between]
Model-based approaches sit at the high-bias, low-variance end of this curve. Learning structure, i.e. combining learned components with known structure, aims at the minimum-error point: low bias and low variance = generalization.
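For reference, the standard decomposition behind this plot: for squared error with observation noise of variance $\sigma^2$, the expected error of an estimator $\hat{f}$ of the true function $f$ splits as

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} + \sigma^2$$

Increasing model complexity drives the bias term down and the variance term up; the minimum of their sum is the sweet spot the plot points at.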
Hybrid Model Extrapolation
[Diagram: sensory observations and the action feed a learned model that outputs parameters for a physics-based model, which produces the predicted effect; trained end-to-end with the loss on the effect]
Testing the Hypothesis on a Case Study
Two predictors of the push effect are compared:
• End-to-End: sensory observations and the action go into a learned model that directly outputs the predicted effect.
• Hybrid: sensory observations and the action go into a learned model that outputs physical parameters; a physics-based model maps these parameters and the action to the predicted effect.
Training: end-to-end. Loss: error between predicted and ground-truth effect.
Advantages: interpolation and extrapolation.
Alina Kloss et al., "Combining learned and analytical models for predicting action effects," IJRR 2020.
Generalization to new push velocities
[Plot: prediction error on training vs. testing push velocities; the hybrid model extrapolates beyond the training range]
The hybrid architecture, trained end-to-end with the loss on the effect, has a further advantage: interpretability. The learned model outputs physical parameters with meaning, which the physics-based model turns into the predicted effect, so intermediate quantities can be inspected.
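A minimal sketch of such a hybrid predictor, assuming planar quasi-static pushing; the network, the parameter names, and the toy analytical step are illustrative, not the paper's exact model.

```python
import torch
import torch.nn as nn

class HybridPushModel(nn.Module):
    """Learned front-end predicts physical parameters; an analytical
    model turns (parameters, action) into the predicted effect."""
    def __init__(self, obs_dim=64):
        super().__init__()
        # Learned model: observations -> physical parameters
        # (here: a friction coefficient and a contact point, both illustrative).
        self.param_net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),  # [mu, contact_x, contact_y]
        )

    def physics_step(self, params, action):
        # Toy analytical pushing model; a real one would be e.g. a
        # quasi-static friction-limit-surface model.
        mu, cx, cy = params.unbind(-1)
        push_vx, push_vy = action.unbind(-1)
        # Object translates with the push, attenuated by friction.
        dx = push_vx / (1.0 + mu)
        dy = push_vy / (1.0 + mu)
        # Torque from the contact offset induces rotation about the centroid.
        dtheta = (cx * push_vy - cy * push_vx) / (1.0 + mu)
        return torch.stack([dx, dy, dtheta], dim=-1)

    def forward(self, obs, action):
        params = self.param_net(obs)       # interpretable intermediate output
        return self.physics_step(params, action)

# End-to-end training: loss on the effect only; gradients flow through
# the differentiable physics step into the parameter network.
model = HybridPushModel()
obs, action, true_effect = torch.randn(8, 64), torch.randn(8, 2), torch.randn(8, 3)
loss = nn.functional.mse_loss(model(obs, action), true_effect)
loss.backward()
```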
Today’s itinerary
• Recap
• Learning-based Grasping
• Learning-based Approaches to
  • Planar Pushing
  • Contact-Rich Manipulation Tasks
• Interactive Perception
  • Perception is not passive but active
  • Not seeing, but looking
[Robot sensing suite: force/torque, active stereo, joint torque]
Human multimodal sensor-motor coordination
Sensory inputs provide examples of task-relevant information: pose, geometry, force, friction. A policy π turns this information into action.
Generalizable representations for multimodal inputs

$\pi(f(o_1, o_2, o_3, \ldots, o_N)) = a$

where $o_1, \ldots, o_N$ are multimodal sensory inputs. Learn the representation $f$ once; then learn a policy $\pi$ for new task instances.
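A minimal sketch of this factorization in PyTorch, assuming an RGB image and a window of force/torque readings as the modalities; all module names and sizes are illustrative, not the actual architecture.

```python
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """f: multimodal observations -> shared representation z."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.image_enc = nn.Sequential(   # RGB image -> feature vector
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.force_enc = nn.Sequential(   # force/torque window -> feature vector
            nn.Linear(6 * 32, 64), nn.ReLU(),
        )
        self.fusion = nn.Linear(32 + 64, z_dim)

    def forward(self, image, force):
        z_img = self.image_enc(image)
        z_force = self.force_enc(force.flatten(1))
        return self.fusion(torch.cat([z_img, z_force], dim=-1))

class Policy(nn.Module):
    """pi: representation z -> action a; learned per task instance."""
    def __init__(self, z_dim=128, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim))
    def forward(self, z):
        return self.net(z)

f, pi = MultimodalEncoder(), Policy()
image = torch.randn(1, 3, 64, 64)   # RGB frame
force = torch.randn(1, 6, 32)       # 32-step force/torque window
a = pi(f(image, force))             # a = pi(f(o_1, ..., o_N))
```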
Experimental setup
Multimodal sensory inputs for peg insertion; training and testing use different peg geometries.
Learning generalizable representation
Inputs: an RGB image and force data. Each input has its own encoder; the features are fused, and decoders reconstruct task-relevant quantities: pose, force, geometry, friction.
[Plot: normalized force in x, y, z over time]
Dynamics prediction from self-supervision
Inputs: the robot action, an RGB image, and force data feed a shared encoder. Two self-supervised decoders supply the labels for free: action-conditional robot optical flow, and a 0/1 prediction of whether there is contact in the next step.
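A minimal sketch of these two self-supervised heads on top of a shared encoder output z; dimensions and layer choices are illustrative, not the paper's exact network.

```python
import torch
import torch.nn as nn

class SelfSupervisedHeads(nn.Module):
    """Two decoders on a shared multimodal latent z:
    action-conditional optical flow and next-step contact prediction."""
    def __init__(self, z_dim=128, action_dim=4):
        super().__init__()
        self.flow_head = nn.Sequential(     # (z, action) -> coarse 2-channel flow map
            nn.Linear(z_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * 16 * 16),
        )
        self.contact_head = nn.Sequential(  # (z, action) -> contact logit
            nn.Linear(z_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, z, action):
        za = torch.cat([z, action], dim=-1)
        flow = self.flow_head(za).view(-1, 2, 16, 16)
        contact_logit = self.contact_head(za)
        return flow, contact_logit

heads = SelfSupervisedHeads()
z, action = torch.randn(8, 128), torch.randn(8, 4)
flow_gt = torch.randn(8, 2, 16, 16)               # pseudo ground-truth optical flow
contact_gt = torch.randint(0, 2, (8, 1)).float()  # 0/1: contact in the next step?
flow, logit = heads(z, action)
loss = (nn.functional.mse_loss(flow, flow_gt)
        + nn.functional.binary_cross_entropy_with_logits(logit, contact_gt))
loss.backward()  # trains the heads; with z produced by the encoder,
                 # gradients would flow into the encoder as well
```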
The pretrained encoder is then frozen; a small RL policy (500k parameters) is trained on top of the fixed multimodal representation (RGB image and force data in, action out).
We efficiently learn policies in 5 hours.
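A minimal sketch of the freeze-then-RL recipe; the stand-in encoder, sizes, and surrogate loss are illustrative, only the freezing pattern is the point.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained multimodal encoder (see sketch above).
encoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 128))
for p in encoder.parameters():   # freeze: the representation stays fixed
    p.requires_grad = False
encoder.eval()

# Small trainable policy head; only these parameters are updated by RL.
policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(8, 32)
z = encoder(obs)                 # fixed features, no gradient into the encoder
action = policy(z)
# A real RL update (e.g. policy gradient) would use `action` and rewards;
# a surrogate loss stands in here to show which parameters move.
loss = action.pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```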
Our multimodal policy is robust against sensor noise
Two perturbations are tested: (1) force perturbation and (2) camera occlusion.
[Plot: normalized force in x, y, z over time during the force perturbation]
[Video: camera occlusion, agent view]
How is each modality used?
Simulation results (randomized box location), success rate in %:
• Force only: 1
• Image only: 49 (struggles with peg alignment)
• Force & Image: 77 (can learn full task completion)
Does our representation generalize to new geometries?
92% success rate when the learned representation is reused on a new peg geometry.
Does our representation generalize to new policies?
62% success rate when the policy is transferred as well, versus 92% when a new policy is trained on the reused representation.
Does our representation generalize? Across the transfer conditions: 92%, 62%, 92% success rates.
Lee et al. Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks. ICRA 2019. Best Paper Award.
Lessons Learned
[Diagram: sensory observations and the action enter a learned model that outputs parameters for a physics-based model, which produces the predicted effect]
Today’s itinerary
• Recap
• Learning-based Grasping
• Learning-based Approaches to
  • Planar Pushing
  • Contact-Rich Manipulation Tasks
• Interactive Perception
  • Perception is not passive but active
  • Not seeing, but looking
Control
[Bar chart: success rate in %, recapping the results of self-supervised learning of a multimodal representation]
[Diagram: robot action, RGB image, and force data → encoder → action-conditional optical flow and 0/1 next-step contact decoders]
Exploiting RGB, Depth and Motion for Instance Segmentation
A Database and Evaluation Methodology for Optical Flow. Baker et al. IJCV, 2011.
A Primal-Dual Framework for Real-Time Dense RGB-D Scene Flow. Jaimez et al. ICRA, 2015.
Unknown number of rigid objects
Motion-based Object Segmentation based on Dense RGB-D Scene Flow. Shao et al. Submitted to RA-L + IROS, 2018. Pre-print on arXiv.
Pixel-wise Prediction
[Diagram: encoder structure for pixel-wise prediction]
Computing Scene Flow
Scene flow is the displacement of a 3D point over time. For a rigid object, the motion of all its points is described by a single transform in SE(3).
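A minimal sketch of this definition, assuming known camera intrinsics and a known rigid transform for illustration; in practice the scene flow must be estimated from RGB-D frame pairs (e.g. Jaimez et al. above).

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Depth image (H, W) -> 3D points (H, W, 3) in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Toy example: points on a rigid object at time t ...
depth_t = np.full((4, 4), 1.0)
pts_t = backproject(depth_t, fx=500, fy=500, cx=2, cy=2)

# ... moved by a rigid transform (R, T) in SE(3) to time t+1.
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
T = np.array([0.02, 0.0, 0.01])
pts_t1 = pts_t @ R.T + T

# Scene flow = displacement of each 3D point over time.
scene_flow = pts_t1 - pts_t   # (H, W, 3)
```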
Motion-based Object Segmentation
[Pipeline: projection of the centroid → per-point trajectory features → clustering]
Points whose trajectory features cluster together are grouped into the same object segment.
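A minimal sketch of the clustering step, assuming per-pixel scene-flow trajectories as features; DBSCAN and the synthetic data stand in for whatever the paper actually uses.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Per-pixel trajectory features: 3D scene-flow vectors flattened over a
# short window (H*W points, 3*T dims). Values here are synthetic: half
# the pixels are static, half move with one rigid body.
h, w, t_steps = 32, 32, 4
rng = np.random.default_rng(0)
static = np.zeros((h * w // 2, 3 * t_steps))
moving = np.tile([0.02, 0.0, 0.01] * t_steps, (h * w // 2, 1))
features = (np.vstack([static, moving])
            + 0.001 * rng.standard_normal((h * w, 3 * t_steps)))

# Points with similar trajectories land in the same cluster = same object.
labels = DBSCAN(eps=0.01, min_samples=10).fit_predict(features)
segmentation = labels.reshape(h, w)
print(np.unique(labels))  # two clusters: background and one moving object
```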
Data Set of Frame Pairs
[Figure: pairs of consecutive frames over time]
Qualitative Results - Synthetic
Qualitative Results - Real
Interactive Perception