
Research Collabs

Project Exploration [TU Berlin]


Non-Euclidean Geometric Learning for 3D Vision,
Robotics and Science
Ameesh Makadia
makadia@google.com

Recent relevant publications:


CVPR’21, CVPR’21, CVPR’21, NeurIPS’20, NeurIPS’20, CVPR’20, ICML’19, ECCV’18, CVPR’18.
Generalized symmetry-reflecting convolutional models, with applications in 3D Vision, Robotics, and Science. A minimal equivariance illustration follows below.

[Figure: Group Equivariant Spherical CNNs · Unsupervised 3D Learning · 3D Pose · 3D Keypoints · 3D Shape, Appearance, and Illumination]
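To make the equivariance idea concrete, here is a minimal, self-contained sketch (not from the project itself) that uses the planar 90-degree rotation group C4 in place of the full 3D rotation groups these models target; `c4_lifted_conv` and the toy data are illustrative names of ours.

```python
import numpy as np
from scipy.signal import correlate2d

def c4_lifted_conv(image, filt):
    """Correlate with all four 90-degree rotations of the filter.

    The output has one channel per rotation; rotating the input image
    rotates each response map and cyclically shifts the channels rather
    than changing their content -- the defining property of a
    group-equivariant layer.
    """
    return np.stack([
        correlate2d(image, np.rot90(filt, k), mode="same")
        for k in range(4)
    ])

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
filt = rng.standard_normal((3, 3))

out = c4_lifted_conv(image, filt)
out_rot = c4_lifted_conv(np.rot90(image), filt)

# Equivariance check: a rotated input yields rotated, channel-shifted outputs.
expected = np.stack([np.rot90(out[(k - 1) % 4]) for k in range(4)])
assert np.allclose(out_rot, expected)
```

Spherical CNNs apply the same construction with SO(3) in place of C4, which is what makes them suitable for 3D pose and shape problems.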
Zheng Xu (xuzheng@)
Google Research
Neural Networks for Federated Learning

● Optimization and generalization under realistic assumptions


● Practical algorithms with differential privacy (a minimal sketch follows below)
● Generalizable network design under various constraints
● Robustness against various adversarial attacks

[Figure: Federated learning · Optimization · Differential privacy]
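As a concrete (hypothetical) illustration of the differential-privacy direction, the sketch below implements one server round of DP-FedAvg-style training in NumPy: per-client update clipping plus Gaussian noise calibrated to the clipping norm. Function and parameter names are ours, not from any Google codebase.

```python
import numpy as np

def dp_federated_round(weights, client_updates, clip_norm=1.0,
                       noise_multiplier=1.0, server_lr=1.0, rng=None):
    """One DP-FedAvg-style server round (illustrative sketch).

    Each client's model delta is clipped to bound its sensitivity, the
    clipped deltas are averaged, and Gaussian noise calibrated to the
    clipping norm is added before the server applies the update.
    """
    rng = rng or np.random.default_rng()
    clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
               for u in client_updates]
    mean_update = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0, noise_multiplier * clip_norm / len(client_updates),
        size=mean_update.shape)
    return weights + server_lr * (mean_update + noise)

# Toy usage: 5 clients, each reporting a model delta.
rng = np.random.default_rng(0)
w = np.zeros(10)
updates = [rng.standard_normal(10) for _ in range(5)]
w = dp_federated_round(w, updates, rng=rng)
```

The noise multiplier and clipping norm jointly determine the privacy budget; accounting for it over many rounds is part of what makes the practical algorithm design non-trivial.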


Dilip Krishnan (dilipkay@)
Google Research
Robust Perception

● Robustness of neural models in vision and language


● Improved transfer learning, adversarial robustness, few-shot learning,
robust representation learning and generalization
● Relevant publications: dilipkay.wordpress.com
● Current explorations include contrastive learning and transfer learning (a minimal sketch follows below)
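For concreteness, here is a simplified, one-directional InfoNCE/NT-Xent loss in NumPy; it stands in for the contrastive-learning family mentioned above, not for the group's specific methods (see the linked publications for those).

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Simplified one-directional InfoNCE / NT-Xent loss.

    z1[i] and z2[i] embed two augmented views of the same example;
    every other row of z2 serves as a negative for z1[i].
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                    # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True) # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # match the diagonal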
Adel Ahmadyan (ahmadyan@)
Google Research
Objectron: Deep 3D Generative Models

● Research focused on object-centric multi-view videos w/ pose annotations


● Disentangling object pose and shape embeddings with generative models
● GitHub, MediaPipe models, and relevant publications (adel.ac/publication)
○ https://github.com/google-research-datasets/Objectron
○ https://google.github.io/mediapipe/solutions/objectron
Rico Jonschkowski
Robotics at Google / Google Brain
Structured ML for Robot Perception and Control

● Combine machine learning with structural assumptions


○ for learning perception skills that are relevant for robotics
○ for learning robot control
● Leverage perception for robot control in an integrated learning system

[Figure: Manipulation · Mobility]
Boqing Gong (bgong@google.com)
Google Research
Label-efficient learning of visual models

● Current research
○ Making object classifiers & detectors robust against natural corruptions and
out-of-domain datasets
○ Efficient video recognition models
○ Visual relationship detection
○ Domain adaptation, multi-task/transfer learning, neural checkpoint ranking
○ Long-horizon, large-scale meta-learning
Related publications: http://boqinggong.info/publications.html

● Areas of interest
○ Adversarial and real-world robustness
○ Domain adaptation and multi-task/transfer learning
○ Vision + language
Googler name: Ofir Nachum
Research Topic: Reinforcement Learning, with a focus on how we can
use existing experience datasets to accelerate learning.

Current research: Various works in the areas of hierarchical RL, offline RL, and representation learning; e.g., "Representation Matters: Offline Pretraining for Sequential Decision Making" and "OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning".

Areas of interest:
● All things RL, especially sub-topics mentioned above.
● My ideal result is finding methods and algorithms that are theoretically grounded and have practical impact.
Alireza Fathi (alirezafathi@google.com)
Google Research
3D Scene Understanding

● 3D Scene Understanding (TensorFlow 3D)


○ 3D object detection and segmentation
○ 3D shape prediction
● Neural Rendering (e.g. OSF)
Googler name: Jianing Wei, Adel Ahmadyan, Liangkai Zhang, Google Research
Research Topic: 3D object geometric understanding

Current research:
● 3D object detection and tracking
● We released the Objectron dataset and models/solutions in MediaPipe
● Paper accepted to CVPR 2021

Areas of interest:


● 3D object detection and tracking
● 3D shape understanding
● Neural representations
Less reliance on human labels for structured image/video
understanding
Jonathan Huang (jonathanhuang@)
Google Research

● Strong generalization (e.g., test on classes that are not seen during training)
● How to leverage videos for static image perception (e.g., as a source of self-supervision)
● Weak or noisy labels (e.g., use bounding box annotations to learn to predict instance segmentation masks)

Recent work: we show that architectures that perform similarly when fully supervised generalize in dramatically different ways to classes not seen at training time (in this case, parking meters, phones and pizzas).
Googler name: Martin Bruse
Research Topic: Human acoustics perception

Current research:
● Dissonance and aural harmonics
● Perceptual loudness of partially masked sounds

Areas of interest:
● Audio perceptual metrics
● Sound source localization
● Sound source separation
● Wave front modeling
Martin Bruse
Google Research
Human acoustics perception

● Understand human perception of soundscapes


● Reproduce human acoustics models and fit them to listening tests
● Improve the state of media and remote communication

[Figure: Design and execute listening tests · Create new models · Improve audio comfort and quality]
Matthieu Geist
Reinforcement and Apprenticeship Learning

Current research:
● RL: theoretically grounded and efficient agents
● AL: classic imitation learning, and less standard settings (e.g., inverse RL from suboptimal but improving demonstrations, exploration from demonstrations)
● Game theory, esp. Mean Field Games
● Field robotics (e.g., navigation in natural environments)

Areas of interest:
● RL, AL, Game theory
● Theoretically sound approaches leading to practical and efficient agents
● Practical applications (esp. field robotics)

Ibrahim Alabdulmohsin (ibomohsin@)
Google Brain, Zürich
Cross-Architecture Transfer

● Develop algorithms for cross-architecture transfer learning (e.g., from a pretrained MobileNet to a ResNet152). How do changes in architecture matter (depth, width, resolution, skip connections, etc.)? Are there fundamental limits to what can be transferred across different architectures without access to data?

● Why?
○ Accelerate experimentation, architecture sweep, distillation, etc.
○ Improve understanding of how deep neural networks work.
● Examples of evidence: distillation and pretraining with random labels (a minimal distillation sketch follows below)
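Logit-space distillation is one reason cross-architecture transfer is possible at all: only outputs are compared, so the architectures never need to match. A minimal sketch, in our own notation rather than any Google implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Temperature-softened KL(teacher || student), averaged over a batch.

    Only output logits are compared, so the teacher (e.g. a pretrained
    MobileNet) and the student (e.g. a ResNet152) may differ arbitrarily
    in depth, width, resolution and connectivity.
    """
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    kl = np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()
```

The open question named above is what, beyond such output matching, can be transferred between architectures, especially without access to the original data.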
Efi Kokiopoulou (kokiopou@)
Google Research
Robust deep classification under noisy labels
Current research:
● Train deep classifiers to be robust against
input-dependent label noise
● Take class correlations in label noise into account
● Add domain-knowledge to the noise model

Areas of interest: "Correlated Input-Dependent Label Noise in


Large-Scale Image Classification", M. Collier, B.
● Aleatoric uncertainty in deep models Mustafa, E. Kokiopoulou, R. Jenatton, J. Berent,
CVPR 2021, accepted, oral presentation.
● Probabilistic ML
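The CVPR 2021 paper above models input-dependent, class-correlated label noise with a heteroscedastic latent-variable formulation; the sketch below shows only the simpler class-correlated transition-matrix idea such work builds on. All names are ours and illustrative.

```python
import numpy as np

def noisy_label_log_likelihood(clean_probs, transition, noisy_labels):
    """Log-likelihood of observed labels under class-correlated noise.

    clean_probs:  (N, C) model posterior over true classes.
    transition:   (C, C) row-stochastic matrix; transition[i, j] is
                  p(observed label j | true class i), so off-diagonal
                  mass encodes which classes get confused with which.
    noisy_labels: (N,) integer observed labels.
    """
    noisy_probs = clean_probs @ transition   # (N, C) posterior over noisy labels
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return np.log(picked + 1e-12)
```

Making `transition` a learned function of the input x is what turns this into the input-dependent setting studied in the current research.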
Matthew Brown (mtbr@)
Google Research
3D Object Category Modelling in the Wild

● Learn 3D Category Models from Real-World Datasets, e.g., Objectron


● Build on recent progress in Neural Radiance Fields (NeRF); the core rendering equation is recalled below
● Applications: Visual Perception in 3D, Scene Understanding, Graphics
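For reference, this is the standard volume-rendering integral at the heart of NeRF (Mildenhall et al.), with colors c and densities σ predicted by an MLP along camera rays r(t) = o + t d:

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\right)
```

Category-level modelling then amounts to sharing or conditioning these networks across object instances from real-world datasets such as Objectron.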
Andy Zeng (andyzeng@)
Robotics at Google
ML for Robot Perception of Deformables

● Goal: learn perception* of deformable objects for robot manipulation


● Challenges: self-occlusions, complex dynamics, large configuration spaces

[Figure: example teleoperation. How do you get a robot to fold a T-shirt?]

*Perception can include vision, force/torque, tactile, etc.
Josh Caldwell (@joshcaldwell)
EngEDU
Intelligent feedback in Scratch

● Analyze Scratch project snapshots to identify target behaviors


● Gain insight into student learning without restricting creative expression
● Contextualize and surface learning insights to educators

Related work:
● Pensieve, Pyramid Snapshot Challenge, Zero-Shot Learning - Chris Piech, Stanford
● iSnap, SourceCheck - Thomas Price, NC State
● Dr. Scratch - Jesús Moreno-León, URJC
Googler name: Caroline Pantofaru (cpantofaru@) & Michael Nechyba* (mnechyba@)
Research Topic: People-centric perception

Areas of interest:
● Fairness (see next slide as well)
● Modeling - detection, tracking, diarization, pose, etc.
● Context: people in video, media, robotics/ambient and HCI
● Synthetic data augmentation
● Metrics
Googler name: Susanna Ricco* (ricco@), Caroline Pantofaru (cpantofaru@)
Research Topic: Computer Vision - Fairness

Current research:
● Understanding bias propagation under partial/weak supervision or distillation.
● Learning approaches to mitigate bias propagation.

Areas of interest:
● Partial / weak supervision
● Dataset design
● Fairness
○ Metrics (including beyond group fairness)
○ Interventions (including beyond classifiers)

[Figure: example of partial annotation of person bounding boxes in Open Images. Magenta boxes were annotated; yellow boxes are correct detections missing from the ground-truth partial annotations.]
Googler name: Sourish Chaudhuri (sourc@google.com)
Research Topic: Multimodal Modeling for Video

Current research:
1. Joint modeling of audio and video
2. Doing (1) efficiently
3. Multi-task modeling, and cross-modal learning

Publications and links:
- Putting a face to the voice
- AVA Active Speaker dataset
- AVA Speech Activity dataset
- AVA Active Speaker challenge @CVPR

Areas of interest:
- Video understanding
- Speaker diarization
- Conversation analysis
[Figure: conversation analysis for dynamic video scenes to enable contextual awareness. Bounding box color indicates whether someone is speaking: red for not speaking, green for speaking, yellow for speaking but not audible. Source: https://arxiv.org/abs/1901.01342]

Pawel Lichocki (pawell@)
Operations Research
AutoML for MIP (Mixed Integer Programming)

Context
1/ There are a lot of MIP heuristics, e.g., randomized rounding, feasibility pump, pivot-and-shift, fix-and-propagate.
2/ The heuristics iterate over LP-feasible or integral "solutions" in the hope of stumbling upon a solution that is both LP-feasible and integral.
3/ The heuristics multiplex (some of) the four basic operations: round, shift, pump and pivot (in different orders and flavours).

Initial idea
1/ Use evolutionary computation (or related methods) to automatically find primal heuristics tailored towards a family of MIP instances.
2/ The challenge is to design a genetic encoding (capable of expressing versatile heuristics) and search operators (capable of exploring the heuristic space). A toy sketch of such a search loop follows below.

[Figure: example heuristic programs built from the primitives, e.g. "for i = 1..N: if Frac(x[i]): round(x[i]); propagate(i); pump(x)", evolved via mutate / select / replace operations.]
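A toy sketch of the proposed search, under our own hypothetical encoding: a heuristic is a flat list of primitive operations, mutated and selected by fitness on a family of instances. A real genetic encoding would need loops, conditions and arguments, as in the figure above.

```python
import random

# Primitive operations a heuristic program can be built from; a real
# implementation would apply them to the LP relaxation of a MIP instance.
PRIMITIVES = ["round", "shift", "pump", "pivot", "propagate"]

def random_program(length=4):
    """A heuristic is encoded as a flat list of primitive names."""
    return [random.choice(PRIMITIVES) for _ in range(length)]

def mutate(program):
    """Point mutation: swap one primitive for another."""
    child = list(program)
    child[random.randrange(len(child))] = random.choice(PRIMITIVES)
    return child

def evolve(fitness, generations=100, population_size=20):
    """(mu + lambda)-style evolution of heuristic programs.

    `fitness` scores a program, e.g. by running it as a primal heuristic
    on a family of MIP instances and measuring solutions found per time.
    """
    population = [random_program() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]   # select
        children = [mutate(p) for p in parents]    # mutate
        population = parents + children            # replace
    return max(population, key=fitness)

# Toy usage with a stand-in fitness: prefer programs that pump late.
toy_fitness = lambda prog: prog.index("pump") if "pump" in prog else -1
best = evolve(toy_fitness)
```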
ML to design and control robots that can go anywhere
Googler: Tingnan Zhang (tingnan@)
● Motivation: we want our robots to be able to move on all terrains: as fast as cars on paved roads, and as elegantly as animals on complex natural surfaces like sand, snow, grass and mud.
● Goal: develop a learning system to automatically design robot morphologies and figure out control policies for complex terrains.
● Challenges:
○ Few known principles and priors to guide the design.
○ Sim-to-real transfer of control policies.
○ The variety of natural substrates is large, and their properties are diverse (e.g., loose and flowable).
Learning Robot Locomotion from Videos
Googler: Wenhao Yu (magicmelon@)

● Imitation learning from animals leads to efficient learning and natural motion.
● Abundant videos exist, but motion-capture data is scarce.
● Learning coordinated locomotion from unstructured videos presents unique challenges.

[Figure: learning from motion-capture data vs. learning from videos; see "Learning Agile Robotic Locomotion Skills by Imitating Animals".]


Bo Dai, Hanjun Dai ({bodai,hadai}@google.com)
Google Brain
Data-driven Algorithm Design

● Current research
○ Automatic decision making: reinforcement learning (NeurIPS 19, ICML 20a, ICLR 20,
NeurIPS 20a, NeurIPS 20b), optimization (NeurIPS 20c, AISTATS 21)
○ Learnable algorithm design: search (ICML 17, NeurIPS 20d, NeurIPS 20e), sampling
(AISTATS 19, NeurIPS 19), planning (ICLR 20, ICML 20b)
● Areas of interest
○ Ultimate goal: make the intractable (approximately) tractable
○ Foundation for algorithm design:
■ Reinforcement learning, learning to search/optimize
○ Application domains including:
■ Program/software understanding
■ Knowledge graph reasoning
■ Supply chain management
■ Scientific discovery
Robustness in Recommendation
alexbeutel@

● Current research
○ Robustness in Recommendation
○ Safe Multi-Objective RL for Recommendation
○ Fairness in Recommendation
● Areas of interest
○ How do we ensure recommender systems aren’t brittle and vulnerable to spurious correlations?
○ How should we make use of uncertainty in recommendation?
○ How can we make recommenders robust to adversarial attacks?
Karthik Raman (karthikraman@)
Omniglot, Google Research
Multilingual and Cross-Lingual Learning

● Learning powerful multilingual / language-agnostic representations


● Advancing cross-lingual NLP (incl. generation), IR and transfer learning
● Leveraging weak forms of supervision for multilingual learning.

[Figure: Multilingual, multimodal learning (e.g., using WIT) · Language-agnostic representations · Cross-lingual IR]
Googler name: Been Kim
Research Topic: Understanding and Interpreting neural networks
Current research:
Developing layperson-focused interpretability methods
● Layperson-friendly interpretability methods: concept-based explanation (others: 1, 2, 3); a minimal sketch follows below
● Discovering new concepts for explanation
Thinking about limitations of current interpretability methods
● Sanity checks for saliency maps

Areas of interest:
● Use Interpretability as a microscope on scientific
phenomena modeled by complex ML models to
discover something humans never knew before.
● Developing ways to detect limitations of
interpretability methods
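For concreteness, a minimal sketch of the concept-based explanation recipe in the TCAV style: fit a linear probe separating "concept" activations from random ones at some network layer, then test how often a class logit increases along the concept direction. Variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Fit a linear separator between concept and random examples in a
    layer's activation space; its (unit-norm) normal is the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(logit_grads, cav):
    """Fraction of inputs whose class logit increases along the concept
    direction, i.e. whose directional derivative is positive."""
    return float(np.mean(logit_grads @ cav > 0))
```

Here `logit_grads` would hold, per input, the gradient of the class logit with respect to the same layer's activations.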

Qifei Wang (qfwang@google.com)
Google Research
Self-supervised Multi-task Learning

● Current research
○ Model unification via multi-task learning (WACV 21)
○ Multi-domain learning and domain generalization (CVPR 21)
○ Multi-modal learning for video understanding

● Areas of interest
○ Multi-task learning and multi-domain learning
○ Self-supervised and unsupervised learning
○ Few-shot learning
○ Multi-modal learning for ambient sensing
Alexey Dosovitskiy (adosovitskiy@google.com)
Brain Berlin
What do generative models understand?
● Current research
○ Object-centric models (Slot Attention)
○ Image generation (NeRF in the Wild)
○ Architectures for computer vision (Vision Transformer)

● Project idea (mildly speculative)


○ Modern generative models are getting really good (StyleGAN-1/2, BigGAN, DALL-E, etc.)
○ They seem to "understand something about the world"; some of them have been shown to be good at recognition (iGPT)
○ What are good ways to extract knowledge useful for recognition from generative
models? Fine-tuning? Analysis by synthesis? Something smarter?
○ Can/should we modify the architecture of a generative model to make it work better
for recognition? (e.g. object-centric biases)
○ What benefits can a generatively trained model offer over discriminatively trained
models? Perhaps generalization/generality?
Di Wang (wadi@)
Google Research
New Primitives for Learning on Large Graphs

Relevant research:
● Combinatorial and non-linear diffusion on graphs
● (Dynamic) graph decomposition
● Packing/covering problems (min-cost flow, Wasserstein distance)
● Preconditioning and numerical primitives

Areas of interest:
● Theoretical and empirical study of non-linear diffusion (a linear-diffusion baseline is sketched below)
● Sparsifying/sketching graphs while maintaining structures such as random walks and clustering
● Graph-based semi-supervised learning
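As a point of reference, here is the classical linear diffusion that this line of work generalizes: power iteration for personalized PageRank, in NumPy. Non-linear diffusions replace the matrix-vector product in the loop with a non-linear operator.

```python
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Linear diffusion baseline: personalized PageRank by power iteration.

    adj:  (n, n) adjacency matrix of an undirected graph.
    seed: (n,) restart mass (e.g. one-hot on a seed node).
    """
    walk = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic walk matrix
    s = seed / seed.sum()
    p = s.copy()
    for _ in range(iters):
        # Restart with probability alpha, otherwise take one walk step.
        p = alpha * s + (1 - alpha) * walk.T @ p
    return p
```

The resulting stationary vector concentrates on the seed's community, which is why diffusions of this kind underpin local clustering and graph-based semi-supervised learning.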
Jeremiah Harmsen (jeremiah@google.com)
Google Brain
Systems and Tools for Machine Learning

● Develop high-performance input pipelines to feed accelerators (a minimal tf.data sketch follows below)


● Design intelligent, global storage systems optimized for ML workloads
● Accelerate research through artifact & experiment tracking,
reproducibility, scaling, etc...

[Figure: 🌍 Global ML Storage Systems · TensorFlow Datasets · Research Velocity]
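A minimal illustration of the input-pipeline point, using tf.data; the file pattern and parse function are placeholders rather than a real dataset.

```python
import tensorflow as tf

def make_input_pipeline(file_pattern, parse_fn, batch_size=256):
    """An accelerator-friendly tf.data pipeline: parallel file reads,
    parallel parsing, and prefetching overlap I/O with compute.

    `file_pattern` and `parse_fn` are stand-ins for a real dataset.
    """
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    ds = files.interleave(tf.data.TFRecordDataset,
                          num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.shuffle(10_000)
    ds = ds.map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(batch_size, drop_remainder=True)
    return ds.prefetch(tf.data.AUTOTUNE)
```

The AUTOTUNE settings let the runtime pick parallelism levels, which is the sort of tuning an intelligent, ML-aware storage system could do globally rather than per job.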
Googler name: Yasemin Altun altun@google.com
Research Topic: Structure Aware Machine Learning for NLU

Current research: Reasoning over structured and semi-structured context (knowledge graphs, tables, personal context) along with natural language text for NLU problems (semantic parsing, question answering, table parsing, entailment, retrieval).

Areas of interest:
● (Conversational) knowledge-based question answering
● Task-oriented dialogue
● Reasoning over structured and semi-structured context

Stefan Welker (swelker@)
Robotics at Google
Human presence detection around machinery
● Current Research: Robot UX, Robotic Manipulation

○ Detect human presence around collaborative robots to increase safety
○ Cameras are set up to solve robot tasks, not to detect humans specifically
○ Obvious features like faces and hands may not be visible, or are sensitive information
○ Observations are usually incomplete and from uncommon angles (overhead, sideways)
