BentoML
Inference Optimization Engineer
Full-time · Remote · North America, Asia, Europe · $200k - $300k
About this role
Role:
As an Inference Optimization Engineer, you will improve the speed and efficiency
of large language models at the GPU kernel level, through the inference engine,
and across distributed architectures. You will profile real workloads, remove
bottlenecks, and lift each layer of the stack to new performance ceilings. Every
gain you unlock will flow straight into open source code and power fleets of
production models, cutting GPU costs for teams around the world. By publishing
blog posts and giving conference talks, you will become a trusted voice on
efficient LLM inference at scale.
Example projects:
https://bentoml.com/blog/structured-decoding-in-vllm-a-gentle-introduction
https://www.bentoml.com/blog/benchmarking-llm-inference-backends
https://bentoml.com/blog/25x-faster-cold-starts-for-llms-on-kubernetes
Responsibilities:
Latency & throughput: Identify bottlenecks and optimize inference efficiency in
single-GPU, multi-GPU, and multi-node serving setups.
Benchmarking: Build repeatable tests that model production traffic; track and report
performance across vLLM, SGLang, TRT-LLM, and future runtimes (a minimal latency
probe is sketched after this list).
Resource efficiency: Reduce memory use and compute cost with mixed precision,
better KV-cache handling, quantization, and speculative decoding.
Serving features: Improve batching, caching, load balancing, and model-parallel
execution.
Knowledge sharing: Write technical posts, contribute code, and present findings to
the open-source community.
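As a taste of the benchmarking work, here is a minimal sketch of a latency probe
against an OpenAI-compatible completions endpoint such as the one vLLM serves. The
URL, model id, prompt, and request count are illustrative assumptions, not details
of BentoML's stack; a production harness would add concurrency to measure
throughput and replay traces that model real traffic.

    # Minimal serial latency probe for an OpenAI-compatible endpoint
    # (vLLM and SGLang both expose this API). All values below are
    # placeholders for illustration.
    import statistics
    import time

    import requests

    URL = "http://localhost:8000/v1/completions"  # assumed local test server
    PAYLOAD = {
        "model": "my-model",  # placeholder model id
        "prompt": "Explain KV caching in one sentence.",
        "max_tokens": 64,
    }

    def timed_request() -> float:
        # Time a single request end to end, including network overhead.
        start = time.perf_counter()
        resp = requests.post(URL, json=PAYLOAD, timeout=60)
        resp.raise_for_status()
        return time.perf_counter() - start

    latencies = sorted(timed_request() for _ in range(20))
    print(
        f"p50={latencies[len(latencies) // 2]:.3f}s "
        f"p95={latencies[int(len(latencies) * 0.95) - 1]:.3f}s "
        f"mean={statistics.mean(latencies):.3f}s"
    )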
Qualifications:
Deep understanding of transformer architecture and inference engine internals.
Hands-on experience speeding up model serving through batching, caching, and load
balancing.
Experience with inference engines such as vLLM, SGLang, or TRT-LLM (upstream
contributions are a plus).
Experience with inference optimization techniques: quantization, distillation,
speculative decoding, or similar (see the sizing sketch after this list).
Proficiency in CUDA and with profiling tools such as Nsight, nvprof, or CUPTI.
Proficiency in Triton and ROCm is a bonus.
Track record of blog posts, conference talks, or open-source projects in ML systems
is a bonus.
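To ground why KV-cache handling and quantization matter here, a back-of-envelope
sizing sketch follows; the Llama-2-7B-shaped configuration is an illustrative
assumption, not a model this role specifically targets.

    # Rough KV-cache footprint: each layer stores one K and one V vector
    # per KV head for every token in the context.
    layers, kv_heads, head_dim = 32, 32, 128  # Llama-2-7B-style shape (assumed)
    bytes_per_elem = 2                        # fp16
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
    print(f"{per_token / 2**20:.2f} MiB per token")              # 0.50 MiB
    print(f"{per_token * 4096 / 2**30:.1f} GiB at 4096 tokens")  # 2.0 GiB

At roughly 0.5 MiB per token in fp16, a single 4096-token sequence occupies about
2 GiB of GPU memory, which is why fp8/int8 KV-cache quantization and smarter cache
management translate directly into serving capacity.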
Why join us:
Direct impact – ship optimizations straight into open-source code and cut real GPU
costs for production workloads.
Technical scope – operate distributed LLM inference and large GPU clusters
worldwide.
Customer reach – support organizations around the globe that rely on BentoML.
Influence – mentor teammates, guide open-source contributors, and become a go-to
voice on efficient inference in the community.
Remote work – work from where you are most productive and collaborate with
teammates in North America and Asia.
Compensation – competitive salary, equity, learning budget, and paid conference
travel.
1+ years of experience working with inference engines or inference optimization
techniques for transformer-based models
Salary: $200k - $300k
Equity: 1.0-2.0%
Remote work policy: Remote from anywhere in the world
Full-time position
Location: North America, Asia, Europe
Reports to: https://www.linkedin.com/in/ssheng/
Tech stack: Python, CRUD
About BentoML
BentoML is an enterprise-grade InferenceOps platform for deploying and managing AI
models at scale. It offers full control without the complexity, allowing teams to serve
any model, including LLMs, embeddings, and agentic pipelines, across VPC, on-prem,
or hybrid environments with tailored optimization, advanced orchestration, and
fine-grained performance tuning.
From prototype to production, BentoML covers the full inference lifecycle with instant
model deployments, elastic autoscaling, built-in observability, compliance-ready
features, and mission-critical reliability, freeing your team to deliver AI that drives real
business outcomes faster.
Team size: 15 people
Founded: 2019
Website: www.bentoml.com
Company locations: San Francisco, California
About the team
Chaoyu Yang: Founder & CEO
Sean Sheng: Head of Engineering
Interview process
1. Initial Screen (1 hour)
2. Virtual Onsite (4 hours)