Full Chapter Google Cloud Certified Professional Machine Learning Engineer Study Guide 1St Edition Mona PDF
Official Google Cloud Certified
Professional Machine Learning
Engineer
Study Guide
Mona Mona
Pratap Ramamurthy
Copyright © 2024 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada and the United Kingdom.
ISBNs: 9781119944461 (paperback), 9781119981848 (ePDF), 9781119981565 (ePub)
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under
Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the
Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc.,
222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at
www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department,
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
www.wiley.com/go/permission.
Trademarks: WILEY and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or
its affiliates, in the United States and other countries, and may not be used without written permission. Google
Cloud is a trademark of Google, Inc. All other trademarks are the property of their respective owners. John Wiley &
Sons, Inc. is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best efforts in preparing
this book, they make no representations or warranties with respect to the accuracy or completeness of the contents
of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose.
No warranty may be created or extended by sales representatives or written sales materials. The advice and
strategies contained herein may not be suitable for your situation. You should consult with a professional where
appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared
between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any
loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or
other damages.
For general information on our other products and services or for technical support, please contact our Customer
Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax
(317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be
available in electronic formats. For more information about Wiley products, visit our website at www.wiley.com.
Library of Congress Control Number: 2023931675
Cover image: © Getty Images Inc./Jeremy Woodhouse
Cover design: Wiley
To my late father, grandparents, mom, and husband (Pratyush Ranjan), mentor
(Mark Smith), and friends. Also to anyone trying to study for this exam. Hope
this book helps you pass the exam with flying colors!
—Mona Mona
This book is the product of hard work by many people, and it was wonderful to see
everyone come together as a team, starting with Jim Minatel and Melissa Burlock from
Wiley and including Kim Wimpsett, Christine O’Connor, Saravanan Dakshinamurthy, Judy
Flynn, Arielle Guy, and the reviewers.
Most importantly, I would like to thank Mona for spearheading this huge effort. Her
knowledge from her previous writing experience and leadership from start to finish was
crucial to bringing this book to completion.
—Pratap Ramamurthy
About the Authors
Mona Mona is an AI/ML specialist at Google Public Sector. She is the author of the book
Natural Language Processing with AWS AI Services and a speaker. She was a senior AI/ML
specialist Solution Architect at AWS before joining Google. She has 14 certifications and has
created courses for AWS AI/ML Certification Specialty Exam readiness. She has authored
17 blogs on AI/ML and also co-authored a research paper on AWS CORD-19 Search: A
neural search engine for COVID-19 literature, which won an award at the Association
for the Advancement of Artificial Intelligence (AAAI) conference. She can be reached at
monasheetal3@gmail.com.
Pratap Ramamurthy loves to solve problems using machine learning. Currently he is an AI/ML
specialist at Google Public Sector. Previously he worked at AWS as a partner solution
architect where he helped build the partner ecosystem for Amazon SageMaker. Later he
was a principal solution architect at H2O.ai, a company that works on machine learning
algorithms for structured data and natural language. Prior to that he was a developer and a
researcher. To his credit he has several research papers in networking, server profiling
technology, genetic algorithms, and optoelectronics. He holds three patents related to cloud
technologies. In his spare time, he likes to teach AI using modern board games. He can be
reached at pratap.ram@gmail.com.
About the Technical Editors
Hitesh Hinduja is an ardent artificial intelligence (AI) and data platforms enthusiast
currently working as a senior manager in Azure Data and AI at Microsoft. He worked as
a senior manager in AI at Ola Electric, where he led a team of 30+ people in the areas of
machine learning, statistics, computer vision, deep learning, natural language processing,
and reinforcement learning. He has filed 14+ patents in India and the United States and has
numerous research publications under his name. Hitesh has been associated in research roles
at India’s top B-schools: Indian School of Business, Hyderabad, and the Indian Institute of
Management, Ahmedabad. He is also actively involved in training and mentoring and has
been invited as a guest speaker by various corporations and associations across the globe.
He is an avid learner and enjoys reading books in his free time.
Kanchana Patlolla is an AI innovation program leader at Google Cloud. Previously she
worked as an AI/ML specialist in Google Cloud Platform. She has architected solutions with
major public cloud providers in financial services industries on their quest to the cloud,
particularly in their Big Data and machine learning journey. In her spare time, she loves to
try different cuisines and relax with her kids.
Index
Contents
Introduction
Imbalanced Data
Data Splitting
Data Splitting Strategy for Online Systems
Handling Missing Data
Data Leakage
Summary
Exam Essentials
Review Questions
Chapter 3 Feature Engineering
Consistent Data Preprocessing
Encoding Structured Data Types
Mapping Numeric Values
Mapping Categorical Values
Feature Selection
Class Imbalance
Classification Threshold with Precision and Recall
Area under the Curve (AUC)
Feature Crosses
TensorFlow Transform
TensorFlow Data API (tf.data)
TensorFlow Transform
GCP Data and ETL Tools
Summary
Exam Essentials
Review Questions
Chapter 4 Choosing the Right ML Infrastructure
Pretrained vs. AutoML vs. Custom Models
Pretrained Models
Vision AI
Video AI
Natural Language AI
Translation AI
Speech-to-Text
Text-to-Speech
AutoML
AutoML for Tables or Structured Data
AutoML for Images and Video
AutoML for Text
Recommendations AI/Retail AI
Document AI
Dialogflow and Contact Center AI
Custom Training
Summary
Exam Essentials
Review Questions
Chapter 10 Scaling Models in Production
Scaling Prediction Service
TensorFlow Serving
Serving (Online, Batch, and Caching)
Real-Time Static and Dynamic Reference Features
Pre-computing and Caching Prediction
Google Cloud Serving Options
Online Predictions
Batch Predictions
Hosting Third-Party Pipelines (MLFlow) on Google Cloud
Testing for Target Performance
Configuring Triggers and Pipeline Schedules
Summary
Exam Essentials
Review Questions
Chapter 11 Designing ML Training Pipelines
Orchestration Frameworks
Kubeflow Pipelines
Vertex AI Pipelines
Apache Airflow
Cloud Composer
Comparison of Tools
Identification of Components, Parameters, Triggers, and Compute Needs
Schedule the Workflows with Kubeflow Pipelines
Schedule Vertex AI Pipelines
System Design with Kubeflow/TFX
System Design with Kubeflow DSL
System Design with TFX
Hybrid or Multicloud Strategies
Summary
Exam Essentials
Review Questions
Chapter 12 Model Monitoring, Tracking, and Auditing Metadata
Model Monitoring
Concept Drift
Data Drift
Model Monitoring on Vertex AI
for the second year in a row, paying an average salary of $175,761/year. So, there is
a demand from many engineers to get certified. Of the many certifications that GCP
offers, the AI/ML certified engineer is a new certification and is still evolving.
Provides an opportunity for advancement
IDC’s research (www.idc.com/getdoc.jsp?containerId=IDC_P40729) indicates that while AI/ML
adoption is on the rise, the cost, lack of expertise, and lack of life cycle management tools
are among the top three inhibitors to realizing AI and ML at scale.
This book is the first in the market to talk about Google Cloud AI/ML tools and the
technology covering the latest Professional ML Engineer certification guidelines released
on February 22, 2022.
Recognizes Google as a leader in open source and AI
Google is the main contributor to many of the path-breaking open source software projects
that dramatically changed the landscape of AI/ML, including TensorFlow, Kubeflow, Word2vec,
BERT, and T5. Although these algorithms are in the open source domain, Google has the
distinct ability to bring these open source projects to the market through the Google Cloud
Platform (GCP). In this regard, the other cloud providers are frequently seen as trailing
Google’s offering.
Raises customer confidence
As the IT community, users, small business owners, and the like become more familiar with
the PMLE certified professional, more of them will realize that the PMLE professional is
more qualified to architect secure, cost-effective, and scalable ML solutions in the Google
Cloud environment than a noncertified individual.
Chapter 1: Framing ML Problems
This chapter covers how you can translate business challenges into ML use cases.
Chapter 2: Exploring Data and Building Data Pipelines
This chapter covers visualization, statistical fundamentals at scale, evaluation of data
quality and feasibility, establishing data constraints (e.g., TFDV), organizing and
optimizing training datasets, data validation, handling missing data, handling outliers,
and data leakage.
Chapter 3: Feature Engineering
This chapter covers topics such as encoding structured data types, feature selection,
class imbalance, feature crosses, and transformations (TensorFlow Transform).
Chapter 4: Choosing the Right ML Infrastructure
This chapter covers topics such as evaluation of compute and accelerator options
(e.g., CPU, GPU, TPU, edge devices)
and choosing appropriate Google Cloud hardware components. It also covers choosing
the best solution (ML vs. non-ML, custom vs. pre-packaged [e.g., AutoML, Vision
API]) based on the business requirements. It talks about defining how the model output
should be used to solve the business problem. It also covers deciding how incorrect
results should be handled and identifying data sources (available vs. ideal). It talks about
AI solutions such as CCAI, DocAI, and Recommendations AI.
Chapter 5: Architecting ML Solutions
This chapter explains how to design reliable, scalable, and highly available ML solutions.
Other topics include how you can choose appropriate ML services for a use case (e.g.,
Cloud Build, Kubeflow), component types (e.g., data collection, data management),
automation, orchestration, and serving in machine learning.
Chapter 6: Building Secure ML Pipelines
This chapter describes how to build secure ML systems (e.g., protecting against
unintentional exploitation of data/model, hacking). It also covers the privacy implications
of data usage and/or collection (e.g., handling sensitive data such as personally
identifiable information [PII] and protected health information [PHI]).
Chapter 7: Model Building
This chapter describes the choice of framework and model parallelism. It also covers
modeling techniques given interpretability requirements, transfer learning, data
augmentation, semi-supervised learning, model generalization, and strategies to handle
overfitting and underfitting.
Chapter 8: Model Training and Hyperparameter Tuning
This chapter focuses on the ingestion of various file types into training (e.g., CSV, JSON,
IMG, Parquet or databases, Hadoop/Spark). It covers training a model as a job in different
environments. It also talks about unit tests for model training and serving and
hyperparameter tuning. Moreover, it discusses ways to track metrics during training and
retraining/redeployment evaluation.
Chapter 9: Model Explainability on Vertex AI
This chapter covers approaches to model explainability on Vertex AI.
Chapter 10: Scaling Models in Production
This chapter covers scaling prediction service (e.g., Vertex AI Prediction, containerized
serving), serving (online, batch, caching), Google Cloud serving options, testing for target
performance, and configuring triggers and pipeline schedules.
Chapter 11: Designing ML Training Pipelines
This chapter covers identification of components, parameters, triggers, and compute needs
(e.g., Cloud Build, Cloud Run). It also talks about orchestration frameworks (e.g., Kubeflow
Pipelines/Vertex AI Pipelines, Cloud Composer/Apache Airflow), hybrid or multicloud
strategies, and system design with TFX components/Kubeflow DSL.
Chapter 12: Model Monitoring, Tracking, and Auditing Metadata
This chapter covers the performance and business quality of ML model predictions,
logging strategies,
organizing and tracking experiments, and pipeline runs. It also talks about dataset
versioning and model/dataset lineage.
Chapter 13: Maintaining ML Solutions
This chapter covers establishing continuous evaluation metrics (e.g., evaluation of drift
or bias), understanding the Google Cloud permission model, and identification of
appropriate retraining policies. It also covers common training and serving errors
(TensorFlow), ML model failure, and resulting biases. Finally, it talks about how you can
tune the performance of ML solutions for training and serving in production.
Chapter 14: BigQuery ML
This chapter covers BigQuery ML algorithms, when to use BigQuery ML versus Vertex AI,
and the interoperability with Vertex AI.
Chapter Features
Each chapter begins with a list of the objectives that are covered in the chapter. The book
doesn’t cover the objectives in order. Thus, you shouldn’t be alarmed at some of the odd
ordering of the objectives within the book.
At the end of each chapter, you’ll find several elements you can use to prepare for
the exam.
Exam Essentials
This section summarizes important information that was covered in the chapter. You should
be able to perform each of the tasks or convey the information requested.
Review Questions
Each chapter concludes with 8+ review questions. You should answer these questions and
check your answers against the ones provided after the questions. If you can’t answer at
least 80 percent of these questions correctly, go back and review the chapter, or at least
those sections that seem to be giving you difficulty.
To get the most out of this book, you should read each chapter from start to finish and
then check your memory and understanding with the chapter-end elements. Even if you’re
already familiar with a topic, you should skim the chapter; machine learning is complex
enough that there are often multiple ways to accomplish a task, so you may learn something
even if you’re already competent in an area.
Like all exams, the Google Cloud certification from Google is updated
periodically and may eventually be retired or replaced. At some point
after Google is no longer offering this exam, the old editions of our books
and online tools will be retired. If you have purchased this book after
the exam was retired, or are attempting to register in the Sybex online
learning environment after the exam was retired, please know that we
make no guarantees that this exam’s online Sybex tools will be available
once the exam is no longer available.
Practice tests
All of the questions in this book appear in our proprietary digital test engine, including
the 30-question assessment test at the end of this introduction and the 100+ questions that
make up the review question sections at the end of each chapter. In addition, there are two
50-question bonus exams.
Electronic “flash cards”
The digital companion files include 50+ questions in flash card format (a question followed
by a single correct answer). You can use these to review your knowledge of the exam
objectives.
Glossary
The key terms from this book, and their definitions, are available as a fully searchable PDF.
A tip provides information that can save you time or frustration and that
may not be entirely obvious. A tip might describe how to get around a
limitation or how to use a feature to perform an unusual task.
Language: French
CAILLOU ET TILI
PARIS
CALMANN-LÉVY, ÉDITEURS
3, RUE AUBER, 3
BY THE SAME AUTHOR
In-18 format.
So I watched the little girls, and I had to admit that he was
right: when there are many of them facing a single little boy, they
are little beasts. But Caillou does not realize that they may simply
be taking their revenge, for he, Caillou, is a complete idiot when he
finds himself alone with just one of them, or two at most. They court
him, and he does not notice. He is polite, but he is bored.
That must come down to two things. The first is that they are far
more intelligent than he is for their age, and less active. Caillou
favors games in which one moves about. He needs to pour out an
overabundance of energy, and if he talks while playing, it is to tell
absurd and outsized stories. Do not forget that it was he who wanted
to run me over with a twenty-nine-sou cart. He is instinctively
enormous, that is to say romantic, and reality bores him. Little
girls, by contrast, have a sense of the charms of that reality; they
see it far more sharply and precisely.
The second difference between them and Caillou is that they have
an innate instinct for coquetry and he has none. Caillou exists for
the little girls, whereas the little girls do not exist at all for
Caillou: this point of disagreement is serious. And the smaller they
are, the more he despises them. He likes only what is big.
… He has just been taken to Jeanne’s house, where Vivette is also
a guest. There will be three of them, in a nursery, for two hours.
Caillou never argues with the decision of his parents or his
nursemaid when he is taken to a place he does not know; he has no
preconceived opinion. Besides, he has been told: “You will be good,
won’t you?” He does not much like these warnings, but they have an
effect on him. Every word acts on him; it stirs his imaginative and
malleable will. Vivette and Jeanne, moreover, are very kind to him.
There are only two of them. Today it is not the “enemy instinct” of
one sex against the other that will speak, but that of coquetry. Each
would like to be the one who is noticed, and besides, they have been
dressed up to look very pretty. Only Vivette, who is visiting, wears
a white bonnet on her head, while Jeanne, who is the hostess, has
only a blue ribbon. And this worries her somewhat. A primitive,
savage instinct does indeed lead children to locate beauty not in the
features but in what is added to them. To a little girl, a beautiful
little girl is one who has a beautiful dress. To a Caillou, by
contrast, the enviable little boy, even if he had no legs, will be
the one who has an airplane.
… But Caillou, once he is in the nursery, pays no more attention
to Jeanne than to Vivette. He senses that they mean him no harm,
which is enough for him; he does not care in the least to know that
they want to please him. So he treats them both the same way. This
does not mean that he bestows his favors on them impartially; he
simply remains himself. He amuses himself on his own account, and the
two little girls follow him, trying to get noticed. Sometimes one of
them puts her cheek against Caillou’s cheek, and Caillou kisses her.
Then the other does the same, and Caillou kisses her too, without
taking much pleasure in it. But he is not bored; he is at ease.
Meanwhile someone comes to fetch him to say hello to Vivette’s
mother. He goes along artlessly, without great regret or evident
satisfaction. I do not know what is said to him, I do not know what
he replies, and it does not matter to the story. But suddenly, from
the nursery, come sobs and screams that make the walls ring, and the
mothers rush in.
An obscure and powerful feeling, something like a passionate
despair, had just seized Vivette and Jeanne, left to themselves.
Neither has really managed to attract Caillou’s attention, and for a
whole hour their bitterness over it has grown; each holds the other
responsible for it, without even suspecting as much. That is why,
once the object of their rivalry had disappeared, the quarrel broke
out, without their knowing why.