You are on page 1of 16

Google Cloud Professional DevOps Engineer Exam

Prep Notes by
Ammett

0
BY AMMETT
Google Cloud Professional Cloud DevOps Engineer Exam
Exam prep sheet by Ammett v.2
06-2023

SRE
SRE SLO SLI SLA Error budget Toil Review documents
SRE Book

Video
What it is What it is What it is What it is What it is What it is SRE playlist watch all
In general, an SRE team is This is a target value or range of This is a carefully defined This is an explicit or implicit Provides a clear, objective Toil is the kind of work tied to
responsible for the availability, values for a service level that is quantitative measure of some contract with your users that metric that determines how running a production service that
latency, performance, efficiency, measured by an SLI. aspect of the level of service includes consequences of unreliable the service is tends to be manual, repetitive,
change management, monitoring, that is provided. meeting (or missing) the allowed to be within a single automatable, tactical, devoid of
emergency response, and SLOs they contain quarter. enduring value, and that scales
capacity planning of their linearly as a service grows
service(s)
What you should know What you should know What you should know What you should know What you should know What you should know My experience
1- What it is and how it aligns with 1- Actions to take when SLO’s 1- How to set metrics 1- These have penalties 1- How is this determined 1- What is toil Various element of the SRE topics
DevOps are being met or not being met 2- Freshness 2- Should be less strict than 2- What happen when this is 2- How to handle toil over time combine to make some
2- Monitor SLO burn rate 3- Formulas SLO’s exceeded or in danger 3- What type of task are worth interesting questions. Spend
(select_slo_burn_rate) automating some time on each area and learn
Key Points Key Points Key Points Key Points Key Points Key Points to appreciate your SLI metrics.
Generally, a good area to pick up
1- Understand the mind-set of the 1- Options, adjusts SLO & SLI, 1- Understand the “math” what 1- Compare SLA to SLO 1- How are these established 1- What should be the aim of
some points and not too hard if
SRE principles (important) stop deployment until stable, is being measured targets point and who is responsible. engineering task vs toil.
you understand them well.
Automate this year’s toil away

Post-mortems Alerting Monitoring Managing Risk DevOps Handling Incidents Review documents
SRE Workbook
Post-mortems
DevOps

What it is What it is What it is What it is What it is What it is


These are conducted after and While there may be many alerts Collecting, processing, Item or risk that may cause Organizational and cultural Things break so it is important to
incident and a great form of ultimately, your goal is to be aggregating, and displaying you to not meet the SLO movement that aims to understand how and what to do Video
learning for everyone. notified for a significant event: an real-time quantitative data increase software delivery when that happens. Improving reliability
event that consumes a large about a system, such as query velocity, service reliability, and
fraction of the error budget. counts and types, error counts shared ownership among Burn rate
etc. stakeholders.
What you should know What you should know What you should know What you should know What you should know What you should know My experience
1- Writing post-mortems based on 1- Precision, Recall, Detection 1- Analyse long term trends. 1- Target risk that will bring 1- Map SRE principles to 1- What options do you employ These topics make up the core of
SRE principles. time, reset time 2- Comparing over time you in the error budget DevOps the SRE practice. Combined they
2. Quantify data will be featured and you can pick
Key Points Key Points Key Points Key Points Key Points Key Points up a few points if you are
1- No blame, root causes, action 1-. Target Error rate, Increased Concepts in service 1- Controlling and identify 1- No Silos, Accidents are 1- Roll back, Connection draining, prepared enough.
items alert window, incrementing monitoring risk helps you manage your normal, Gradual change, stop testing, A/B, scaling
duration, Burn rate, multiple burn SLO Tooling, measurement is
rate, multiwindow, multi-burn- crucial.
rate alerts

1
BY AMMETT
SRE
Response Structure API lifecycle Communication API Error Codes Review documents
Data incident response
API lifecycle
Error codes
What it is What it is What it is What it is
Communication and structure is a What is the process for your new Keeping stack holder in the These codes give insight into
key part of handling incident. deployment and life cycle of loops. Communication is performance of API Video
your API. “KEY” the better it is the better SRE playlist
for your incident management
What you should know What you should know What you should know What you should know
1- Who handle what role 1 - Stages to replace an API 1- Who handles 1- 400, 500, 200 codes
2- Delegation communication in what
3- Communication circumstances
2-Internal and external
communication My experience
Key Points Key Points Key Points Key Point Handling incident is important.
1- Operation lead, Communication 1 - The order of the process 1- Communication loops Error codes There are steps, roles, activities
lead, Incident commander 2 - Is it chicken or egg involved. Do not forget
communication also.

Incident response team Incident response workflow

2
BY AMMETT
Cloud Operations
Cloud What it is Key points What you should know Review documents Video My experience
Monitoring Cloud Monitoring discovers and 1- Metrics 1- Everything in depth about Cloud Monitoring docs Best practice for monitoring Ok if you don’t know Cloud
monitors your cloud resources 2- Custom metrics Operations Operations spend some time on
automatically, whether you are 3- Alerting policies it.
running on Google Cloud, AWS 4- Monitoring This means you should do some
or other. testing an experimenting with all
the features.
Cloud What it is Key points What you should know Review documents Video My experience
logging Cloud Logging automatically 1- How it works 1- As much as possible ☺ Log router Best practices for cloud Ok if you don’t know Cloud
collects logs from Google Cloud 2- Exporting logging Operations spend some time on
resources. You can also collect Export to Splunk it.
logs from your applications, on- Understand your services
prem resources, and resources with Cloud Logging
from other cloud providers
Sharing charts What it is Key points What you should know Review documents My experience
If you want, you can share a 1- Sharing various chars is possible 1- How to create charts Display SLO This can pick you up a point
chart with others. 2- Understand how to customise the 2- How to share charts maybe.
parameter Sharing custom
3- Know the tag used dashboard

Metrics scopes What it is Key points What you should know Review documents Video My experience
By default, a Google Cloud 1- How to config 1- Required roles, project owner, Metric Scopes How to use metric scopes Understand the steps to configure
project has visibility only to the 2- How to design monitoring editor, monitoring Admin,
metrics it stores. However, you 3- Every Workspace has a host project etc. Multiple projects
can expand the set of metrics 4- Add existing account to workspace
that a project can access by Roles
adding other Google Cloud
projects to the project's metrics
scope.
Python What it is Key points What you should know Review documents My experience
You can write logs to Logging 1- How to use with App engine, GKE, 1- Logging library for python Cloud logging for python What about the others languages?
from Python applications by compute engine, locally
using the Python logging 2- IAM permission required Google Cloud Client
handler included with the Libraries for Python
Logging client library
Ops Agent What it is Key points What you should know Review documents My experience
The used to collect telemetry 1- Stream log from VM and 3rd party 1 - Uses Fluent Bit About the agent Cloud Operations is important.
from virtual machine (VM) software packages 2 - Get syslog files Spend time on the components
instances or third party 2- How to Install agent 3 - Get third party logs Configuring the agent
applications.

3
BY AMMETT
Cloud Operations – Trace – Profiler
Trace What it is Key points What you should know Review documents Video My experience
Trace is a distributed tracing 1- What type of problems you would use 1- Latency Trace Distributed tracing with Think latency and finding it’s
system that collects latency trace for. 2- Permission errors Open telemetry & Cloud cause.
data from your applications and 3- How & when to create custom roles Force request to be Trace
displays it in the Google Cloud 4- Service account permissions traced
Platform Console. 5- X-Cloud-Trace-Context

Profiler What it is Key points What you should know Review documents Video My experience
Profiler continuously analyzes 1- Capture characteristics of the code as 1- Show what happing within each Profiler Stackdriver profiler Know what your code is doing in
the performance of CPU or it runs service real time, get analytics with
memory-intensive functions 2- Finds bugs 2- Take random sample profiles profiler.
executed across an application. 3- It does not require pervasive changes

Alerting What it is Key points What you should know Review documents Video My experience
You must configure most 1- Different channels and how to use 1- Email, mobile apps, pagerduty, SMS, Notification Options Error reporting Alerts can be sent using multiple
notification channels before you them for alerts Slack, Webhooks Error reporting channels. Understand the
use them in alerting policies. Alerting integrations.

Cloud What it is Key points Review documents Video My experience


Operations for Designed to monitor GKE 1- Features and use Cloud Operations for Manage GKE service with Good to know
clusters GKE Cloud Operation
GKE

Cloud IAM What it is Key points What you should know Review documents My experience
With the logging data in a 1- What are the various roles and the 1- Permissions level necessary to Role etc Export logs IAM permissions are necessary
Google Cloud project, you must permissions they have to do various export logs Permission and roles to run most services. You may as
be a member and have an Cloud functions 2- (Logging.configWriter, well get familiar with them.
IAM role that grants you 2 – Data Access audit logs logging.admin
permission to use Logging owner, logging.privateLogViewer)

4
BY AMMETT
Routing and storage overview

5
BY AMMETT
Data
BigQuery Looker Studio Cloud Storage Pub/Sub API gateway Recommender Review documents
Pub/Sub
BigQuery
Continuous Data integration in BQ
Cloud storage
What it is What it is What it is What it is What it is What it is
API Gateway
BigQuery is a serverless, highly allows you to create branded Used for a range of scenarios Cloud Pub/Sub is API Gateway is an API A recommender is a service on
scalable, and cost-effective cloud reports including serving website a publish/subscribe (Pub/Sub) management system that Google Cloud that provides usage
enterprise data warehouse that with data visualizations to content, storing data for service: a messaging service provides management, recommendations for Google
enables super-fast SQL queries share with your clients. archival and disaster recovery, where the senders of monitoring, and authentication Cloud resources. Video
using the processing power of or distributing large data messages are decoupled from for your APIs. BigQuery Spotlight
Google's infrastructure. objects to users via direct the receivers of messages API gateway
download. Recommender Playlist
What you should know What you should know What you should know What you should know What you should know What you should know My experience
1- How it works with Cloud 1- Integrating Google 1- What it does, classes 1- Multiple uses and 1- How does this work 1- What it is used for
operations etc. services these services 2- Integrations integration of Pub/Sub 2- Syntax of recommenders All these may appear as single or
3- Uses for DevOps (TF state, combinations in the exam so get
storage, etc) familiar.
Key Points Key Points Key Points
1- Sinks, viewing logs, exporting 1- Be aware of the services API Gateway recommender
logs, ingesting logs that can use it as a trigger HTTP(S) LB for API gateway
Recommender list
Networking / Compute
Computer Engine Managed Instance Flow logs Network service Tier Spot/Preemptible VM’s Committed use CUD Review documents
Groups Committed use
Sustained use
Managed instances
Preemptible VM
Spot VMs
What it is What it is What it is What it is What it is What it is
Network Service Tier
Compute Engine delivers A managed instance group VPC Flow Logs record a Allows customers to optimize Best for short-lived compute Committed use discounts are
configurable virtual machines (MIG) contains identical sample of network flows sent their cloud network for instances suitable for batch ideal for workloads with Flow logs
running in Google’s data centres instances that are based on from and received by VM performance or price jobs and fault-tolerant predictable resources needs. Video
with access to high-performance an instance template. instances, including instances optimisation. workloads.
networking infrastructure and used as GKE nodes Highly available deployments
block storage.
What you should know What you should know What you should know What you should know What you should know What you should know My experience
1- Monitor these with Ops Agent 1- What it does (autoheal, 1- Log entry sampling (default 1- When to use. Premium and 1- How, when to use these to 1- Predictable work needs The networking once again is a
2- Monitor application on VM load balancing, autoscaling 0.50 (50%)) max is 1 standard save cost or help processing 2- Term 1-3 years key point in any cloud
an auto-updating. 2- TCP/UDP traffic 2- What is the difference and 3- Billed weather used or not infrastructure same applies in
3- Health checks trade-off monthly DevOps. Get familiar with these
4- review SUDs also
and pick up a point or 3
Key Points Key Points Key Points Key Points
1- Keep scenarios in mind 1-What you can monitor with it 1- Managing cost (know the 1- Requirements and
where you would use these 2- Used for seeing what’s trade-offs also) recommendation for use.
for deployments happening in the network

6
BY AMMETT
API Gateway Architecture

7
BY AMMETT
Security
Service What it is Key points What you should know Review documents Video My experience
accounts IAM lets you manage who has 1- All the services need some level of 1- Modify permission on service IAM roles IAM is a core concept in Google
access to what in you GCP permissions to run. accounts. Cloud.
environment. 2- User service account or not 2- Organizational Constraints Best practices for identity
constraints/iam.disableServiceAccount
KeyCreation

KMS What it is Key points What you should know Review documents Video My experience
Cloud KMS is a cloud-hosted key 1- You can generate, use, rotate, and 1- Using cloud KMS with other Google Using cloud KMS with Securing Kubernetes secrets KMS helps you in many ways.
management service that lets destroy cryptographic keys Cloud services (especially developer other products Figure out which ways you need
you manage encryption for your based) to be helped.
cloud services the same way
you do on-premises. .

Secret Manager What it is Key points What you should know Review documents Video My experience
Secret Manager provides a secure 1- Encrypt, store and audit (infrastructure 1-Applications often require access to Secrets Manager Cloud Run secrets securely Secrets will pop up somewhere,
and convenient tool for storing and apps secrets) small pieces of sensitive data at build with secret manager so now it’s no longer a secret.
API keys, passwords, certificates, 2- You can address individual version of a or run time. These pieces of data are Secrets
and other sensitive data. secret often referred to as secrets. Cloud Code
3- Rotation
Cloud SCC What it is Key points What you should know Review documents Video My experience
Security Command Center gives 1- What it does. 1- What may be relevant for your Security Command Cloud security cc Good to know
enterprises consolidated visibility pipeline. Center
into their Google Cloud assets
across their organization.

Binary What it is Key points What you should know Review documents Video My experience
Authorisation Binary Authorization is a service 1- Allows or blocks deployment of 1-Know the flow of binary authorisation Secure software chains Binary Authorisation Demo This is a bit confusing so study
on Google Cloud Platform (GCP) images to GKE based on policy the flow and the stages
that provides software supply- 2- Attestation (Très important) Codelab Binary
chain security for applications 3- Enforcement functionality authorization
that run in the Cloud. 4 Authorization

Images What it is Key points What you should know Review documents Video My experience
Artifact Analysis provides 1- Allow vulnerability scanning and 1- Performs scans on images Get image vulnerabilities Building small containers Think secured images and secure
vulnerability information and metadata storage for software artifacts -Incremental scans End-To-End Security and deployments
other types of metadata for the -Continuous analysis Compliance for Your
container images in Artifact and Kubernetes Software Supply
Container Registry. Chain

8
BY AMMETT
Binary Authorization/vulnerability Binary Authorization/Cloud Build

9
BY AMMETT
DevOps and tools
Git Code process Jenkins Spinnaker Terraform Webhooks Review documents
Jenkins
Webhooks
Terraform
What it is What it is What it is What it is What it is What it is Video
Git is a version control app which Stages involved in DevOps. Jenkins is an open source Spinnaker on GCP is a tool for Terraform is a tool for building, A Webhook target is an open and CI/CD across multiple
handle everything from small to automation server that easily installing a production- changing, and versioning public URL. Most services provide Terraform
large projects with speed an enables developers around the ready instance of Spinnaker, infrastructure safely and a token or a secret to ensure that Spinnaker
efficiency. world to reliably build, test, and and for managing that efficiently developed by the incoming requests are from My experience
deploy their software. instance over time. HashiCorp authorized services These tools and terms have a
What you should know What you should know What you should know What you should know What you should know What you should know high probability of being used and
1- What it is and how it aligns with 1- Put code into SCM, Build, 1- What it does, flow, stages, 1- Stages it does, flow, stages, 1- Using terraform to manage 1- What is it
uses with GCP
the flows are very important. Get
DevOps Test, Stage, deploy, test uses with GCP code in a team (Central 2- Where and what services it’s
code, Run repository, peer review, one used with some lab practice, try out
final source) 3- How to set up Terraform to clarify anything in
Key Points Key Points Key Points Key Points your mind.
1- Know the steps 1- Triggers for Spinnaker Terraform best practices
Jenkins (Important)
Store state
Spinnaker cloud build pub/sub

CI CD Triggers Artifacts JSON YAML Review documents

Triggers

Artifacts
What it is What it is What it is What it is What it is What it is
CI follows the principle of frequent Extension of Continuous A trigger automatically starts a If your build produces artifacts JSON stands for JavaScript YAML is a human-readable data-
automatic integration of code. It is Integration build whenever you make any such as container images, Object Notation. JSON is a serialization language.
easier to make small frequent changes to your source code. binaries, or tarballs, you can lightweight format for storing
changes. store them in Container and transporting data
Registry, Cloud Storage, or any
private third-party repositories.
What you should know What you should know What you should know What you should know What you should know What you should know My experience
1- Helps find bugs faster 1- Ensures that the code is 1- Types of triggers, how to 1- Where / How to store 1- General Syntax 1- General Syntax Understanding what these terms
2- Automatically test all new and always ready to be deployed create triggers. (important) (Container registry or Cloud are will help you understand the
modified code with the master 2- Manual approval is Storage) point of view of the question.
code. required to actually deploy These are core terms in the
the software to production
DevOps process
Key Points Key Points
1- Continuous Integration 1- Continuous delivery

10
BY AMMETT
CD diagram CI diagram

App delivery pipeline Spinnaker Terraform, Cloud Build and Gitops

11
BY AMMETT
Kubernetes - DevOps on Google
GKE Deployments ReplicaSets Load balancer Ingress Workload identity Review documents
GKE
Canary
Deployments
ReplicaSets
What it is What it is What it is What it is What it is What it is
Load balancer
GKE provides a managed A Deployment runs multiple ReplicaSet’s purpose is to This can be exposed as a An Ingress object defines rules allows workloads in your GKE
environment for deploying, replicas of your application maintain a stable set of replica service type LoadBalancer in for routing external HTTP(S) clusters to impersonate IAM
Video
managing, and scaling your and automatically replaces Pods running at any given time kubernetes to create a traffic to applications running service accounts to access
Canary GKE
containerized applications using any instances that fail or Network Load Balancer to in a cluster Google Cloud services.
Google infrastructure. become unresponsive. distributes traffic among Artifact registry to GKE
GKE Essentials
virtual machine (VM)
What you should know My experience
What you should know What you should know What you should know What you should know
1- This is standard understand 1- What is a deployment 1- What it is Getting metric, scaling, 1- The purpose and setup of 1- How it works
2- How to use a deployment placement, have a good GKE is standard and you should
deployment, scaling, load 2- Difference from a ingress 2- Outside secrets
balancing, rollbacks) important for rollout and testing deployment general appreciation of this spend some time on it. All the
topics mention here can be
Key Points Video combined and referenced in term
RBAC 1- Difference between GKE workloads with work identity of uses and DevOps so don’t
Performing rolling updates deployment and replica set ignore them.
Configure workload Identity

Rolling deployments Blue/Green Canary Anthos Anthos congif Config connector Review documents
Red/Black management Rolling deployments
Blue/green
Canary
Anthos
What it is What it is What it is What it is What it is What it is
Anthos config management
Rolling update deployment, you Blue/green deployment A way of comparing a Anthos is Google's cloud- Create a common Open Source Kubernetes add-on
update a subset of running maintains two instances of a candidate version again a centric container platform for configuration across all your to manager Google Clouse Config connector
application instances instead of system: one that is serving baseline to check deviations in running modern apps infrastructure, including resources through GKE
Video
simultaneously updating every traffic (green), and another behaviour anywhere consistently at custom policies, and apply it
application instance that is ready to serve traffic scale. both on-premises and in the Deployments Canary
(blue). cloud. KRM Toolchain
What you should know What you should know What you should know What you should know What you should know What you should know Canary GKE
1- Problem that is solved 1- Difference from canary 1- When to use, how it works 1- How it works 1- How it works 1- What it is
2- How it works in GKE, MIG and how it’s deployed 2- Uses for DevOps 2- Components Policy My experience
(general knowledge) (important) controller, Config Sync, Config Deployment methods you should
controller know well, along with awareness
of Anthos congif manager
Performing rolling updates Node pool upgrade Key Points components
GKE strategies GKE 1- What you are comparing
again (current production
version)

12
BY AMMETT
DevOps on Google – key services
Cloud Source Repositories Cloud Build Cloud Build private Pools Cloud Composer Deployment Manager Cloud Run Review documents
Cloud Source repository
Cloud Composer
Deployment Manager
Modify IAM permission CB
What it is What it is What it is What it is What it is What it is Cloud Build pools
Cloud Source Repositories are Cloud Build executes your Private pools are private, managed workflow Deployment Manager is an Cloud Run is a managed compute Cloud Build
fully featured, build as a series of build dedicated pools of workers orchestration service, enabling infrastructure deployment platform that lets you run Cloud run Traffic migration
private Git repositories hosted on steps, where each build step that offer greater you to create, schedule, service that automates the containers directly on top of Video
Google Cloud. is run in a Docker container. customization including the monitor, and manage creation and management of Google's scalable infrastructure. Continuous integration for Cloud Run
ability to access resources in a workflow pipelines that span Google Cloud resources Point and click deployment Cloud Run
private network. across clouds and on-
premises data centers.
What you should know What you should know What you should know What you should know What you should know What you should know My experience
1- Integration with other GCP 1- Import source code from 1- How it works 1- Components and how it 1- When to use. 1- Ho to deploy These tools (Cloud Build, Cloud
tools. Cloud storage, Github, works. 2- Different templates. 2-Traffic migration, rollouts, Repositories, Cloud Composer,
Bitbucket etc 2- Manages across rollbacks Cloud Run, etc) will appear in
2- Produces artifacts (docker environments 3- Traffic splitting various ways in the exam.
or java) 3-Based on Apache airflow
Understand these well. And how
Video Key Points
Proper understanding of to manage and deploy
Cloud Composer for data 1- Config written in YAML
cloud build capabilities is orchestration must contain Name, Type,
important Properties
2- Templates written in python
or jinja2
DevOps on Google – key services
GitOps Artifact registry Container Registry App Engine Helm Charts Review documents
GitOps
Artifact registry
Container registry
App engine
What it is What it is What it is What it is What it is Work with helm charts
Is a concept using a Git repository Cloud Build executes your Container Registry is a private App Engine is a fully managed, A chart is a HELM packaging
to store the environment state build as a series of build container image registry that serverless platform for format. It is a collection of Video
that you want steps, where each build step runs on Google Cloud. developing and hosting web files that describe a related set CI/CD anywhere
is run in a Docker container. applications at scale of Kubernetes resources.
CI testing with Cloud Build
What you should know What you should know What you should know What you should know What you should know My experience
1- How to use GitOps style in 1- manage images in various 1- How it works 1- Understand it’s functions 1- OCI format These tools (GitOps Artifact
Google Cloud formats 2- Images (pulling and 2- Get current open Manage charts registry, AppEngine, Helm
2- OCI pushing) connections Charts) may appear in various
ways in the exam.

13
BY AMMETT
Canary Rollout
Canary using service label; selector

CI \ CD on Google Cloud

14
BY AMMETT
Performance
Speeding up builds AutoScaling MTTR Structure of metric types Troubleshooting agents Prometheus Managed My experience
Deployments GKE Service Now these are super helpful.
That’s all I am going to say

Video
What it is What it is What it is What it is What it is What it is Google Cloud managed service
Quick ways to improve This is a target value or MTTR, MTTD, MTBF etc Take some time to Things break and can give Prometheus is an open-source for Prometheus
deployment speed range of values for a understand observability problem check out these systems monitoring and Troubleshoot the Ops Agent
service level that is metric types troubleshooting tips for alerting toolkit
measured by an SLI. your ops agents
Review Documents Review Documents Review Documents Review Documents Review Documents Review Documents
Speeding up builds Autoscaling deployments MTTR Metric Types Troubleshooting Agents Prometheus Managed service

Thanks for reviewing

- Please visit the official certification outline HERE


- Practice test from Google HERE
ps. These are my notes and deep dive resources for the latest Cloud
DevOps exam. This updated exam is a tough exam. Every area on the
document represents a topic that has a strong probability of
appearing. Google may change the exam requirements at any time so
always review the outline.
The knowledge is free it just cost me a good bit of time to put
together. Please share with your network who may be interested in the
GCP Cloud DevOps Exam or just need a quick refresher on these
topics.

You can also check my all my prep notes for other Google Cloud Certs
exams HERE
If these help you give me a shout on LinkedIn.

Bonne Journée

15
BY AMMETT

You might also like