
CERTIFICATIONS

We provide the latest IT certification practice exams in a variety of formats for all
types of IT professionals. Our commitment to getting you certified in the
shortest and easiest way is evident in the quality of our products.
Our state-of-the-art Test Engine Software simulates the real exam environment
and is available in Windows (.EXE), Android App (.APK), and eReader (eBook)
formats. These questions and answers will help you pass your certification exam
on your first try, or we refund your money in full.

Xcerts Certifications
Sales@Xcerts.com | https://Xcerts.com
Google Professional-Cloud-DevOps-Engineer Exam
Questions & Answers
Professional Cloud DevOps Engineer Exam

Product Questions: 162


Version: 8.0

QUESTION: 1
You manage an application that is writing logs to Stackdriver Logging. You need to give some
team members the ability to export logs.
What should you do?

A. Grant the team members the IAM role of logging.configWriter on Cloud IAM.
B. Configure Access Context Manager to allow only these members to export logs.
C. Create and grant a custom IAM role with the permissions logging.sinks.list and
logging.sinks.get.
D. Create an Organizational Policy in Cloud IAM to allow only these members to create log
exports.

Answer(s): A

Explanation:
https://cloud.google.com/logging/docs/access-control
The logging.configWriter role grants permissions to create, update, and delete log exports. This
is the correct role to give team members who need to export logs.
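
For illustration, a minimal Python sketch of the kind of log export this role then permits, using the google-cloud-logging client (project ID, sink name, filter, and bucket are placeholders):

    from google.cloud import logging

    client = logging.Client(project="my-project")  # placeholder project ID
    sink = client.sink(
        "warning-export",                            # hypothetical sink name
        filter_="severity>=WARNING",                 # which entries to export
        destination="storage.googleapis.com/my-log-bucket",
    )
    sink.create()  # requires logging.sinks.create, part of roles/logging.configWriter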

QUESTION: 2
You support the backend of a mobile phone game that runs on a Google Kubernetes Engine
(GKE) cluster. The application is serving HTTP requests from users. You need to implement a
solution that will reduce the network cost.
What should you do?

A. Configure the VPC as a Shared VPC Host project.


B. Configure your network services on the Standard Tier.
C. Configure your Kubernetes cluster as a Private Cluster.
D. Configure a Google Cloud HTTP Load Balancer as Ingress.

Answer(s): B

Explanation:
The Standard Tier network service offers lower network costs than the Premium Tier. This is the
correct option to reduce the network cost for the application.

QUESTION: 3
Your team has recently deployed an NGINX-based application into Google Kubernetes Engine
(GKE) and has exposed it to the public via an HTTP Google Cloud Load Balancer (GCLB)
ingress. You want to scale the deployment of the application's frontend using an appropriate
Service Level Indicator (SLI).
What should you do?

A. Configure the horizontal pod autoscaler to use the average response time from the Liveness
and Readiness probes.
B. Configure the vertical pod autoscaler in GKE and enable the cluster autoscaler to scale the
cluster as pods expand.
C. Install the Stackdriver custom metrics adapter and configure a horizontal pod autoscaler to
use the number of requests provided by the GCLB.
D. Expose the NGINX stats endpoint and configure the horizontal pod autoscaler to use the
request metrics exposed by the NGINX deployment.

Answer(s): C

Explanation:
https://cloud.google.com/kubernetes-engine/docs/tutorials/autoscaling-metrics
The Google Cloud HTTP Load Balancer (GCLB) provides metrics on the number of requests and
the response latency for each backend service. These metrics can be used as custom metrics for
the horizontal pod autoscaler (HPA) to scale the deployment based on the load. This is the
correct solution to use an appropriate SLI for scaling.
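
As a rough sketch of the resulting setup, the manifest below scales the frontend Deployment on an external metric, applied here with the official Kubernetes Python client. It assumes the Custom Metrics Stackdriver Adapter is installed and exposes the GCLB request count under the metric name shown; all resource names and targets are illustrative:

    from kubernetes import client, config, utils

    config.load_kube_config()  # or load_incluster_config() inside the cluster
    api = client.ApiClient()

    hpa = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": "frontend-hpa", "namespace": "default"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "frontend"},
            "minReplicas": 2,
            "maxReplicas": 10,
            "metrics": [{
                "type": "External",
                "external": {
                    # Metric name as exposed by the Custom Metrics Stackdriver Adapter (assumed).
                    "metric": {"name": "loadbalancing.googleapis.com|https|request_count"},
                    "target": {"type": "AverageValue", "averageValue": "100"},
                },
            }],
        },
    }
    utils.create_from_dict(api, hpa)  # create the HPA in the cluster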

QUESTION: 4
Your company experiences bugs, outages, and slowness in its production systems. Developers
use the production environment for new feature development and bug fixes. Configuration and
experiments are done in the production environment, causing outages for users. Testers use
the production environment for load testing, which often slows the production systems. You
need to redesign the environment to reduce the number of bugs and outages in production and
to enable testers to load test new features.
What should you do?

A. Create an automated testing script in production to detect failures as soon as they occur.
B. Create a development environment with smaller server capacity and give access only to
developers and testers.
C. Secure the production environment to ensure that developers can't change it and set up one
controlled update per year.
D. Create a development environment for writing code and a test environment for
configurations, experiments, and load testing.

Answer(s): D

Explanation:
Creating a development environment for writing code and a test environment for configurations,
experiments, and load testing is the best practice to reduce the number of bugs and outages in
production and to enable testers to load test new features. This way, the production
environment is isolated from changes that could affect its stability and performance.

QUESTION: 5
Your company follows Site Reliability Engineering practices. You are the Incident Commander
for a new, customer-impacting incident. You need to immediately assign two incident
management roles to assist you in an effective incident response.
What roles should you assign? Choose 2 answers

A. Operations Lead


B. Engineering Lead
C. Communications Lead
D. Customer Impact Assessor
E. External Customer Communications Lead

Answer(s): A, C

Explanation:
https://sre.google/workbook/incident-response/
"The main roles in incident response are the Incident Commander (IC), Communications Lead
(CL), and Operations or Ops Lead (OL)."
The Operations Lead is responsible for managing the operational aspects of the incident, such
as deploying fixes, rolling back changes, or restoring backups. The External Customer
Communications Lead is not a standard role in incident response, but it could be delegated by
the Communications Lead if needed.

QUESTION: 6
Your company follows Site Reliability Engineering principles. You are writing a postmortem for
an incident, triggered by a software change, that severely affected users. You want to prevent
severe incidents from happening in the future.
What should you do?

A. Identify engineers responsible for the incident and escalate to their senior management.
B. Ensure that test cases that catch errors of this type are run successfully before new software
releases.
C. Follow up with the employees who reviewed the changes and prescribe practices they should
follow in the future.
D. Design a policy that will require on-call teams to immediately call engineers and management
to discuss a plan of action if an incident occurs.

Answer(s): B

Explanation:
The best way to prevent severe incidents from happening in the future is to ensure that test
cases that catch errors of this type are run successfully before new software releases. This is
aligned with the Site Reliability Engineering principle of testing for reliability.

QUESTION: 7
Your team is designing a new application for deployment both inside and outside Google Cloud
Platform (GCP). You need to collect detailed metrics such as system resource utilization. You
want to use centralized GCP services while minimizing the amount of work required to set up
this collection system.
What should you do?

A. Import the Stackdriver Profiler package, and configure it to relay function timing data to
Stackdriver for further analysis.
B. Import the Stackdriver Debugger package, and configure the application to emit debug
messages with timing information.
C. Instrument the code using a timing library, and publish the metrics via a health check
endpoint that is scraped by Stackdriver.
D. Install an Application Performance Monitoring (APM) tool in both locations, and configure an


export to a central data storage location for analysis.

Answer(s): A

Explanation:
The easiest way to collect detailed metrics such as system resource utilization is to import the
Stackdriver Profiler package and configure it to relay function timing data to Stackdriver for
further analysis. This lets you use centralized GCP services with minimal setup work, both inside
and outside GCP.

QUESTION: 8
Your application images are built and pushed to Google Container Registry (GCR). You want to
build an automated pipeline that deploys the application when the image is updated while
minimizing the development effort.
What should you do?

A. Use Cloud Build to trigger a Spinnaker pipeline.


B. Use Cloud Pub/Sub to trigger a Spinnaker pipeline.
C. Use a custom builder in Cloud Build to trigger a Jenkins pipeline.
D. Use Cloud Pub/Sub to trigger a custom deployment service running in Google Kubernetes
Engine (GKE).

Answer(s): B

Explanation:
https://cloud.google.com/architecture/continuous-delivery-toolchain-spinnaker-cloud
https://spinnaker.io/guides/user/pipeline/triggers/pubsub/
The most efficient way to build an automated pipeline that deploys the application when the
image is updated is to use Cloud Pub/Sub to trigger a Spinnaker pipeline. This way, you can
leverage the built-in integration between GCR and Cloud Pub/Sub and use Spinnaker as a
continuous delivery platform for deploying your application.

QUESTION: 9
You support a high-traffic web application that runs on Google Cloud Platform (GCP). You need
to measure application reliability from a user perspective without making any engineering
changes to it.
What should you do?
Choose 2 answers

A. Review current application metrics and add new ones as needed.


B. Modify the code to capture additional information for user interaction.
C. Analyze the web proxy logs only and capture response time of each request.
D. Create new synthetic clients to simulate a user journey using the application.
E. Use current and historic Request Logs to trace customer interaction with the application.

Answer(s): D, E

Explanation:
The most effective ways to measure application reliability from a user perspective without
making any engineering changes are to create new synthetic clients to simulate a user journey
using the application, and to use current and historic Request Logs to trace customer interaction


with the application. These methods can help you monitor the availability, latency, and errors of
your application from an end-user perspective.
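
A synthetic client can be as simple as a scheduled probe that replays one step of a user journey and records availability and latency. A minimal Python sketch, using the third-party requests library and a placeholder URL:

    import time
    import requests  # third-party HTTP client

    def probe(url="https://example.com/login", timeout=5):
        """Simulate one user interaction and record availability and latency."""
        start = time.monotonic()
        try:
            resp = requests.get(url, timeout=timeout)
            ok = resp.status_code < 500
        except requests.RequestException:
            ok = False  # connection errors and timeouts count as unavailable
        latency_ms = (time.monotonic() - start) * 1000
        return ok, latency_ms

    if __name__ == "__main__":
        ok, latency = probe()
        print(f"available={ok} latency={latency:.0f}ms")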

QUESTION: 10
You support an application deployed on Compute Engine. The application connects to a Cloud
SQL instance to store and retrieve data. After an update to the application, users report errors
showing database timeout messages. The number of concurrent active users remained stable.
You need to find the most probable cause of the database timeout.
What should you do?

A. Check the serial port logs of the Compute Engine instance.
B. Use Stackdriver Profiler to visualize the resources utilization throughout the application.
C. Determine whether there is an increased number of connections to the Cloud SQL instance.
D. Use Cloud Security Scanner to see whether your Cloud SQL is under a Distributed Denial of
Service (DDoS) attack.

Answer(s): C

Explanation:
The most probable cause of the database timeout is an increased number of connections to the
Cloud SQL instance. This could happen if the application does not close connections properly or
if it creates too many connections at once. You can check the number of connections to the
Cloud SQL instance using Cloud Monitoring or the Cloud SQL Admin API.
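
For example, the connection count can be pulled from the Cloud Monitoring API. A sketch with the google-cloud-monitoring client, reading the last hour of the Cloud SQL connections metric (project ID is a placeholder):

    import time
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
    )
    results = client.list_time_series(
        request={
            "name": "projects/my-project",
            "filter": 'metric.type = "cloudsql.googleapis.com/database/network/connections"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for series in results:
        for point in series.points:
            print(point.interval.end_time, point.value.int64_value)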

QUESTION: 11
Your organization recently adopted a container-based workflow for application development.
Your team develops numerous applications that are deployed continuously through an
automated build pipeline to a Kubernetes cluster in the production environment. The security
auditor is concerned that developers or operators could circumvent automated testing and push
code changes to production without approval.
What should you do to enforce approvals?

A. Configure the build system with protected branches that require pull request approval.
B. Use an Admission Controller to verify that incoming requests originate from approved
sources.
C. Leverage Kubernetes Role-Based Access Control (RBAC) to restrict access to only approved
users.
D. Enable binary authorization inside the Kubernetes cluster and configure the build pipeline as
an attestor.

Answer(s): D

Explanation:
The key phrase here is "developers or operators." With option A, an operator could still push
images to production without approval, because operators can interact with the cluster directly
and protected branches cannot prevent that. Binary Authorization, with the build pipeline as an
attestor, enforces the approval check at deploy time inside the cluster itself.

QUESTION: 12
You support an application running on App Engine. The application is used globally and
accessed from various device types. You want to know the number of connections. You are


using Stackdriver Monitoring for App Engine.


What metric should you use?

A. flex/connections/current
B. tcp_ssl_proxy/new_connections
C. tcp_ssl_proxy/open_connections
D. flex/instance/connections/current

Answer(s): A

Explanation:
https://cloud.google.com/monitoring/api/metrics_gcp#gcp-appengine

QUESTION: 13
You support a production service that runs on a single Compute Engine instance. You regularly
need to spend time on recreating the service by deleting the crashing instance and creating a
new instance based on the relevant image. You want to reduce the time spent performing
manual operations while following Site Reliability Engineering principles.
What should you do?

A. File a bug with the development team so they can find the root cause of the crashing
instance.
B. Create a Managed Instance Group with a single instance and use health checks to determine
the system status.
C. Add a Load Balancer in front of the Compute Engine instance and use health checks to
determine the system status.
D. Create a Stackdriver Monitoring dashboard with SMS alerts to be able to start recreating the
crashed instance promptly after it has crashed.

Answer(s): B

QUESTION: 14
You are managing the production deployment to a set of Google Kubernetes Engine (GKE)
clusters. You want to make sure only images which are successfully built by your trusted CI/CD
pipeline are deployed to production.
What should you do?

A. Enable Cloud Security Scanner on the clusters.


B. Enable Vulnerability Analysis on the Container Registry.
C. Set up the Kubernetes Engine clusters as private clusters.
D. Set up the Kubernetes Engine clusters with Binary Authorization.

Answer(s): D

Explanation:
https://cloud.google.com/binary-authorization/docs/overview

QUESTION: 15
You support a high-traffic web application with a microservice architecture. The home page of
the application displays multiple widgets containing content such as the current weather, stock
prices, and news headlines. The main serving thread makes a call to a dedicated microservice


for each widget and then lays out the homepage for the user. The microservices occasionally
fail; when that happens, the serving thread serves the homepage with some missing content.
Users of the application are unhappy if this degraded mode occurs too frequently, but they
would rather have some content served instead of no content at all. You want to set a Service
Level Objective (SLO) to ensure that the user experience does not degrade too much.
What Service Level Indicator (SLI) should you use to measure this?

A. A quality SLI: the ratio of non-degraded responses to total responses


B. An availability SLI: the ratio of healthy microservices to the total number of microservices
C. A freshness SLI: the proportion of widgets that have been updated within the last 10 minutes
D. A latency SLI: the ratio of microservice calls that complete in under 100 ms to the total
number of microservice calls

Answer(s): B

Explanation:
https://cloud.google.com/blog/products/gcp/available-or-not-that-is-the-question-cre-life-lessons

QUESTION: 16
You are running an application on Compute Engine and collecting logs through Stackdriver. You
discover that some personally identifiable information (PII) is leaking into certain log entry fields.
All PII entries begin with the text userinfo. You want to capture these log entries in a secure
location for later review and prevent them from leaking to Stackdriver Logging.
What should you do?

A. Create a basic log filter matching userinfo, and then configure a log export in the Stackdriver
console with Cloud Storage as a sink.
B. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing
userinfo, and then copy the entries to a Cloud Storage bucket.
C. Create an advanced log filter matching userinfo, configure a log export in the Stackdriver
console with Cloud Storage as a sink, and then configure a log exclusion with userinfo as a
filter.
D. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing
userinfo, create an advanced log filter matching userinfo, and then configure a log export in the
Stackdriver console with Cloud Storage as a sink.

Answer(s): B

Explanation:
https://medium.com/google-cloud/fluentd-filter-plugin-for-google-cloud-data-loss-prevention-api-42bbb1308e76

QUESTION: 17
Your team uses Cloud Build for all CI/CD pipelines. You want to use the kubectl builder for
Cloud Build to deploy new images to Google Kubernetes Engine (GKE). You need to
authenticate to GKE while minimizing development effort.
What should you do?

A. Assign the Container Developer role to the Cloud Build service account.
B. Specify the Container Developer role for Cloud Build in the cloudbuild.yaml file.
C. Create a new service account with the Container Developer role and use it to run Cloud


Build.
D. Create a separate step in Cloud Build to retrieve service account credentials and pass these
to kubectl.

Answer(s): A

Explanation:
https://cloud.google.com/build/docs/deploying-builds/deploy-gke
https://cloud.google.com/build/docs/securing-builds/configure-user-specified-service-accounts
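
A rough sketch of granting the role programmatically with the Google API Python client (project ID and service account address are placeholders; a single gcloud projects add-iam-policy-binding command achieves the same):

    from googleapiclient import discovery

    project_id = "my-project"  # placeholder
    member = "serviceAccount:123456789@cloudbuild.gserviceaccount.com"  # Cloud Build SA

    crm = discovery.build("cloudresourcemanager", "v1")
    policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()
    policy.setdefault("bindings", []).append(
        {"role": "roles/container.developer", "members": [member]}
    )
    crm.projects().setIamPolicy(
        resource=project_id, body={"policy": policy}
    ).execute()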

QUESTION: 18
You need to reduce the cost of virtual machines (VMs) for your organization. After reviewing
different options, you decide to leverage preemptible VM instances.
Which application is suitable for preemptible VMs?

A. A scalable in-memory caching system


B. The organization's public-facing website
C. A distributed, eventually consistent NoSQL database cluster with sufficient quorum
D. A GPU-accelerated video rendering platform that retrieves and stores videos in a storage
bucket

Answer(s): D

Explanation:
https://cloud.google.com/compute/docs/instances/preemptible

QUESTION: 19
You need to deploy a new service to production. The service needs to automatically scale using
a Managed Instance Group (MIG) and should be deployed over multiple regions. The service
needs a large number of resources for each instance and you need to plan for capacity.
What should you do?

A. Use the n1-highcpu-96 machine type in the configuration of the MIG.


B. Monitor results of Stackdriver Trace to determine the required amount of resources.
C. Validate that the resource requirements are within the available quota limits of each region.
D. Deploy the service in one region and use a global load balancer to route traffic to this region.

Answer(s): C

Explanation:
https://cloud.google.com/compute/quotas#understanding_quotas
https://cloud.google.com/compute/quotas

QUESTION: 20
You support an e-commerce application that runs on a large Google Kubernetes Engine (GKE)
cluster deployed on-premises and on Google Cloud Platform. The application consists of
microservices that run in containers. You want to identify containers that are using the most
CPU and memory.
What should you do?

A. Use Stackdriver Kubernetes Engine Monitoring.


B. Use Prometheus to collect and aggregate logs per container, and then analyze the results in
Grafana.
C. Use the Stackdriver Monitoring API to create custom metrics, and then organize your
containers using groups.
D. Use Stackdriver Logging to export application logs to BigQuery, aggregate logs per
container, and then analyze CPU and memory consumption.

Answer(s): A

Explanation:
https://cloud.google.com/anthos/clusters/docs/on-prem/1.7/concepts/logging-and-monitoring

QUESTION: 21
You are on-call for an infrastructure service that has a large number of dependent systems. You
receive an alert indicating that the service is failing to serve most of its requests and all of its
dependent systems with hundreds of thousands of users are affected. As part of your Site
Reliability Engineering (SRE) incident management protocol, you declare yourself Incident
Commander (IC) and pull in two experienced people from your team as Operations Lead (OL)
and Communications Lead (CL).
What should you do next?

A. Look for ways to mitigate user impact and deploy the mitigations to production.
B. Contact the affected service owners and update them on the status of the incident.
C. Establish a communication channel where incident responders and leads can communicate
with each other.
D. Start a postmortem, add incident information, circulate the draft internally, and ask internal
stakeholders for input.

Answer(s): A

Explanation:
https://sre.google/sre-book/managing-incidents/

QUESTION: 22
Your product is currently deployed in three Google Cloud Platform (GCP) zones with your users
divided between the zones. You can fail over from one zone to another, but it causes a 10-
minute service disruption for the affected users. You typically experience a database failure
once per quarter and can detect it within five minutes. You are cataloging the reliability risks of a
new real-time chat feature for your product. You catalog the following information for each risk:
· Mean Time to Detect (MTTD) in minutes
· Mean Time to Repair (MTTR) in minutes
· Mean Time Between Failure (MTBF) in days
· User Impact Percentage
The chat feature requires a new database system that takes twice as long to successfully fail
over between zones. You want to account for the risk of the new database failing in one zone.
What would be the values for the risk of database failover with the new system?

A. MTTD: 5
MTTR: 10
MTBF: 90
Impact: 33%


B. MTTD: 5
MTTR: 20
MTBF: 90
Impact: 33%
C. MTTD: 5
MTTR: 10
MTBF: 90
Impact: 50%
D. MTTD: 5
MTTR: 20
MTBF: 90
Impact: 50%

Answer(s): B

Explanation:
https://www.atlassian.com/incident-management/kpis/common-metrics
https://linkedin.github.io/school-of-sre/
Detection is unchanged, so MTTD stays at 5 minutes. The old failover took 10 minutes, and the
new database takes twice as long to fail over, so MTTR becomes 20 minutes. One database
failure per quarter gives an MTBF of 90 days, and with users divided across three zones, a
single-zone failure affects roughly 33% of users.

QUESTION: 23
You are developing a strategy for monitoring your Google Cloud Platform (GCP) projects in
production using Stackdriver Workspaces. One of the requirements is to be able to quickly
identify and react to production environment issues without false alerts from development and
staging projects. You want to ensure that you adhere to the principle of least privilege when
providing relevant team members with access to Stackdriver Workspaces.
What should you do?

A. Grant relevant team members read access to all GCP production projects. Create
Stackdriver workspaces inside each project.
B. Grant relevant team members the Project Viewer IAM role on all GCP production projects.
Create Stackdriver workspaces inside each project.
C. Choose an existing GCP production project to host the monitoring workspace. Attach the
production projects to this workspace. Grant relevant team members read access to the
Stackdriver Workspace.
D. Create a new GCP monitoring project, and create a Stackdriver Workspace inside it. Attach
the production projects to this workspace. Grant relevant team members read access to the
Stackdriver Workspace.

Answer(s): D

Explanation:
"A Project can host many Projects and appear in many Projects, but it can only be used as the
scoping project once. We recommend that you create a new Project for the purpose of having
multiple Projects in the same scope."

QUESTION: 24
You support an application running on GCP and want to configure SMS notifications to your
team for the most critical alerts in Stackdriver Monitoring. You have already identified the
alerting policies you want to configure this for.
What should you do?


A. Download and configure a third-party integration between Stackdriver Monitoring and an SMS
gateway. Ensure that your team members add their SMS/phone numbers to the external tool.
B. Select the Webhook notifications option for each alerting policy, and configure it to use a
third-party integration tool. Ensure that your team members add their SMS/phone numbers to
the external tool.
C. Ensure that your team members set their SMS/phone numbers in their Stackdriver Profile.
Select the SMS notification option for each alerting policy and then select the appropriate
SMS/phone numbers from the list.
D. Configure a Slack notification for each alerting policy. Set up a Slack-to-SMS integration to
send SMS messages when Slack messages are received. Ensure that your team members add
their SMS/phone numbers to the external integration.

Answer(s): C

Explanation:
https://cloud.google.com/monitoring/support/notification-options#creating_channels
To configure SMS notifications, do the following: in the SMS section, click Add new, follow the
instructions, and click Save. When you set up your alerting policy, select the SMS notification
type and choose a verified phone number from the list.

QUESTION: 25
Your application artifacts are being built and deployed via a CI/CD pipeline. You want the CI/CD
pipeline to securely access application secrets. You also want to more easily rotate secrets in
case of a security breach.
What should you do?

A. Prompt developers for secrets at build time. Instruct developers to not store secrets at rest.
B. Store secrets in a separate configuration file on Git. Provide select developers with access to
the configuration file.
C. Store secrets in Cloud Storage encrypted with a key from Cloud KMS. Provide the CI/CD
pipeline with access to Cloud KMS via IAM.
D. Encrypt the secrets and store them in the source code repository. Store a decryption key in a
separate repository and grant your pipeline access to it.

Answer(s): C
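
A minimal sketch of this approach with the google-cloud-kms and google-cloud-storage clients (key ring, key, bucket, and secret values are placeholders). Rotation then only requires re-encrypting with a new key version:

    from google.cloud import kms, storage

    kms_client = kms.KeyManagementServiceClient()
    key_name = kms_client.crypto_key_path(
        "my-project", "global", "cicd-keyring", "secrets-key"  # placeholder names
    )

    # Encrypt the secret with a key the CI/CD pipeline can use via IAM.
    ciphertext = kms_client.encrypt(
        request={"name": key_name, "plaintext": b"db-password"}
    ).ciphertext

    # Store only the ciphertext in Cloud Storage.
    bucket = storage.Client().bucket("cicd-secrets-bucket")
    bucket.blob("app/db-password.enc").upload_from_string(ciphertext)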

QUESTION: 26
You have a CI/CD pipeline that uses Cloud Build to build new Docker images and push them to
Docker Hub. You use Git for code versioning. After making a change in the Cloud Build YAML
configuration, you notice that no new artifacts are being built by the pipeline. You need to
resolve the issue following Site Reliability Engineering practices.
What should you do?

A. Disable the CI pipeline and revert to manually building and pushing the artifacts.
B. Change the CI pipeline to push the artifacts to Container Registry instead of Docker Hub.
C. Upload the configuration YAML file to Cloud Storage and use Error Reporting to identify and
fix the issue.
D. Run a Git compare between the previous and current Cloud Build Configuration files to find
and fix the bug.


Answer(s): D

Explanation:
"After making a change in the Cloud Build YAML configuration, you notice that no new artifacts
are being built by the pipeline"- means something wrong on the recent change not with the
image registry.

QUESTION: 27
You support an application that stores product information in cached memory. For every cache
miss, an entry is logged in Stackdriver Logging. You want to visualize how often a cache miss
happens over time.
What should you do?

A. Link Stackdriver Logging as a source in Google Data Studio. Filter the logs on the cache
misses.
B. Configure Stackdriver Profiler to identify and visualize when the cache misses occur based
on the logs.
C. Create a logs-based metric in Stackdriver Logging and a dashboard for that metric in
Stackdriver Monitoring.
D. Configure BigQuery as a sink for Stackdriver Logging. Create a scheduled query to filter the
cache miss logs and write them to a separate table.

Answer(s): C

Explanation:
https://cloud.google.com/logging/docs/logs-based-metrics#counter-metric
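
A minimal sketch of creating such a counter metric with the google-cloud-logging client (metric name and filter text are illustrative); the metric can then be charted on a Monitoring dashboard:

    from google.cloud import logging

    client = logging.Client(project="my-project")  # placeholder project ID
    metric = client.metric(
        "cache_miss_count",                      # hypothetical metric name
        filter_='textPayload:"cache miss"',      # assumes entries contain this text
        description="Counts cache-miss log entries",
    )
    metric.create()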

QUESTION: 28
You are part of an organization that follows SRE practices and principles. You are taking over
the management of a new service from the Development Team, and you conduct a Production
Readiness Review (PRR). After the PRR analysis phase, you determine that the service cannot
currently meet its Service Level Objectives (SLOs). You want to ensure that the service can
meet its SLOs in production.

What should you do next?

A. Adjust the SLO targets to be achievable by the service so you can bring it into production.
B. Notify the development team that they will have to provide production support for the service.
C. Identify recommended reliability improvements to the service to be completed before
handover.
D. Bring the service into production with no SLOs and build them when you have collected
operational data.

Answer(s): C

QUESTION: 29
You support a multi-region web service running on Google Kubernetes Engine (GKE) behind a
Global HTTPS Cloud Load Balancer (CLB). For legacy reasons, user requests first go through
a third-party Content Delivery Network (CDN), which then routes traffic to the CLB. You have
already implemented an availability Service Level Indicator (SLI) at the CLB level. However, you


want to increase coverage in case of a potential load balancer misconfiguration, CDN failure, or
other global networking catastrophe.
Where should you measure this new SLI? Choose 2 answers

A. Your application servers' logs


B. Instrumentation coded directly in the client
C. Metrics exported from the application servers
D. GKE health checks for your application servers
E. A synthetic client that periodically sends simulated user requests

Answer(s): B, E

QUESTION: 30
Your application images are built using Cloud Build and pushed to Google Container Registry
(GCR). You want to be able to specify a particular version of your application for deployment
based on the release version tagged in source control.
What should you do when you push the image?

A. Reference the image digest in the source control tag.


B. Supply the source control tag as a parameter within the image name.
C. Use Cloud Build to include the release version tag in the application image.
D. Use GCR digest versioning to match the image to the tag in source control.

Answer(s): B

Explanation:
https://cloud.google.com/container-registry/docs/pushing-and-pulling

QUESTION: 31
You currently store the virtual machine (VM) utilization logs in Stackdriver. You need to provide
an easy-to-share interactive VM utilization dashboard that is updated in real time and contains
information aggregated on a quarterly basis. You want to use Google Cloud Platform solutions.
What should you do?

A. 1. Export VM utilization logs from Stackdriver to BigQuery.


2. Create a dashboard in Data Studio.
3. Share the dashboard with your stakeholders.
B. 1. Export VM utilization logs from Stackdriver to Cloud Pub/Sub.

2. From Cloud Pub/Sub, send the logs to a Security Information and Event Management (SIEM)
system.
3. Build the dashboards in the SIEM system and share with your stakeholders.
C. 1. Export VM utilization logs from Stackdriver to BigQuery.
2. From BigQuery, export the logs to a CSV file.
3. Import the CSV file into Google Sheets.
4. Build a dashboard in Google Sheets and share it with your stakeholders.
D. 1. Export VM utilization logs from Stackdriver to a Cloud Storage bucket.
2. Enable the Cloud Storage API to pull the logs programmatically.
3. Build a custom data visualization application.
4. Display the pulled logs in a custom dashboard.


Answer(s): A

QUESTION: 32
You are performing a semiannual capacity planning exercise for your flagship service. You
expect a service user growth rate of 10% month-over-month over the next six months. Your
service is fully containerized and runs on Google Cloud Platform (GCP), using a Google
Kubernetes Engine (GKE) Standard regional cluster on three zones with cluster autoscaler
enabled. You currently consume about 30% of your total deployed CPU capacity, and you
require resilience against the failure of a zone. You want to ensure that your users experience
minimal negative impact as a result of this growth or as a result of zone failure, while avoiding
unnecessary costs. How should you prepare to handle the predicted growth?

A. Verify the maximum node pool size, enable a horizontal pod autoscaler, and then perform a
load test to verify your expected resource needs.
B. Because you are deployed on GKE and are using a cluster autoscaler, your GKE cluster will
scale automatically, regardless of growth rate.
C. Because you are at only 30% utilization, you have significant headroom and you won't need
to add any additional capacity for this rate of growth.
D. Proactively add 60% more node capacity to account for six months of 10% growth rate, and
then perform a load test to make sure you have enough capacity.

Answer(s): A

Explanation:
https://cloud.google.com/kubernetes-engine/docs/concepts/horizontalpodautoscaler The
Horizontal Pod Autoscaler changes the shape of your Kubernetes workload by automatically
increasing or decreasing the number of Pods in response to the workload's CPU or memory
consumption

QUESTION: 33
You are managing an application that exposes an HTTP endpoint without using a load balancer.
The latency of the HTTP responses is important for the user experience. You want to
understand what HTTP latencies all of your users are experiencing. You use Stackdriver
Monitoring.
What should you do?

A. · In your application, create a metric with a metricKind set to DELTA and a valueType set to
DOUBLE.
· In Stackdriver's Metrics Explorer, use a Stacked Bar graph to visualize the metric.
B. · In your application, create a metric with a metricKind set to CUMULATIVE and a valueType
set to DOUBLE.
· In Stackdriver's Metrics Explorer, use a Line graph to visualize the metric.
C. · In your application, create a metric with a metricKind set to GAUGE and a valueType set to
DISTRIBUTION.
· In Stackdriver's Metrics Explorer, use a Heatmap graph to visualize the metric.
D. · In your application, create a metric with a metricKind set to METRIC_KIND_UNSPECIFIED
and a valueType set to INT64.
· In Stackdriver's Metrics Explorer, use a Stacked Area graph to visualize the metric.

Answer(s): C


Explanation:
https://sre.google/workbook/implementing-slos/
https://cloud.google.com/architecture/adopting-slos/

Latency is commonly measured as a distribution. Given a distribution, you can measure various
percentiles. For example, you might measure the number of requests that are slower than the
historical 99th percentile.
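
A sketch of registering such a metric with the google-cloud-monitoring client (the metric type name and project ID are illustrative):

    from google.api import metric_pb2 as ga_metric
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    descriptor = ga_metric.MetricDescriptor(
        type="custom.googleapis.com/http/response_latency",  # hypothetical name
        metric_kind=ga_metric.MetricDescriptor.MetricKind.GAUGE,
        value_type=ga_metric.MetricDescriptor.ValueType.DISTRIBUTION,
        unit="ms",
        description="Distribution of HTTP response latencies",
    )
    client.create_metric_descriptor(
        name="projects/my-project", metric_descriptor=descriptor
    )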

QUESTION: 34
You are responsible for creating and modifying the Terraform templates that define your
Infrastructure. Because two new engineers will also be working on the same code, you need to
define a process and adopt a tool that will prevent you from overwriting each other's code. You
also want to ensure that you capture all updates in the latest version.
What should you do?

A. · Store your code in a Git-based version control system.


· Establish a process that allows developers to merge their own changes at the end of each day.
· Package and upload code to a versioned Cloud Storage bucket as the latest master version.
B. · Store your code in a Git-based version control system.
· Establish a process that includes code reviews by peers and unit testing to ensure integrity
and functionality before integration of code.
· Establish a process where the fully integrated code in the repository becomes the latest
master version.
C. · Store your code as text files in Google Drive in a defined folder structure that organizes the
files.
· At the end of each day, confirm that all changes have been captured in the files within the
folder structure.
· Rename the folder structure with a predefined naming convention that increments the version.
D. · Store your code as text files in Google Drive in a defined folder structure that organizes the
files.

· At the end of each day, confirm that all changes have been captured in the files within the
folder structure and create a new .zip archive with a predefined naming convention.
· Upload the .zip archive to a versioned Cloud Storage bucket and accept it as the latest
version.

Answer(s): B

QUESTION: 35
You are running an experiment to see whether your users like a new feature of a web
application. Shortly after deploying the feature as a canary release, you receive a spike in the
number of 500 errors sent to users, and your monitoring reports show increased latency. You
want to quickly minimize the negative impact on users.
What should you do first?

A. Roll back the experimental canary release.


B. Start monitoring latency, traffic, errors, and saturation.
C. Record data for the postmortem document of the incident.
D. Trace the origin of 500 errors and the root cause of increased latency.


Answer(s): A

QUESTION: 36
You need to run a business-critical workload on a fixed set of Compute Engine instances for
several months. The workload is stable with the exact amount of resources allocated to it. You
want to lower the costs for this workload without any performance implications.
What should you do?

A. Purchase Committed Use Discounts.


B. Migrate the instances to a Managed Instance Group.
C. Convert the instances to preemptible virtual machines.
D. Create an Unmanaged Instance Group for the instances used to run the workload.

Answer(s): A

QUESTION: 37
You encountered a major service outage that affected all users of the service for multiple hours.
After several hours of incident management, the service returned to normal, and user access
was restored. You need to provide an incident summary to relevant stakeholders following the
Site Reliability Engineering recommended practices.
What should you do first?

A. Call individual stakeholders to explain what happened.


B. Develop a post-mortem to be distributed to stakeholders.
C. Send the Incident State Document to all the stakeholders.
D. Require the engineer responsible to write an apology email to all stakeholders.

Answer(s): B

QUESTION: 38
Your company follows Site Reliability Engineering practices. You are the person in charge of
Communications for a large, ongoing incident affecting your customer-facing applications. There
is still no estimated time for a resolution of the outage. You are receiving emails from internal
stakeholders who want updates on the outage, as well as emails from customers who want to
know what is happening. You want to efficiently provide updates to everyone affected by the
outage.
What should you do?

A. Focus on responding to internal stakeholders at least every 30 minutes. Commit to "next
update" times.
B. Provide periodic updates to all stakeholders in a timely manner. Commit to a "next update"
time in all communications.
C. Delegate the responding to internal stakeholder emails to another member of the Incident
Response Team. Focus on providing responses directly to customers.
D. Provide all internal stakeholder emails to the Incident Commander, and allow them to
manage internal communications. Focus on providing responses directly to customers.

Answer(s): B

Explanation:
When disaster strikes, the person who declares the incident typically steps into the IC role and


directs the high-level state of the incident. The IC concentrates on the 3Cs and does the
following:
· Commands and coordinates the incident response, delegating roles as needed. By default,
the IC assumes all roles that have not been delegated yet.
· Communicates effectively.
· Stays in control of the incident response.
· Works with other responders to resolve the incident.
https://sre.google/workbook/incident-response/

QUESTION: 39
Your team is designing a new application for deployment into Google Kubernetes Engine
(GKE). You need to set up monitoring to collect and aggregate various application-level metrics
in a centralized location. You want to use Google Cloud Platform services while minimizing the
amount of work required to set up monitoring.
What should you do?

A. Publish various metrics from the application directly to the Stackdriver Monitoring API, and
then observe these custom metrics in Stackdriver.
B. Install the Cloud Pub/Sub client libraries, push various metrics from the application to various
topics, and then observe the aggregated metrics in Stackdriver.
C. Install the OpenTelemetry client libraries in the application, configure Stackdriver as the
export destination for the metrics, and then observe the application's metrics in Stackdriver.
D. Emit all metrics in the form of application-specific log messages, pass these messages from
the containers to the Stackdriver logging collector, and then observe metrics in Stackdriver.

Answer(s): A

Explanation:
https://cloud.google.com/kubernetes-engine/docs/concepts/custom-and-external-metrics#custom_metrics
https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/custom-metrics-stackdriver-adapter/README.md

Your application can report a custom metric to Cloud Monitoring. You can configure Kubernetes
to respond to these metrics and scale your workload automatically. For example, you can scale
your application based on metrics such as queries per second, writes per second, network
performance, latency when communicating with a different application, or other metrics that
make sense for your workload.
https://cloud.google.com/kubernetes-engine/docs/concepts/custom-and-external-metrics
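
A minimal sketch of option A, publishing one custom metric point with the google-cloud-monitoring client (metric type and project ID are placeholders):

    import time
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    series = monitoring_v3.TimeSeries()
    series.metric.type = "custom.googleapis.com/app/active_sessions"  # hypothetical
    series.resource.type = "global"

    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
    )
    point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": 42}})
    series.points = [point]

    client.create_time_series(name="projects/my-project", time_series=[series])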

QUESTION: 40
Your application services run in Google Kubernetes Engine (GKE). You want to make sure that
only images from your centrally-managed Google Container Registry (GCR) image registry in
the altostrat-images project can be deployed to the cluster while minimizing development time.
What should you do?

A. Create a custom builder for Cloud Build that will only push images to gcr.io/altostrat-images.
B. Use a Binary Authorization policy that includes the whitelist name pattern
gcr.io/altostrat-images/.
C. Add logic to the deployment pipeline to check that all manifests contain only images from
gcr.io/altostrat-images.
D. Add a tag to each image in gcr.io/altostrat-images and check that this tag is present when the
image is deployed.


Answer(s): B

QUESTION: 41
You are running an application in a virtual machine (VM) using a custom Debian image. The
image has the Stackdriver Logging agent installed. The VM has the cloud-platform scope. The
application is logging information via syslog. You want to use Stackdriver Logging in the Google
Cloud Platform Console to visualize the logs. You notice that syslog is not showing up in the "All
logs" dropdown list of the Logs Viewer.
What is the first thing you should do?

A. Look for the agent's test log entry in the Logs Viewer.
B. Install the most recent version of the Stackdriver agent.
C. Verify the VM service account access scope includes the monitoring.write scope.
D. SSH to the VM and execute the following commands on your VM: ps ax | grep fluentd

Answer(s): D

Explanation:
https://cloud.google.com/compute/docs/access/service-accounts#associating_a_service_account_to_an_instance

QUESTION: 42
Your organization wants to implement Site Reliability Engineering (SRE) culture and principles.
Recently, a service that you support had a limited outage. A manager on another team asks you
to provide a formal explanation of what happened so they can action remediations.
What should you do?

A. Develop a postmortem that includes the root causes, resolution, lessons learned, and a
prioritized list of action items. Share it with the manager only.
B. Develop a postmortem that includes the root causes, resolution, lessons learned, and a
prioritized list of action items. Share it on the engineering organization's document portal.
C. Develop a postmortem that includes the root causes, resolution, lessons learned, the list of
people responsible, and a list of action items for each person. Share it with the manager only.
D. Develop a postmortem that includes the root causes, resolution, lessons learned, the list of
people responsible, and a list of action items for each person. Share it on the engineering
organization's document portal.

Answer(s): B

QUESTION: 43
You have a set of applications running on a Google Kubernetes Engine (GKE) cluster, and you
are using Stackdriver Kubernetes Engine Monitoring. You are bringing a new containerized
application required by your company into production. This application is written by a third party
and cannot be modified or reconfigured. The application writes its log information to
/var/log/app_messages.log, and you want to send these log entries to Stackdriver Logging.
What should you do?

A. Use the default Stackdriver Kubernetes Engine Monitoring agent configuration.


B. Deploy a Fluentd daemonset to GKE. Then create a customized input and output
configuration to tail the log file in the application's pods and write to Stackdriver Logging.
C. Install Kubernetes on Google Compute Engine (GCE) and redeploy your applications. Then


customize the built-in Stackdriver Logging configuration to tail the log file in the application's
pods and write to Stackdriver Logging.
D. Write a script to tail the log file within the pod and write entries to standard output. Run the
script as a sidecar container with the application's pod. Configure a shared volume between the
containers to allow the script to have read access to /var/log in the application container.

Answer(s): B

Explanation:
https://cloud.google.com/architecture/customizing-stackdriver-logs-fluentd

Besides the list of default logs that the Logging agent streams by default, you can customize the
Logging agent to send additional logs to Logging or to adjust agent settings by adding input
configurations. The configuration definitions in these sections apply to the
fluent-plugin-google-cloud output plugin only and specify how logs are transformed and ingested
into Cloud Logging.
https://cloud.google.com/logging/docs/agent/logging/configuration#configure

QUESTION: 44
You are running a real-time gaming application on Compute Engine that has a production and
testing environment. Each environment has their own Virtual Private Cloud (VPC) network. The
application frontend and backend servers are located on different subnets in the environment's
VPC. You suspect there is a malicious process communicating intermittently in your production
frontend servers. You want to ensure that network traffic is captured for analysis.
What should you do?

A. Enable VPC Flow Logs on the production VPC network frontend and backend subnets only
with a sample volume scale of 0.5.
B. Enable VPC Flow Logs on the production VPC network frontend and backend subnets only
with a sample volume scale of 1.0.
C. Enable VPC Flow Logs on the testing and production VPC network frontend and backend
subnets with a volume scale of 0.5. Apply changes in testing before production.
D. Enable VPC Flow Logs on the testing and production VPC network frontend and backend
subnets with a volume scale of 1.0. Apply changes in testing before production.

Answer(s): D
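
A hedged sketch of enabling flow logs with full sampling on one subnet via the google-cloud-compute client (project, region, and subnet names are placeholders; the same patch would be applied to each frontend and backend subnet, in testing before production):

    from google.cloud import compute_v1

    client = compute_v1.SubnetworksClient()
    project, region, subnet = "my-project", "us-central1", "prod-frontend"  # placeholders

    current = client.get(project=project, region=region, subnetwork=subnet)
    patch_body = compute_v1.Subnetwork(
        fingerprint=current.fingerprint,  # patch requires the current fingerprint
        log_config=compute_v1.SubnetworkLogConfig(
            enable=True,
            flow_sampling=1.0,  # volume scale 1.0 captures all flows
            aggregation_interval="INTERVAL_5_SEC",
            metadata="INCLUDE_ALL_METADATA",
        ),
    )
    op = client.patch(
        project=project, region=region, subnetwork=subnet, subnetwork_resource=patch_body
    )
    op.result()  # wait for the operation to finish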

QUESTION: 45
You support a high-traffic web application and want to ensure that the home page loads in a
timely manner. As a first step, you decide to implement a Service Level Indicator (SLI) to
represent home page request latency with an acceptable page load time set to 100 ms.
What is the Google-recommended way of calculating this SLI?

A. Bucketize the request latencies into ranges, and then compute the percentile at 100 ms.
B. Bucketize the request latencies into ranges, and then compute the median and 90th
percentiles.
C. Count the number of home page requests that load in under 100 ms, and then divide by the
total number of home page requests.
D. Count the number of home page requests that load in under 100 ms, and then divide by the
total number of all web application requests.

Answer(s): C


Explanation:
https://sre.google/workbook/implementing-slos/
In the SRE principles book, it is recommended to treat the SLI as the ratio of two numbers: the
number of good events divided by the total number of events. For example: number of
successful HTTP requests / total HTTP requests (success rate).
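
In Python, the good-events-over-total-events calculation behind this SLI is simply:

    def home_page_latency_sli(latencies_ms, threshold_ms=100.0):
        """Ratio of home page requests served under the threshold to all home page requests."""
        good = sum(1 for latency in latencies_ms if latency < threshold_ms)
        return good / len(latencies_ms)

    # 3 of 4 sampled requests met the 100 ms target, so the SLI is 0.75.
    print(home_page_latency_sli([45.0, 80.0, 99.0, 250.0]))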

QUESTION: 46
You use a multiple step Cloud Build pipeline to build and deploy your application to Google
Kubernetes Engine (GKE). You want to integrate with a third-party monitoring platform by
performing an HTTP POST of the build information to a webhook. You want to minimize the
development effort.
What should you do?

A. Add logic to each Cloud Build step to HTTP POST the build information to a webhook.
B. Add a new step at the end of the pipeline in Cloud Build to HTTP POST the build information
to a webhook.
C. Use Stackdriver Logging to create a logs-based metric from the Cloud Build logs. Create an
Alert with a Webhook notification type.
D. Create a Cloud Pub/Sub push subscription to the Cloud Build cloud-builds PubSub topic to
HTTP POST the build information to a webhook.

Answer(s): D
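
A minimal sketch of option D with the google-cloud-pubsub client: a push subscription on the cloud-builds topic makes Pub/Sub itself POST every build status message to the webhook, so no pipeline changes are needed (project ID and endpoint are placeholders):

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    topic = subscriber.topic_path("my-project", "cloud-builds")  # topic Cloud Build publishes to
    subscription = subscriber.subscription_path("my-project", "build-status-webhook")

    push_config = pubsub_v1.types.PushConfig(
        push_endpoint="https://monitoring.example.com/webhook"  # hypothetical endpoint
    )
    subscriber.create_subscription(
        request={"name": subscription, "topic": topic, "push_config": push_config}
    )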

QUESTION: 47
You created a Stackdriver chart for CPU utilization in a dashboard within your workspace
project. You want to share the chart with your Site Reliability Engineering (SRE) team only. You
want to ensure you follow the principle of least privilege.
What should you do?

A. Share the workspace Project ID with the SRE team. Assign the SRE team the Monitoring
Viewer IAM role in the workspace project.
B. Share the workspace Project ID with the SRE team. Assign the SRE team the Dashboard
Viewer IAM role in the workspace project.
C. Click "Share chart by URL" and provide the URL to the SRE team. Assign the SRE team the
Monitoring Viewer IAM role in the workspace project.
D. Click "Share chart by URL" and provide the URL to the SRE team. Assign the SRE team the
Dashboard Viewer IAM role in the workspace project.

Answer(s): C

Explanation:
https://cloud.google.com/monitoring/access-control

QUESTION: 48
You support a stateless web-based API that is deployed on a single Compute Engine instance
in the europe-west2-a zone. The Service Level Indicator (SLI) for service availability is below
the specified Service Level Objective (SLO). A postmortem has revealed that requests to the
API regularly time out. The timeouts are due to the API having a high number of requests and


running out of memory. You want to improve service availability.


What should you do?

A. Change the specified SLO to match the measured SLI.


B. Move the service to higher-specification compute instances with more memory.
C. Set up additional service instances in other zones and load balance the traffic between all
instances.
D. Set up additional service instances in other zones and use them as a failover in case the
primary instance is unavailable.

Answer(s): C

QUESTION: 49
You deploy a new release of an internal application during a weekend maintenance window
when there is minimal user traffic. After the window ends, you learn that one of the new features
isn't working as expected in the production environment. After an extended outage, you roll
back the new release and deploy a fix. You want to modify your release process to reduce the
mean time to recovery so you can avoid extended outages in the future.
What should you do? Choose 2 answers

A. Before merging new code, require 2 different peers to review the code changes.
B. Adopt the blue/green deployment strategy when releasing new code via a CD server.
C. Integrate a code linting tool to validate coding standards before any code is accepted into the
repository.
D. Require developers to run automated integration tests on their local development
environments before release.
E. Configure a CI server. Add a suite of unit tests to your code and have your CI server run
them on commit and verify any changes.

Answer(s): B, E

QUESTION: 50
You have a pool of application servers running on Compute Engine. You need to provide a
secure solution that requires the least amount of configuration and allows developers to easily
access application logs for troubleshooting. How would you implement the solution on GCP?

A. · Deploy the Stackdriver logging agent to the application servers.


· Give the developers the IAM Logs Viewer role to access Stackdriver and view logs.
B. · Deploy the Stackdriver logging agent to the application servers.
· Give the developers the IAM Logs Private Logs Viewer role to access Stackdriver and view
logs.
C. · Deploy the Stackdriver monitoring agent to the application servers.
· Give the developers the IAM Monitoring Viewer role to access Stackdriver and view metrics.
D. · Install the gsutil command line tool on your application servers.
· Write a script using gsutil to upload your application log to a Cloud Storage bucket, and then
schedule it to run via cron every 5 minutes.
· Give the developers IAM Object Viewer access to view the logs in the specified bucket.

Answer(s): A

Explanation:


https://cloud.google.com/logging/docs/audit#access-control

QUESTION: 51
You support a Node.js application running on Google Kubernetes Engine (GKE) in production.
The application makes several HTTP requests to dependent applications. You want to
anticipate which dependent applications might cause performance issues.
What should you do?

A. Instrument all applications with Stackdriver Profiler.


B. Instrument all applications with Stackdriver Trace and review inter-service HTTP requests.
C. Use Stackdriver Debugger to review the execution of logic within each application to
instrument all applications.
D. Modify the Node.js application to log HTTP request and response times to dependent
applications. Use Stackdriver Logging to find dependent applications that are performing poorly.

Answer(s): B

QUESTION: 52
You use Spinnaker to deploy your application and have created a canary deployment stage in
the pipeline. Your application has an in-memory cache that loads objects at start time. You want
to automate the comparison of the canary version against the production version. How should
you configure the canary analysis?

A. Compare the canary with a new deployment of the current production version.
B. Compare the canary with a new deployment of the previous production version.
C. Compare the canary with the existing deployment of the current production version.
D. Compare the canary with the average performance of a sliding window of previous
production versions.

Answer(s): A

Explanation:
https://cloud.google.com/architecture/automated-canary-analysis-kubernetes-engine-spinnaker
https://spinnaker.io/guides/user/canary/best-practices/#compare-canary-against-baseline-not-against-production

QUESTION: 53
Your team of Infrastructure DevOps Engineers is growing, and you are starting to use Terraform
to manage infrastructure. You need a way to implement code versioning and to share code with
other team members.
What should you do?

A. Store the Terraform code in a version-control system. Establish procedures for pushing new
versions and merging with the master.
B. Store the Terraform code in a network shared folder with child folders for each version
release. Ensure that everyone works on different files.
C. Store the Terraform code in a Cloud Storage bucket using object versioning. Give access to
the bucket to every team member so they can download the files.
D. Store the Terraform code in a shared Google Drive folder so it syncs automatically to every
team member's computer. Organize files with a naming convention that identifies each new


version.

Answer(s): A

Explanation:
https://www.terraform.io/docs/cloud/guides/recommended-practices/part3.3.html

QUESTION: 54
You are using Stackdriver to monitor applications hosted on Google Cloud Platform (GCP). You
recently deployed a new application, but its logs are not appearing on the Stackdriver
dashboard. You need to troubleshoot the issue.
What should you do?

A. Confirm that the Stackdriver agent has been installed in the hosting virtual machine.
B. Confirm that your account has the proper permissions to use the Stackdriver dashboard.
C. Confirm that port 25 has been opened in the firewall to allow messages through to
Stackdriver.
D. Confirm that the application is using the required client library and the service account key
has proper permissions.

Answer(s): A

Explanation:
https://cloud.google.com/monitoring/agent/monitoring/troubleshooting#checklist
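As an illustrative check on a Linux VM (the install-script URL and flags follow Google's agent installation docs; verify against current documentation):

    sudo service google-fluentd status   # is the Stackdriver logging agent running?
    # If the agent is missing, install it:
    curl -sSO https://dl.google.com/cloudagents/add-logging-agent-repo.sh
    sudo bash add-logging-agent-repo.sh --also-install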

QUESTION: 55
Your organization recently adopted a container-based workflow for application development.
Your team develops numerous applications that are deployed continuously through an
automated build pipeline to the production environment. A recent security audit alerted your
team that the code pushed to production could contain vulnerabilities and that the existing
tooling around virtual machine (VM) vulnerabilities no longer applies to the containerized
environment. You need to ensure the security and patch level of all code running through the
pipeline.
What should you do?

A. Set up Container Analysis to scan and report Common Vulnerabilities and Exposures.
B. Configure the containers in the build pipeline to always update themselves before release.
C. Reconfigure the existing operating system vulnerability software to exist inside the container.
D. Implement static code analysis tooling against the Docker files used to create the containers.

Answer(s): A

Explanation:
https://cloud.google.com/container-analysis/docs
Container Analysis provides vulnerability scanning for container images. It scans images as
they are pushed to the registry and reports known Common Vulnerabilities and Exposures
(CVEs). This replaces the VM-oriented vulnerability tooling in a containerized environment and
lets you verify the security and patch level of all code running through the pipeline.


QUESTION: 56
You use Cloud Build to build your application. You want to reduce the build time while
minimizing cost and development effort.
What should you do?

A. Use Cloud Storage to cache intermediate artifacts.


B. Run multiple Jenkins agents to parallelize the build.
C. Use multiple smaller build steps to minimize execution time.
D. Use larger Cloud Build virtual machines (VMs) by using the machine-type option.

Answer(s): A

Explanation:
https://cloud.google.com/build/docs/speeding-up-builds#caching_directories_with_google_cloud_storage
Caching directories with Google Cloud Storage: to increase the speed of a build, reuse the
results from a previous build. You can copy the results of a previous build to a Cloud Storage
bucket, use the results for faster calculation, and then copy the new results back to the bucket.
Use this method when your build takes a long time and produces a small number of files that
do not take long to copy to and from Cloud Storage.
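A minimal sketch of the caching pattern, assuming a hypothetical bucket named example-build-cache and a cacheable .build-cache directory:

    gsutil cp gs://example-build-cache/cache.tar.gz . && tar xzf cache.tar.gz   # restore previous results
    ./build.sh                                                                  # run the build with a warm cache
    tar czf cache.tar.gz .build-cache
    gsutil cp cache.tar.gz gs://example-build-cache/                            # save results for the next build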

QUESTION: 57
You support a web application that is hosted on Compute Engine. The application provides a
booking service for thousands of users. Shortly after the release of a new feature, your
monitoring dashboard shows that all users are experiencing latency at login. You want to
mitigate the impact of the incident on the users of your service.
What should you do first?

A. Roll back the recent release.


B. Review the Stackdriver monitoring.
C. Upsize the virtual machines running the login services.
D. Deploy a new release to see whether it fixes the problem.

Answer(s): A

Explanation:
Roll back to the previous stable version first to mitigate the impact on users; then investigate
what is causing the issue.

QUESTION: 58
You are deploying an application that needs to access sensitive information. You need to
ensure that this information is encrypted and the risk of exposure is minimal if a breach occurs.
What should you do?

A. Store the encryption keys in Cloud Key Management Service (KMS) and rotate the keys
frequently.
B. Inject the secret at the time of instance creation via an encrypted configuration management
system.
C. Integrate the application with a single sign-on (SSO) system and do not expose secrets to
the application.
D. Leverage a continuous build pipeline that produces multiple versions of the secret for each
instance of the application.

Answer(s): A

Explanation:
https://cloud.google.com/security-key-management
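A minimal sketch (key ring, key, and rotation schedule are hypothetical):

    gcloud kms keyrings create app-secrets --location=global
    gcloud kms keys create db-credentials \
        --location=global --keyring=app-secrets --purpose=encryption \
        --rotation-period=30d --next-rotation-time=2025-01-01T00:00:00Z   # rotate frequently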

QUESTION: 59
You encounter a large number of outages in the production systems you support. You receive
alerts for all the outages that wake you up at night. The alerts are due to unhealthy systems that
are automatically restarted within a minute. You want to set up a process that would prevent
staff burnout while following Site Reliability Engineering practices.
What should you do?

A. Eliminate unactionable alerts.


B. Create an incident report for each of the alerts.
C. Distribute the alerts to engineers in different time zones.
D. Redefine the related Service Level Objective so that the error budget is not exhausted.

Answer(s): A

Explanation:
Eliminate bad monitoring: unactionable alerts (i.e., spam). Good monitoring alerts on actionable
problems, so alerts that fire for systems that self-heal within a minute should be removed.
https://cloud.google.com/blog/products/management-tools/meeting-reliability-challenges-with-sre-principles

QUESTION: 60
You have migrated an e-commerce application to Google Cloud Platform (GCP). You want to
prepare the application for the upcoming busy season.
What should you do first to prepare for the busy season?

A. Load test the application to profile its performance for scaling.


B. Enable AutoScaling on the production clusters, in case there is growth.
C. Pre-provision double the compute power used last season, expecting growth.
D. Create a runbook on inflating the disaster recovery (DR) environment if there is growth.

Answer(s): A

Explanation:
https://cloud.google.com/blog/topics/retail/preparing-for-peak-holiday-season-while-wfh

QUESTION: 61
You support a web application that runs on App Engine and uses CloudSQL and Cloud Storage
for data storage. After a short spike in website traffic, you notice a big increase in latency for all
user requests, an increase in CPU use, and an increase in the number of processes running the
application. Initial troubleshooting reveals:
After the initial spike in traffic, load levels returned to normal but users still experience high
latency. Requests for content from the CloudSQL database and images from Cloud Storage
show the same high latency.
No changes were made to the website around the time the latency increased. There is no
increase in the number of errors to the users. You expect another spike in website traffic in the
coming days and want to make sure users don't experience latency.
What should you do?

A. Upgrade the GCS buckets to Multi-Regional.


B. Enable high availability on the CloudSQL instances.
C. Move the application from App Engine to Compute Engine.
D. Modify the App Engine configuration to have additional idle instances.

Answer(s): D

Explanation:
App Engine scales the number of instances automatically in response to processing volume,
factoring in the automatic_scaling settings provided on a per-version basis in the configuration
file. (By contrast, a service with basic scaling is configured by setting max_instances in the
basic_scaling setting and scales with processing volume, while the number of instances of a
manually scaled version can be adjusted quickly, without stopping running instances, using the
Modules API set_num_instances function.)
https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed
https://cloud.google.com/appengine/docs/standard/python/config/appref
max_idle_instances (optional): the maximum number of idle instances that App Engine should
maintain for this version. Specify a value from 1 to 1000; if not specified, the default value is
automatic, which means App Engine will manage the number of idle instances. A high maximum
reduces the number of idle instances more gradually when load levels return to normal after a
spike. This helps your application maintain steady performance through fluctuations in request
load, but also raises the number of idle instances (and consequent running costs) during such
periods of heavy load.
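A minimal app.yaml sketch for the standard environment (the values are illustrative, not tuned):

    cat >> app.yaml <<'EOF'
    automatic_scaling:
      min_idle_instances: 5     # warm instances that absorb a traffic spike immediately
      max_idle_instances: 20    # cap idle capacity to limit cost after the spike subsides
    EOF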

QUESTION: 62
Your application runs on Google Cloud Platform (GCP). You need to implement Jenkins for
deploying application releases to GCP. You want to streamline the release process, lower
operational toil, and keep user data secure.
What should you do?

A. Implement Jenkins on local workstations.


B. Implement Jenkins on Kubernetes on-premises
C. Implement Jenkins on Google Cloud Functions.
D. Implement Jenkins on Compute Engine virtual machines.

Answer(s): D

Explanation:
Running Jenkins on Compute Engine virtual machines keeps the release pipeline and user data
inside your own GCP project while giving you full control over the Jenkins environment, which
streamlines releases and lowers operational toil compared to local workstations or on-premises
clusters. The Google Compute Engine plugin for Jenkins can also provision ephemeral build
agents on demand.
https://plugins.jenkins.io/google-compute-engine/

QUESTION: 63
You are working with a government agency that requires you to archive application logs for
seven years. You need to configure Stackdriver to export and store the logs while minimizing
costs of storage.
What should you do?

A. Create a Cloud Storage bucket and develop your application to send logs directly to the
bucket.
B. Develop an App Engine application that pulls the logs from Stackdriver and saves them in
BigQuery.
C. Create an export in Stackdriver and configure Cloud Pub/Sub to store logs in permanent
storage for seven years.
D. Create a sink in Stackdriver, name it, create a bucket on Cloud Storage for storing archived
logs, and then select the bucket as the log export destination.

Answer(s): D

Explanation:
https://cloud.google.com/logging/docs/routing/overview
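A minimal sketch (sink name, bucket name, filter, and storage class are hypothetical):

    gsutil mb -c archive gs://example-archived-logs      # low-cost storage class for long-term retention
    gcloud logging sinks create archive-sink \
        storage.googleapis.com/example-archived-logs \
        --log-filter='resource.type="gae_app"'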

QUESTION: 64
You support a trading application written in Python and hosted on App Engine flexible
environment. You want to customize the error information being sent to Stackdriver Error
Reporting.
What should you do?

A. Install the Stackdriver Error Reporting library for Python, and then run your code on a
Compute Engine VM.
B. Install the Stackdriver Error Reporting library for Python, and then run your code on Google
Kubernetes Engine.
C. Install the Stackdriver Error Reporting library for Python, and then run your code on App
Engine flexible environment.
D. Use the Stackdriver Error Reporting API to write errors from your application to
ReportedErrorEvent, and then generate log entries with properly formatted error messages in
Stackdriver Logging.

Answer(s): D

Explanation:
https://cloud.google.com/error-reporting/docs/formatting-error-messages
https://cloud.google.com/error-reporting/docs/reference/libraries#client-libraries-install-python
There is no need to install the Error Reporting library on App Engine flexible environment;
writing properly formatted error entries to Stackdriver Logging is sufficient.
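For illustration, a properly formatted error entry carries a stack trace in the message and a serviceContext, and can be written with a JSON payload (log name and field values are hypothetical):

    gcloud logging write example-errors \
        '{"serviceContext": {"service": "trading-app"}, "message": "RuntimeError: order rejected\n  File \"main.py\", line 42, in handle"}' \
        --payload-type=json --severity=ERROR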

QUESTION: 65
You need to define Service Level Objectives (SLOs) for a high-traffic multi-region web
application. Customers expect the application to always be available and have fast response
times. Customers are currently happy with the application performance and availability. Based
on current measurement, you observe that the 90th percentile of latency is 120ms and the 95th
percentile of latency is 275ms over a 28-day window.
What latency SLO would you recommend to the team to publish?

A. 90th percentile - 100ms; 95th percentile - 250ms
B. 90th percentile - 120ms; 95th percentile - 275ms
C. 90th percentile - 150ms; 95th percentile - 300ms
D. 90th percentile - 250ms; 95th percentile - 400ms

Answer(s): C

Explanation:
Publish targets slightly looser than the observed percentiles so that normal variation does not
immediately consume the error budget: the exactly observed values (120ms / 275ms) leave no
margin, while much looser targets would not reflect the performance customers already
experience.
https://sre.google/sre-book/service-level-objectives/

QUESTION: 66
You support a large service with a well-defined Service Level Objective (SLO). The
development team deploys new releases of the service multiple times a week. If a major
incident causes the service to miss its SLO, you want the development team to shift its focus
from working on features to improving service reliability.
What should you do before a major incident occurs?

A. Develop an appropriate error budget policy in cooperation with all service stakeholders.
B. Negotiate with the product team to always prioritize service reliability over releasing new
features.
C. Negotiate with the development team to reduce the release frequency to no more than once
a week.
D. Add a plugin to your Jenkins pipeline that prevents new releases whenever your service is
out of SLO.

Answer(s): A

Explanation:
Reason : Incident has not occurred yet, even when development team is already pushing new
features multiple times a week. The option A says, to define an error budget "policy", not to
define error budget(It is already present). Just simple means to bring in all stakeholders, and
decide how to consume the error budget effectively that could bring balance between feature
deployment and reliability.
The goals of this policy are to: -- Protect customers from repeated SLO misses -- Provide an
incentive to balance reliability with other features https://sre.google/workbook/error-budget-
policy/

QUESTION: 67
Your company is developing applications that are deployed on Google Kubernetes Engine
(GKE). Each team manages a different application. You need to create the development and
production environments for each team, while minimizing costs. Different teams should not be
able to access other teams' environments.


What should you do?

A. Create one GCP Project per team. In each project, create a cluster for Development and one
for Production. Grant the teams IAM access to their respective clusters.
B. Create one GCP Project per team. In each project, create a cluster with a Kubernetes
namespace for Development and one for Production. Grant the teams IAM access to their
respective clusters.
C. Create a Development and a Production GKE cluster in separate projects. In each cluster,
create a Kubernetes namespace per team, and then configure Identity Aware Proxy so that
each team can only access its own namespace.
D. Create a Development and a Production GKE cluster in separate projects. In each cluster,
create a Kubernetes namespace per team, and then configure Kubernetes Role-based access
control (RBAC) so that each team can only access its own namespace.

Answer(s): D

Explanation:
https://cloud.google.com/architecture/prep-kubernetes-engine-for-prod#roles_and_groups
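A minimal RBAC sketch (namespace and group are hypothetical) that limits one team to its own namespace:

    kubectl apply -f - <<'EOF'
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: team-a-edit
      namespace: team-a
    subjects:
    - kind: Group
      name: team-a@example.com
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: edit                # built-in role, scoped to the namespace by this RoleBinding
      apiGroup: rbac.authorization.k8s.io
    EOF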

QUESTION: 68
Some of your production services are running in Google Kubernetes Engine (GKE) in the
europe-west1 region. Your build system runs in the us-west1 region. You want to push the container
images from your build system to a scalable registry to maximize the bandwidth for transferring
the images to the cluster.
What should you do?

A. Push the images to Google Container Registry (GCR) using the gcr.io hostname.
B. Push the images to Google Container Registry (GCR) using the us.gcr.io hostname.
C. Push the images to Google Container Registry (GCR) using the eu.gcr.io hostname.
D. Push the images to a private image registry running on a Compute Engine instance in the
europe-west1 region.

Answer(s): C

Explanation:
Hostname and storage location:
- gcr.io: stores images in data centers in the United States
- us.gcr.io: stores images in data centers in the United States
- eu.gcr.io: stores images in data centers within member states of the European Union
- asia.gcr.io: stores images in data centers in Asia

QUESTION: 69
You manage several production systems that run on Compute Engine in the same Google
Cloud Platform (GCP) project. Each system has its own set of dedicated Compute Engine
instances. You want to know how much it costs to run each of the systems.
What should you do?

A. In the Google Cloud Platform Console, use the Cost Breakdown section to visualize the costs
per system.
B. Assign all instances a label specific to the system they run. Configure BigQuery billing export
and query costs per label.
C. Enrich all instances with metadata specific to the system they run. Configure Stackdriver
Logging to export to BigQuery, and query costs based on the metadata.


D. Name each virtual machine (VM) after the system it runs. Set up a usage report export to a
Cloud Storage bucket. Configure the bucket as a source in BigQuery to query costs based on
VM name.

Answer(s): B

Explanation:
https://cloud.google.com/billing/docs/how-to/export-data-bigquery
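A minimal sketch of the follow-up query (project, dataset, and billing-export table names are hypothetical; in the billing export, labels is an array of key/value records):

    bq query --use_legacy_sql=false '
      SELECT l.value AS system, SUM(cost) AS total_cost
      FROM `my-project.billing.gcp_billing_export_v1_XXXXXX`, UNNEST(labels) AS l
      WHERE l.key = "system"
      GROUP BY system
      ORDER BY total_cost DESC'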

QUESTION: 70
You use Cloud Build to build and deploy your application. You want to securely incorporate
database credentials and other application secrets into the build pipeline. You also want to
minimize the development effort.
What should you do?

A. Create a Cloud Storage bucket and use the built-in encryption at rest. Store the secrets in the
bucket and grant Cloud Build access to the bucket.
B. Encrypt the secrets and store them in the application repository. Store a decryption key in a
separate repository and grant Cloud Build access to the repository.
C. Use client-side encryption to encrypt the secrets and store them in a Cloud Storage bucket.
Store a decryption key in the bucket and grant Cloud Build access to the bucket.
D. Use Cloud Key Management Service (Cloud KMS) to encrypt the secrets and include them in
your Cloud Build deployment configuration. Grant Cloud Build access to the KeyRing.

Answer(s): D

Explanation:
https://cloud.google.com/build/docs/securing-builds/use-encrypted-credentials
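A minimal sketch (key ring, key, and file names are hypothetical):

    gcloud kms encrypt \
        --location=global --keyring=build-secrets --key=db-password-key \
        --plaintext-file=db-password.txt --ciphertext-file=db-password.enc
    # Reference the ciphertext in cloudbuild.yaml under `secrets` with the same kmsKeyName,
    # and grant the Cloud Build service account the Cloud KMS CryptoKey Decrypter role.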

QUESTION: 71
You support a popular mobile game application deployed on Google Kubernetes Engine (GKE)
across several Google Cloud regions. Each region has multiple Kubernetes clusters. You
receive a report that none of the users in a specific region can connect to the application. You
want to resolve the incident while following Site Reliability Engineering practices.
What should you do first?

A. Reroute the user traffic from the affected region to other regions that don't report issues.
B. Use Stackdriver Monitoring to check for a spike in CPU or memory usage for the affected
region.
C. Add an extra node pool that consists of high memory and high CPU machine type instances
to the cluster.
D. Use Stackdriver Logging to filter on the clusters in the affected region, and inspect error
messages in the logs.

Answer(s): A

Explanation:
Google always aims to first stop the impact of an incident, and then find the root cause (unless
the root cause just happens to be identified early on).

QUESTION: 72


You are writing a postmortem for an incident that severely affected users. You want to prevent
similar incidents in the future.
Which two of the following sections should you include in the postmortem? (Choose two.)

A. An explanation of the root cause of the incident


B. A list of employees responsible for causing the incident
C. A list of action items to prevent a recurrence of the incident
D. Your opinion of the incident's severity compared to past incidents
E. Copies of the design documents for all the services impacted by the incident

Answer(s): A, C

Explanation:
For a postmortem to be truly blameless, it must focus on identifying the contributing causes of
the incident without indicting any individual or team for bad or inappropriate behavior.

QUESTION: 73
You are ready to deploy a new feature of a web-based application to production. You want to
use Google Kubernetes Engine (GKE) to perform a phased rollout to half of the web server
pods.
What should you do?

A. Use a partitioned rolling update.


B. Use Node taints with NoExecute.
C. Use a replica set in the deployment specification.
D. Use a stateful set with parallel pod management policy.

Answer(s): A

Explanation:
https://medium.com/velotio-perspectives/exploring-upgrade-strategies-for-stateful-sets-in-kubernetes-c02b8286f251
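For illustration: partitioned rolling updates are a StatefulSet feature, so with 10 replicas a partition of 5 updates only the pods with ordinal 5-9, i.e. half of the web server pods (names and image are hypothetical):

    kubectl patch statefulset web -p \
      '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":5}}}}'
    kubectl set image statefulset/web nginx=example/web:v2   # ordinals >= 5 roll to v2; 0-4 stay on v1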

QUESTION: 74
You are responsible for the reliability of a high-volume enterprise application. A large number of
users report that an important subset of the application's functionality - a data intensive
reporting feature - is consistently failing with an HTTP 500 error.
When you investigate your application's dashboards, you notice a strong correlation between
the failures and a metric that represents the size of an internal queue used for generating
reports. You trace the failures to a reporting backend that is experiencing high I/O wait times.
You quickly fix the issue by resizing the backend's persistent disk (PD). Now you need to create
an availability Service Level Indicator (SLI) for the report generation feature. How would you
define it?

A. As the I/O wait times aggregated across all report generation backends
B. As the proportion of report generation requests that result in a successful response
C. As the application's report generation queue size compared to a known-good threshold
D. As the reporting backend PD throughput capacity compared to a known-good threshold

Answer(s): B


Explanation:
According to SRE Workbook, one of potential SLI is as below:
* Type of service: Request-driven
* Type of SLI: Availability

* Description: The proportion of requests that resulted in a successful response.

https://sre.google/workbook/implementing-slos/

QUESTION: 75
You have an application running in Google Kubernetes Engine. The application invokes multiple
services per request but responds too slowly. You need to identify which downstream service or
services are causing the delay.
What should you do?

A. Analyze VPC flow logs along the path of the request.


B. Investigate the Liveness and Readiness probes for each service.
C. Create a Dataflow pipeline to analyze service metrics in real time.
D. Use a distributed tracing framework such as OpenTelemetry or Stackdriver Trace.

Answer(s): D

Explanation:
A distributed tracing framework follows each request across service boundaries and shows the
latency contributed by every downstream call, which identifies the dependent service or
services causing the delay.

QUESTION: 76
You are creating and assigning action items in a postmortem for an outage. The outage is over,
but you need to address the root causes. You want to ensure that your team handles the action
items quickly and efficiently. How should you assign owners and collaborators to action items?

A. Assign one owner for each action item and any necessary collaborators.
B. Assign multiple owners for each item to guarantee that the team addresses items quickly
C. Assign collaborators but no individual owners to the items to keep the postmortem
blameless.
D. Assign the team lead as the owner for all action items because they are in charge of the SRE
team.

Answer(s): A

Explanation:
https://devops.com/when-it-disaster-strikes-part-3-conducting-a-blameless-post-mortem/

QUESTION: 77
Your development team has created a new version of their service's API. You need to deploy
the new versions of the API with the least disruption to third-party developers and end users of
third-party installed applications.
What should you do?

A. Introduce the new version of the API.
Announce deprecation of the old version of the API.
Deprecate the old version of the API.
Contact remaining users of the old API.
Provide best effort support to users of the old API.
Turn down the old version of the API.
B. Announce deprecation of the old version of the API.
Introduce the new version of the API.
Contact remaining users on the old API.
Deprecate the old version of the API.
Turn down the old version of the API.
Provide best effort support to users of the old API.
C. Announce deprecation of the old version of the API.
Contact remaining users on the old API.
Introduce the new version of the API.
Deprecate the old version of the API.
Provide best effort support to users of the old API.
Turn down the old version of the API.
D. Introduce the new version of the API.
Contact remaining users of the old API.
Announce deprecation of the old version of the API.
Deprecate the old version of the API.
Turn down the old version of the API.
Provide best effort support to users of the old API.

Answer(s): A

QUESTION: 78
You are running an application on Compute Engine and collecting logs through Stackdriver. You
discover that some personally identifiable information (PII) is leaking into certain log entry fields.
You want to prevent these fields from being written in new log entries as quickly as possible.
What should you do?

A. Use the filter-record-transformer Fluentd filter plugin to remove the fields from the log entries
in flight.
B. Use the fluent-plugin-record-reformer Fluentd output plugin to remove the fields from the log
entries in flight.
C. Wait for the application developers to patch the application, and then verify that the log
entries are no longer exposing PII.
D. Stage log entries to Cloud Storage, and then trigger a Cloud Function to remove the fields
and write the entries to Stackdriver via the Stackdriver Logging API.

Answer(s): A
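A minimal sketch of the in-flight approach, assuming the VM runs the google-fluentd logging agent and the PII lives in a hypothetical user_email field:

    sudo tee /etc/google-fluentd/config.d/strip-pii.conf > /dev/null <<'EOF'
    <filter **>
      @type record_transformer
      remove_keys user_email    # drop the PII field before entries leave the host
    </filter>
    EOF
    sudo service google-fluentd restart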

QUESTION: 79
You support a service that recently had an outage. The outage was caused by a new release
that exhausted the service memory resources. You rolled back the release successfully to
mitigate the impact on users. You are now in charge of the post-mortem for the outage. You
want to follow Site Reliability Engineering practices when developing the post-mortem.
What should you do?

A. Focus on developing new features rather than avoiding the outages from recurring.
B. Focus on identifying the contributing causes of the incident rather than the individual
responsible for the cause.
C. Plan individual meetings with all the engineers involved. Determine who approved and
pushed the new release to production.
D. Use the Git history to find the related code commit. Prevent the engineer who made that
commit from working on production services.

Answer(s): B

QUESTION: 80
You support a user-facing web application.
When analyzing the application's error budget over the previous six months, you notice that the
application has never consumed more than 5% of its error budget in any given time window.
You hold a Service Level Objective (SLO) review with business stakeholders and confirm that
the SLO is set appropriately. You want your application's SLO to more closely reflect its
observed reliability.
What steps can you take to further that goal while balancing velocity, reliability, and business
needs? (Choose two.)

A. Add more serving capacity to all of your application's zones.


B. Have more frequent or potentially risky application releases.
C. Tighten the SLO to match the application's observed reliability.
D. Implement and measure additional Service Level Indicators (SLIs) for the application.
E. Announce planned downtime to consume more error budget, and ensure that users are not
depending on a tighter SLO.

Answer(s): D, E

Explanation:
https://sre.google/sre-book/service-level-objectives/
You want the application's SLO to more closely reflect its observed reliability. The key here is
that error budget consumption never goes above 5%, which means the service can tolerate
additional downtime and still stay within its budget.

QUESTION: 81
You support a service with a well-defined Service Level Objective (SLO). Over the previous 6
months, your service has consistently met its SLO and customer satisfaction has been
consistently high. Most of your service's operations tasks are automated and few repetitive
tasks occur frequently. You want to optimize the balance between reliability and deployment
velocity while following site reliability engineering best practices.
What should you do? (Choose two.)

A. Make the service's SLO more strict.


B. Increase the service's deployment velocity and/or risk.
C. Shift engineering time to other services that need more reliability.
D. Get the product team to prioritize reliability work over new features.
E. Change the implementation of your Service Level Indicators (SLIs) to increase coverage.

Answer(s): B, C

Explanation:
(https://sre.google/workbook/implementing-slos/#slo-decision-matrix)

QUESTION: 82
Your organization uses a change advisory board (CAB) to approve all changes to an existing
service. You want to revise this process to eliminate any negative impact on the software
delivery performance.
What should you do? (Choose two.)

A. Replace the CAB with a senior manager to ensure continuous oversight from development to
deployment
B. Let developers merge their own changes but ensure that the team's deployment platform can
roll back changes if any issues are discovered
C. Move to a peer-review based process for individual changes that is enforced at code check-
in time and supported by automated tests
D. Batch changes into larger but less frequent software releases
E. Ensure that the team's development platform enables developers to get fast feedback on the
impact of their changes

Answer(s): C, E

Explanation:
A change advisory board (CAB) is a traditional way of approving changes to a service, but it can
slow down the software delivery performance and introduce bottlenecks. A better way to
improve the speed and quality of changes is to use a peer-review based process for individual
changes that is enforced at code check-in time and supported by automated tests. This way,
developers can get fast feedback on the impact of their changes and catch any errors or bugs
before they reach production. Additionally, the team's development platform should enable
developers to get fast feedback on the impact of their changes, such as using Cloud Code,
Cloud Build, or Cloud Debugger.

QUESTION: 83
Your organization has a containerized web application that runs on-premises. As part of the
migration plan to Google Cloud, you need to select a deployment strategy and platform that
meets the following acceptance criteria:
1. The platform must be able to direct traffic from Android devices to an Android-specific
microservice.
2. The platform must allow for arbitrary percentage-based traffic splitting.
3. The deployment strategy must allow for continuous testing of multiple versions of any
microservice.
What should you do?

A. Deploy the canary release of the application to Cloud Run Use traffic splitting to direct 10% of
user traffic to the canary release based on the revision tag
B. Deploy the canary release of the application to App Engine Use traffic splitting to direct a
subset of user traffic to the new version based on the IP address
C. Deploy the canary release of the application to Compute Engine Use Anthos Service Mesh
with Compute Engine to direct 10% of user traffic to the canary release by configuring the virtual
service.
D. Deploy the canary release to Google Kubernetes Engine with Anthos Service Mesh. Use
traffic splitting to direct 10% of user traffic to the new version based on the user-agent header
configured in the virtual service.

Answer(s): D

Explanation:
The best option for deploying a containerized web application to Google Cloud with the given
acceptance criteria is to use Google Kubernetes Engine (GKE) with Anthos Service Mesh. GKE
is a managed service for running Kubernetes clusters on Google Cloud, and Anthos Service
Mesh is a service mesh that provides observability, traffic management, and security features
for microservices. With Anthos Service Mesh, you can use traffic splitting to direct traffic from
Android devices to an Android-specific microservice by configuring the user-agent header in the
virtual service. You can also use traffic splitting to direct arbitrary percentage-based traffic to
different versions of any microservice for continuous testing. For example, you can use a canary
release strategy to direct 10% of user traffic to a new version of a microservice and monitor its
performance and reliability.
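A minimal Anthos Service Mesh (Istio) VirtualService sketch of both behaviors (host and subset names are hypothetical):

    kubectl apply -f - <<'EOF'
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: frontend
    spec:
      hosts:
      - frontend
      http:
      - match:
        - headers:
            user-agent:
              regex: ".*Android.*"    # Android devices go to the Android-specific microservice
        route:
        - destination:
            host: frontend-android
      - route:                         # all other traffic: 90/10 canary split
        - destination: {host: frontend, subset: stable}
          weight: 90
        - destination: {host: frontend, subset: canary}
          weight: 10
    EOF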

QUESTION: 84
Your team is running microservices in Google Kubernetes Engine (GKE). You want to detect
consumption of an error budget to protect customers and define release policies. What should
you do?

A. Create SLIs from metrics. Enable alert policies if the services do not pass.
B. Use the metrics from Anthos Service Mesh to measure the health of the microservices.
C. Create an SLO. Create an alert policy on select_slo_burn_rate.
D. Create an SLO and configure uptime checks for your services. Enable alert policies if the
services do not pass.

Answer(s): C

Explanation:
The best option for detecting consumption of an error budget to protect customers and define
release policies is to create a service level objective (SLO) and create an alert policy on
select_slo_burn_rate. A SLO is a target value or range of values for a service level indicator
(SLI) that measures some aspect of the service quality, such as availability or latency. An error
budget is the amount of time or number of errors that a service can tolerate while still meeting
its SLO. A select_slo_burn_rate is a metric that indicates how fast the error budget is being
consumed by the service. By creating an alert policy on select_slo_burn_rate, you can trigger
notifications or actions when the error budget consumption exceeds a certain threshold. This
way, you can balance change, velocity, and reliability of the service by adjusting the release
policies based on the error budget status.

QUESTION: 85
Your organization wants to collect system logs that will be used to generate dashboards in
Cloud Operations for their Google Cloud project. You need to configure all current and future
Compute Engine instances to collect the system logs and you must ensure that the Ops Agent
remains up to date.
What should you do?

A. Use the gcloud CLI to install the Ops Agent on each VM listed in the Cloud Asset Inventory.
B. Select all VMs with an Agent status of Not detected on the Cloud Operations VMs dashboard.
Then select Install agents.
C. Use the gcloud CLI to create an Agent Policy.
D. Install the Ops Agent on the Compute Engine image by using a startup script.

Answer(s): C

Explanation:
The best option for configuring all current and future Compute Engine instances to collect
system logs and ensure that the Ops Agent remains up to date is to use the gcloud CLI to
create an Agent Policy. An Agent Policy is a resource that defines how Ops Agents are installed
and configured on VM instances that match certain criteria, such as labels or zones. Ops
Agents are software agents that collect metrics and logs from VM instances and send them to
Cloud Operations products, such as Cloud Monitoring and Cloud Logging. By creating an Agent
Policy, you can ensure that all current and future VM instances that match the policy criteria will
have the Ops Agent installed and updated automatically. This way, you can collect system logs
from all VM instances and use them to generate dashboards in Cloud Operations.
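A minimal sketch (policy ID, labels, zone, and OS filter are hypothetical; the exact beta flag names may vary by gcloud version, so verify against the current reference):

    gcloud beta compute instances ops-agents policies create ops-agents-all-vms \
        --agent-rules="type=ops-agent,version=latest,package-state=installed,enable-autoupgrade=true" \
        --os-types="short-name=ubuntu,version=20.04" \
        --group-labels="env=prod" \
        --zones=us-central1-a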

QUESTION: 86
Your company has a Google Cloud resource hierarchy with folders for production, test, and
development. Your cyber security team needs to review your company's Google Cloud security
posture to accelerate security issue identification and resolution. You need to centralize the logs
generated by Google Cloud services from all projects only inside your production folder to allow
for alerting and near-real time analysis.
What should you do?

A. Enable the Workflows API and route all the logs to Cloud Logging.
B. Create a central Cloud Monitoring workspace and attach all related projects.
C. Create an aggregated log sink associated with the production folder that uses a Pub/Sub
topic as the destination.
D. Create an aggregated log sink associated with the production folder that uses a Cloud
Logging bucket as the destination.

Answer(s): D

Explanation:
The best option for centralizing the logs generated by Google Cloud services from all projects
only inside your production folder is to create an aggregated log sink associated with the
production folder that uses a Cloud Logging bucket as the destination. An aggregated log sink is
a log sink that collects logs from multiple sources, such as projects, folders, or organizations. A
Cloud Logging bucket is a storage location for logs that can be used as a destination for log
sinks. By creating an aggregated log sink with a Cloud Logging bucket, you can collect and
store all the logs from the production folder in one place and allow for alerting and near-real time
analysis using Cloud Monitoring and Cloud Operations.
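A minimal sketch (folder ID, project, bucket, and filter are hypothetical):

    gcloud logging sinks create prod-central-sink \
        logging.googleapis.com/projects/my-project/locations/global/buckets/prod-logs \
        --folder=123456789012 --include-children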

QUESTION: 87
You are configuring the frontend tier of an application deployed in Google Cloud. The frontend
tier is hosted on nginx and deployed using a managed instance group with an Envoy-based
external HTTP(S) load balancer in front. The application is deployed entirely within the
europe-west2 region and only serves users based in the United Kingdom. You need to choose the
most cost-effective network tier and load balancing configuration. What should you use?

A. Premium Tier with a global load balancer


B. Premium Tier with a regional load balancer
C. Standard Tier with a global load balancer
D. Standard Tier with a regional load balancer

Answer(s): B

Explanation:


The most cost-effective network tier and load balancing configuration for your frontend tier is to
use Premium Tier with a regional load balancer. Premium Tier is a network tier that provides
high-performance and low-latency network connectivity across Google's global network. A
regional load balancer is a load balancer that distributes traffic within a single region. Since your
application is deployed entirely within the europe-west2 region and only serves users based in
the United Kingdom, you can use Premium Tier with a regional load balancer to optimize the
network performance and cost.

QUESTION: 88
You recently deployed your application in Google Kubernetes Engine (GKE) and now need to
release a new version of the application. You need the ability to instantly roll back to the previous
version of the application in case there are issues with the new version. Which deployment
model should you use?

A. Perform a rolling deployment, and test your new application after the deployment is complete.
B. Perform A/B testing, and test your application periodically after the deployment is complete.
C. Perform a canary deployment, and test your new application periodically after the new
version is deployed.
D. Perform a blue/green deployment, and test your new application after the deployment is
complete.

Answer(s): D

Explanation:
The best deployment model for releasing a new version of your application in GKE with the
ability to instantly roll back to the previous version is to perform a blue/green deployment and
test your new application after the deployment is complete. A blue/green deployment is a
deployment strategy that involves creating two identical environments, one running the current
version of the application (blue) and one running the new version of the application (green). The
traffic is switched from blue to green after testing the new version, and if any issues are
discovered, the traffic can be switched back to blue instantly. This way, you can minimize
downtime and risk during deployment.
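As an illustrative sketch (service name and label values are hypothetical), the cutover and the instant rollback in a Kubernetes blue/green setup are just a change of the Service selector:

    kubectl patch service my-app -p '{"spec":{"selector":{"version":"green"}}}'   # cut over to the new version
    kubectl patch service my-app -p '{"spec":{"selector":{"version":"blue"}}}'    # instant rollback if issues appear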

QUESTION: 89
You are building and deploying a microservice on Cloud Run for your organization. Your service
is used by many applications internally. You are deploying a new release, and you need to test
the new version extensively in the staging and production environments. You must minimize
user and developer impact.
What should you do?

A. Deploy the new version of the service to the staging environment. Split the traffic, and allow
1% of traffic through to the latest version. Test the latest version. If the test passes, gradually
roll out the latest version to the staging and production environments.
B. Deploy the new version of the service to the staging environment. Split the traffic, and allow
50% of traffic through to the latest version. Test the latest version. If the test passes, send all
traffic to the latest version. Repeat for the production environment.
C. Deploy the new version of the service to the staging environment with a new-release tag
without serving traffic. Test the new-release version. If the test passes, gradually roll out this
tagged version. Repeat for the production environment.
D. Deploy a new environment with the green tag to use as the staging environment. Deploy the
new version of the service to the green environment and test the new version. If the tests pass,
send all traffic to the green environment and delete the existing staging environment. Repeat for
the production environment.

Answer(s): C

Explanation:
The best option for deploying a new release of your microservice on Cloud Run and testing it
extensively in the staging and production environments with minimal user and developer impact
is to deploy the new version of the service to the staging environment with a new-release tag
without serving traffic, test the new-release version, and if the test passes, gradually roll out this
tagged version. A tag is a label that you can assign to a revision of your service on Cloud Run.
You can use tags to create different versions of your service without affecting traffic. You can
also use tags to gradually roll out traffic to a new version of your service by using traffic splitting.
This way, you can test your new release extensively in both environments and minimize user
and developer impact.
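A minimal sketch (service name, image, and the tag-specific URL are hypothetical):

    gcloud run deploy my-service --image=gcr.io/my-project/my-service:v2 \
        --tag=new-release --no-traffic                  # deploy the revision without serving traffic
    curl https://new-release---my-service-abc123.a.run.app/health   # test through the tag URL
    gcloud run services update-traffic my-service --to-tags=new-release=10   # gradual rollout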

QUESTION: 90
You work for a global organization and run a service with an availability target of 99% and
limited engineering resources. For the current calendar month, you notice that the service has
99.5% availability. You must ensure that your service meets the defined availability goals and
can react to business changes, including the upcoming launch of new features. You also need to
reduce technical debt while minimizing operational costs. You want to follow
Google-recommended practices. What should you do?

A. Add N+1 redundancy to your service by adding additional compute resources to the service
B. Identify, measure and eliminate toil by automating repetitive tasks
C. Define an error budget for your service level availability and minimize the remaining error
budget
D. Allocate available engineers to the feature backlog while you ensure that the service remains
within the availability target.

Answer(s): C

QUESTION: 91
You are developing the deployment and testing strategies for your CI/CD pipeline in Google
Cloud. You must be able to:
· Reduce the complexity of release deployments and minimize the duration of deployment
rollbacks
· Test real production traffic with a gradual increase in the number of affected users
You want to select a deployment and testing strategy that meets your requirements. What
should you do?

A. Recreate deployment and canary testing


B. Blue/green deployment and canary testing
C. Rolling update deployment and A/B testing
D. Rolling update deployment and shadow testing

Answer(s): B

Explanation:
The best option for selecting a deployment and testing strategy that meets your requirements is
to use blue/green deployment and canary testing. A blue/green deployment is a deployment
strategy that involves creating two identical environments, one running the current version of the
application (blue) and one running the new version of the application (green). The traffic is
switched from blue to green after testing the new version, and if any issues are discovered, the
traffic can be switched back to blue instantly. This way, you can reduce the complexity of
release deployments and minimize the duration of deployment rollbacks. Canary testing is a
testing strategy that involves releasing a new version of an application to a subset of users or
servers and monitoring its performance and reliability. This way, you can test real production
traffic with a gradual increase in the number of affected users.

QUESTION: 92
You are creating a CI/CD pipeline to perform Terraform deployments of Google Cloud
resources. Your CI/CD tooling is running in Google Kubernetes Engine (GKE) and uses an
ephemeral Pod for each pipeline run. You must ensure that the pipelines that run in the Pods
have the appropriate Identity and Access Management (IAM) permissions to perform the
Terraform deployments. You want to follow Google-recommended practices for identity
management. What should you do? (Choose two.)

A. Create a new Kubernetes service account, and assign the service account to the Pods. Use
Workload Identity to authenticate as the Google service account.
B. Create a new JSON service account key for the Google service account, store the key as a
Kubernetes secret, inject the key into the Pods, and set the GOOGLE_APPLICATION_CREDENTIALS
environment variable.
C. Create a new Google service account, and assign the appropriate IAM permissions.
D. Create a new JSON service account key for the Google service account, store the key in the
secret management store for the CI/CD tool, and configure Terraform to use this key for
authentication.
E. Assign the appropriate IAM permissions to the Google service account associated with the
Compute Engine VM instances that run the Pods.

Answer(s): A, C

Explanation:
The best options for ensuring that the pipelines that run in the Pods have the appropriate IAM
permissions to perform the Terraform deployments are to create a new Kubernetes service
account and assign the service account to the Pods, and to use Workload Identity to
authenticate as the Google service account. A Kubernetes service account is an identity that
represents an application or a process running in a Pod. A Google service account is an identity
that represents a Google Cloud resource or service. Workload Identity is a feature that allows
you to bind Kubernetes service accounts to Google service accounts. By using Workload
Identity, you can avoid creating and managing JSON service account keys, which are less
secure and require more maintenance. You can also assign the appropriate IAM permissions to
the Google service account that corresponds to the Kubernetes service account.
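A minimal Workload Identity sketch (project, namespace, and account names are hypothetical; scope the granted roles to what Terraform actually manages):

    gcloud iam service-accounts create terraform-deployer
    gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:terraform-deployer@my-project.iam.gserviceaccount.com" \
        --role="roles/compute.admin"
    gcloud iam service-accounts add-iam-policy-binding \
        terraform-deployer@my-project.iam.gserviceaccount.com \
        --role="roles/iam.workloadIdentityUser" \
        --member="serviceAccount:my-project.svc.id.goog[ci/terraform-ksa]"
    kubectl annotate serviceaccount terraform-ksa --namespace=ci \
        iam.gke.io/gcp-service-account=terraform-deployer@my-project.iam.gserviceaccount.com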

QUESTION: 93
You are the on-call Site Reliability Engineer for a microservice that is deployed to a Google
Kubernetes Engine (GKE) Autopilot cluster. Your company runs an online store that publishes
order messages to Pub/Sub, and a microservice receives these messages and updates stock
information in the warehousing system. A sales event caused an increase in orders, and the
stock information is not being updated quickly enough. This is causing a large number of orders
to be accepted for products that are out of stock. You check the metrics for the microservice and
compare them to typical levels.
You need to ensure that the warehouse system accurately reflects product inventory at the time
orders are placed and minimize the impact on customers. What should you do?

A. Decrease the acknowledgment deadline on the subscription


B. Add a virtual queue to the online store that allows typical traffic levels
C. Increase the number of Pod replicas
D. Increase the Pod CPU and memory limits

Answer(s): C

Explanation:
The best option for ensuring that the warehouse system accurately reflects product inventory at
the time orders are placed and minimizing the impact on customers is to increase the number of
Pod replicas. Increasing the number of Pod replicas will increase the scalability and availability
of your microservice, which will allow it to handle more Pub/Sub messages and update stock
information faster. This way, you can reduce the backlog of undelivered messages and oldest
unacknowledged message age, which are causing delays in updating product inventory. You
can use Horizontal Pod Autoscaler or Cloud Monitoring metrics-based autoscaling to
automatically adjust the number of Pod replicas based on load or custom metrics.
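A minimal sketch (deployment name and thresholds are hypothetical); on Autopilot, GKE provisions the underlying nodes automatically as the replica count scales:

    kubectl autoscale deployment stock-updater --min=3 --max=20 --cpu-percent=70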

QUESTION: 94
Your team deploys applications to three Google Kubernetes Engine (GKE) environments:
development, staging, and production. You use GitHub repositories as your source of truth. You
need to ensure that the three environments are consistent. You want to follow
Google-recommended practices to enforce and install network policies and a logging
DaemonSet on all the GKE clusters in those environments. What should you do?

A. Use Google Cloud Deploy to deploy the network policies and the DaemonSet. Use Cloud
Monitoring to trigger an alert if the network policies and DaemonSet drift from your source in the
repository.
B. Use Google Cloud Deploy to deploy the DaemonSet, and use Policy Controller to configure
the network policies. Use Cloud Monitoring to detect drifts from the source in the repository and
Cloud Functions to correct the drifts.
C. Use Cloud Build to render and deploy the network policies and the DaemonSet. Set up Config
Sync to sync the configurations for the three environments.
D. Use Cloud Build to render and deploy the network policies and the DaemonSet. Set up Policy
Controller to enforce the configurations for the three environments.

Answer(s): C


Explanation:
The best option for ensuring that the three environments are consistent and following Google-
recommended practices is to use Cloud Build to render and deploy the network policies and the
DaemonSet, and set up Config Sync to sync the configurations for the three environments.
Cloud Build is a service that executes your builds on Google Cloud infrastructure. You can use
Cloud Build to render and deploy your network policies and DaemonSet as code using tools like
Kustomize, Helm,

or kpt. Config Sync is a feature that enables you to manage the configurations of your GKE
clusters from a single source of truth, such as a Git repository. You can use Config Sync to sync
the configurations for your development, staging, and production environments and ensure that
they are consistent.

QUESTION: 95
You are using Terraform to manage infrastructure as code within a CI/CD pipeline. You notice
that multiple copies of the entire infrastructure stack exist in your Google Cloud project, and a
new copy is created each time a change to the existing infrastructure is made. You need to
optimize your cloud spend by ensuring that only a single instance of your infrastructure stack
exists at a time. You want to follow Google-recommended practices. What should you do?

A. Create a new pipeline to delete old infrastructure stacks when they are no longer needed.
B. Confirm that the pipeline is storing and retrieving the terraform.tfstate file from Cloud
Storage with the Terraform gcs backend.
C. Verify that the pipeline is storing and retrieving the terraform.tfstate file from source control.
D. Update the pipeline to remove any existing infrastructure before you apply the latest
configuration.

Answer(s): B

Explanation:
The best option for optimizing your cloud spend by ensuring that only a single instance of your
infrastructure stack exists at a time is to confirm that the pipeline is storing and retrieving the
terraform.tfstate file from Cloud Storage with the Terraform gcs backend. The terraform.tfstate
file is a file that Terraform uses to store the current state of your infrastructure. The Terraform
gcs backend is a backend type that allows you to store the terraform.tfstate file in a Cloud
Storage bucket. By using the Terraform gcs backend, you can ensure that your pipeline has
access to the latest state of your infrastructure and avoid creating multiple copies of the entire
infrastructure stack.
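A minimal backend sketch (bucket name and prefix are hypothetical):

    cat > backend.tf <<'EOF'
    terraform {
      backend "gcs" {
        bucket = "example-terraform-state"
        prefix = "prod"          # one shared state per environment, not per pipeline run
      }
    }
    EOF
    terraform init               # every pipeline run now operates on the same state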

QUESTION: 96
You are creating Cloud Logging sinks to export log entries from Cloud Logging to BigQuery for
future analysis. Your organization has a Google Cloud folder named Dev that contains
development projects and a folder named Prod that contains production projects. Log entries for
development projects must be exported to dev_dataset, and log entries for production projects
must be exported to prod_dataset. You need to minimize the number of log sinks created, and
you want to ensure that the log sinks apply to future projects. What should you do?

A. Create a single aggregated log sink at the organization level.


B. Create a log sink in each project
C. Create two aggregated log sinks at the organization level, and filter by project ID


D. Create an aggregated log sink in the Dev and Prod folders

Answer(s): D

Explanation:
The best option for minimizing the number of log sinks created and ensuring that the log sinks
apply to future projects is to create an aggregated log sink in the Dev and Prod folders. An
aggregated log sink is a log sink that collects logs from multiple sources, such as projects,
folders, or organizations. By creating an aggregated log sink in each folder, you can export log
entries for development projects to dev_dataset and log entries for production projects to
prod_dataset. You can also use filters to specify which logs you want to export. Additionally, by
creating an aggregated log sink at the folder level, you can ensure that the log sink applies to
future projects that are created under that folder.
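A minimal sketch for the Dev folder (folder ID and project are hypothetical); repeat with prod_dataset for the Prod folder:

    gcloud logging sinks create dev-sink \
        bigquery.googleapis.com/projects/my-project/datasets/dev_dataset \
        --folder=111111111111 --include-children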

QUESTION: 97
Your company runs services by using multiple globally distributed Google Kubernetes Engine
(GKE) clusters. Your operations team has set up workload monitoring that uses
Prometheus-based tooling for metrics, alerts, and generating dashboards. This setup does not
provide a method to view metrics globally across all clusters. You need to implement a scalable
solution to support global Prometheus querying and minimize management overhead. What
should you do?

A. Configure Prometheus cross-service federation for centralized data access


B. Configure workload metrics within Cloud Operations for GKE
C. Configure Prometheus hierarchical federation for centralized data access
D. Configure Google Cloud Managed Service for Prometheus

Answer(s): D

Explanation:
The best option for implementing a scalable solution to support global Prometheus querying and
minimize management overhead is to use Google Cloud Managed Service for Prometheus.
Google Cloud Managed Service for Prometheus is a fully managed service that allows you to
collect, query, and visualize metrics from your GKE clusters using Prometheus-based tooling.
You can use Google Cloud Managed Service for Prometheus to query metrics across multiple
clusters and regions using a global view. You can also use Google Cloud Managed Service for
Prometheus to integrate with other Google Cloud services, such as Cloud Monitoring, Cloud
Logging, and BigQuery. By using Google Cloud Managed Service for Prometheus, you can
avoid managing and scaling your own Prometheus servers and focus on your application
performance.
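A minimal Managed Service for Prometheus sketch (app label and port name are hypothetical) that scrapes matching pods in each cluster into the globally queryable metrics store:

    kubectl apply -f - <<'EOF'
    apiVersion: monitoring.googleapis.com/v1
    kind: PodMonitoring
    metadata:
      name: frontend-monitoring
      namespace: default
    spec:
      selector:
        matchLabels:
          app: frontend
      endpoints:
      - port: metrics
        interval: 30s
    EOF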

QUESTION: 98
You need to build a CI/CD pipeline for a containerized application in Google Cloud. Your
development team uses a central Git repository for trunk-based development. You want to run all
your tests in the pipeline for any new versions of the application to improve quality. What
should you do?

A. 1. Install a Git hook to require developers to run unit tests before pushing the code to a
central repository
2. Trigger Cloud Build to build the application container Deploy the application container to a
testing environment, and run integration tests
3. If the integration tests are successful deploy the application container to your production

https://Xcerts.com 43
PROFESSIONAL-CLOUD-DEVOPS-ENGINEER

environment. and run acceptance tests


B. 1. Install a Git hook to require developers to run unit tests before pushing the code to a central
repository. If all tests are successful, build a container.
2. Trigger Cloud Build to deploy the application container to a testing environment, and run
integration tests and acceptance tests.
3. If all tests are successful, tag the code as production ready. Trigger Cloud Build to build and
deploy the application container to the production environment.
C. 1. Trigger Cloud Build to build the application container and run unit tests with the container.
2. If unit tests are successful, deploy the application container to a testing environment, and run
integration tests.
3. If the integration tests are successful, the pipeline deploys the application container to the
production environment. After that, run acceptance tests.
D. 1. Trigger Cloud Build to run unit tests when the code is pushed. If all unit tests are
successful, build and push the application container to a central registry.
2. Trigger Cloud Build to deploy the container to a testing environment, and run integration tests
and acceptance tests.
3. If all tests are successful, the pipeline deploys the application to the production environment
and runs smoke tests.

Answer(s): D

Explanation:
The best option is D: trigger Cloud Build to run unit tests when code is pushed; if all unit tests
pass, build and push the application container to a central registry; deploy the container to a
testing environment and run integration and acceptance tests; and, if all tests pass, deploy the
application to the production environment and run smoke tests. This flow follows CI/CD best
practices: it runs tests at several stages of the pipeline, stores containers in a central registry,
promotes the same artifact through distinct environments, and uses Cloud Build as a single tool
for building, testing, and deploying.
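
A minimal Cloud Build configuration following this pattern might look like the sketch below; the test runner image, commands, and registry path are illustrative assumptions rather than details from the question:

cat > cloudbuild.yaml <<'EOF'
steps:
# 1. Run unit tests on every push (test image and command are illustrative).
- name: 'golang:1.21'
  entrypoint: 'go'
  args: ['test', './...']
# 2. Build the application container.
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'us-docker.pkg.dev/$PROJECT_ID/apps/my-app:$SHORT_SHA', '.']
# 3. Push the image to the central registry; later stages deploy it to the
#    testing environment and, after all tests pass, to production.
images:
- 'us-docker.pkg.dev/$PROJECT_ID/apps/my-app:$SHORT_SHA'
EOF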

QUESTION: 99
Your company is developing applications that are deployed on Google Kubernetes Engine
(GKE). Each team manages a different application. You need to create the development and
production environments for each team while you minimize costs. Different teams should not be
able to access other teams' environments. You want to follow Google-recommended practices.
What should you do?

A. Create one Google Cloud project per team. In each project, create a cluster for development
and one for production. Grant the teams Identity and Access Management (IAM) access to their
respective clusters
B. Create one Google Cloud project per team. In each project, create a cluster with a Kubernetes
namespace for development and one for production. Grant the teams Identity and Access
Management (IAM) access to their respective clusters.
C. Create a development and a production GKE cluster in separate projects. In each cluster,
create a Kubernetes namespace per team, and then configure Identity-Aware Proxy so that each
team can only access its own namespace
D. Create a development and a production GKE cluster in separate projects. In each cluster,
create a Kubernetes namespace per team, and then configure Kubernetes role-based access
control (RBAC) so that each team can only access its own namespace

Answer(s): D

Explanation:
The best option for creating the development and production environments for each team while
minimizing costs and ensuring isolation is to create a development and a production GKE
cluster in separate projects, in each cluster create a Kubernetes namespace per team, and then
configure Kubernetes role-based access control (RBAC) so that each team can only access its
own namespace. This option allows you to use fewer clusters and projects than creating one
project or cluster per team, which reduces costs and complexity. It also allows you to isolate
each team's environment by using namespaces and RBAC, which prevents teams from
accessing other teams' environments.
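
As a minimal sketch of the namespace-plus-RBAC isolation (the team name and Google group are placeholders), you could run the following in each cluster:

# Namespace for team-a.
kubectl create namespace team-a

# Let team-a's group manage workloads only inside its own namespace by
# binding the built-in "edit" ClusterRole with a namespaced RoleBinding.
kubectl create rolebinding team-a-edit \
  --clusterrole=edit \
  --group=team-a@example.com \
  --namespace=team-a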

QUESTION: 100
The new version of your containerized application has been tested and is ready to be deployed
to production on Google Kubernetes Engine (GKE). You could not fully load-test the new version
in your pre-production environment, and you need to ensure that the application does not have
performance problems after deployment. Your deployment must be automated. What should you
do?

A. Deploy the application through a continuous delivery pipeline by using canary deployments.
Use Cloud Monitoring to look for performance issues, and ramp up traffic as supported by the
metrics
B. Deploy the application through a continuous delivery pipeline by using blue/green
deployments. Migrate traffic to the new version of the application, and use Cloud Monitoring to
look for performance issues
C. Deploy the application by using kubectl, and use Config Connector to slowly ramp up traffic
between versions. Use Cloud Monitoring to look for performance issues
D. Deploy the application by using kubectl, and set the spec.updateStrategy.type field to
RollingUpdate. Use Cloud Monitoring to look for performance issues, and run the kubectl
rollback command if there are any issues.

Answer(s): A

Explanation:
The best option for deploying a new version of your containerized application to production on
GKE and ensuring that the application does not have performance problems after deployment is
to deploy the application through a continuous delivery pipeline by using canary deployments,
use Cloud Monitoring to look for performance issues, and ramp up traffic as supported by the
metrics. A canary deployment is a deployment strategy that involves releasing a new version of
an application to a subset of users or servers and monitoring its performance and reliability. This
way, you can test the new version in the production environment with real traffic and load, and
gradually increase the traffic as the metrics indicate. You can use Cloud Monitoring to collect
and analyze metrics from your application and GKE cluster, such as latency, error rate, CPU
utilization, and memory usage. You can also use Cloud Monitoring to set up alerts and
dashboards to track the performance of your application.
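
One simple way to realize this on GKE, assuming a stable Deployment and a canary Deployment that share the same Service selector, is to control the traffic fraction through replica counts and ramp up while the metrics stay healthy (the names and counts are illustrative):

# Start with roughly 10% of Pods, and therefore ~10% of traffic, on the canary.
kubectl scale deployment app-stable --replicas=9
kubectl scale deployment app-canary --replicas=1

# If Cloud Monitoring shows healthy latency and error rates, ramp up.
kubectl scale deployment app-canary --replicas=5
kubectl scale deployment app-stable --replicas=5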

QUESTION: 101
You are managing an application that runs in Compute Engine. The application uses a custom
HTTP server to expose an API that is accessed by other applications through an internal
TCP/UDP load balancer. A firewall rule allows access to the API port from 0.0.0.0/0. You need to
configure Cloud Logging to log each IP address that accesses the API by using the fewest
number of steps. What should you do?

A. Enable Packet Mirroring on the VPC


B. Install the Ops Agent on the Compute Engine instances.
C. Enable logging on the firewall rule
D. Enable VPC Flow Logs on the subnet

Answer(s): C

Explanation:
The best option for configuring Cloud Logging to log each IP address that accesses the API by
using the fewest number of steps is to enable logging on the firewall rule. A firewall rule is a rule
that controls the traffic to and from your Compute Engine instances. You can enable logging on
a firewall rule to capture information about the traffic that matches the rule, such as source and
destination IP addresses, protocols, ports, and actions. You can use Cloud Logging to view and
export the firewall logs to other destinations, such as BigQuery, for further analysis.
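
Enabling logging is a single update to the existing rule (the rule name is a placeholder):

# Turn on Firewall Rules Logging for the rule that allows the API port.
gcloud compute firewall-rules update allow-api-port --enable-logging

The resulting connection records, including the source IP address of each request, then appear in Cloud Logging and can be filtered or exported.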

QUESTION: 102
You support a user-facing web application. When analyzing the application's error budget over
the previous six months, you notice that the application never consumed more than 5% of its
error budget. You hold an SLO review with business stakeholders and confirm that the SLO is set
appropriately. You want your application's reliability to more closely reflect its SLO. What steps
can you take to further that goal while balancing velocity, reliability, and business needs?
Choose 2 answers

A. Add more serving capacity to all of your application's zones


B. Implement and measure all other available SLIs for the application
C. Announce planned downtime to consume more error budget and ensure that users are not
depending on a tighter SLO
D. Have more frequent or potentially risky application releases
E. Tighten the SLO to match the application's observed reliability

Answer(s): D, E

Explanation:
The best options for furthering your application's reliability goal while balancing velocity,
reliability, and business needs are to have more frequent or potentially risky application releases
and to tighten the SLO to match the application's observed reliability. Having more frequent or
potentially risky application releases can help you increase the change velocity and deliver new
features faster. However, this also increases the likelihood of consuming more error budget and
reducing the reliability of your service. Therefore, you should monitor your error budget
consumption and adjust your release policies accordingly. For example, you can freeze or slow
down releases when the error budget is low, or accelerate releases when the error budget is
high. Tightening the SLO to match the application's observed reliability can help you align your
service quality with your users' expectations and business needs. However, this also means
that you have less room for error and need to maintain a higher level of reliability. Therefore,
you should ensure that your SLO is realistic and achievable, and that you have sufficient
engineering resources and processes to meet it.


QUESTION: 103
Your company runs an ecommerce website built with JVM-based applications and a microservice
architecture in Google Kubernetes Engine (GKE). The application load increases during the day
and decreases during the night. Your operations team has configured the application to run
enough Pods to handle the evening peak load. You want to automate scaling by only running
enough Pods and nodes for the load. What should you do?

A. Configure the Vertical Pod Autoscaler but keep the node pool size static
B. Configure the Vertical Pod Autoscaler and enable the cluster autoscaler
C. Configure the Horizontal Pod Autoscaler but keep the node pool size static
D. Configure the Horizontal Pod Autoscaler and enable the cluster autoscaler

Answer(s): D

Explanation:
The best option for automating scaling by only running enough Pods and nodes for the load is to
configure the Horizontal Pod Autoscaler and enable the cluster autoscaler. The Horizontal Pod
Autoscaler is a feature that automatically adjusts the number of Pods in a deployment or replica
set based on observed CPU utilization or custom metrics. The cluster autoscaler is a feature
that automatically adjusts the size of a node pool based on the demand for node capacity. By
using both features together, you can ensure that your application runs enough Pods to handle
the load, and that your cluster runs enough nodes to host the Pods. This way, you can optimize
your resource utilization and cost efficiency.
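
A minimal sketch of the combination, with illustrative names and thresholds:

# Horizontal Pod Autoscaler: adjust the number of Pods based on CPU load.
kubectl autoscale deployment frontend --cpu-percent=60 --min=5 --max=50

# Cluster autoscaler: let the node pool grow and shrink with Pod demand.
gcloud container clusters update my-cluster \
  --enable-autoscaling --node-pool=default-pool \
  --min-nodes=3 --max-nodes=20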

QUESTION: 104
Your organization wants to increase the availability target of an application from 99.9% to
99.99% for an investment of $2,000. The application's current revenue is $1,000,000. You need
to determine whether the increase in availability is worth the investment for a single year of
usage. What should you do?

A. Calculate the value of improved availability to be $900, and determine that the increase in
availability is not worth the investment
B. Calculate the value of improved availability to be $1,000, and determine that the increase in
availability is not worth the investment
C. Calculate the value of improved availability to be $1,000, and determine that the increase in
availability is worth the investment
D. Calculate the value of improved availability to be $9,000, and determine that the increase in
availability is worth the investment

Answer(s): A

Explanation:
The best option for determining whether the increase in availability is worth the investment for a
single year of usage is to calculate the value of improved availability to be $900, and determine
that the increase in availability is not worth the investment. To calculate the value of improved
availability, we can use the following formula:
Value of improved availability = Revenue * (New availability - Current availability)
Plugging in the given numbers, we get:
Value of improved availability = $1,000,000 * (0.9999 - 0.999) = $900
Since the value of improved availability is less than the investment of $2,000, we can conclude
that the increase in availability is not worth the investment.


QUESTION: 105
A third-party application needs to have a service account key to work properly. When you try to
export the key from your cloud project, you receive the error "The organization policy constraint
iam.disableServiceAccountKeyCreation is enforced." You need to make the third-party
application work while following Google-recommended security practices. What should you do?

A. Enable the default service account key, and download the key
B. Remove the iam.disableServiceAccountKeyCreation policy at the organization level, and
create a key.
C. Disable the service account key creation policy at the project's folder, and download the
default key
D. Add a rule to set the iam.disableServiceAccountKeyCreation policy to off in your project and
create a key.

Answer(s): D

Explanation:
The best option for making the third-party application work while following Google-
recommended security practices is to add a rule to set the
iam.disableServiceAccountKeyCreation policy to off in your project and create a key. The
iam.disableServiceAccountKeyCreation policy is an organization policy that controls whether
service account keys can be created in a project or organization. By default, this policy is set to
on, which means that service account keys cannot be created. However, you can override this
policy at a lower level, such as a project, by adding a rule to set it to off. This way, you can
create a service account key for your project without affecting other projects or organizations.
You should also follow the best practices for managing service account keys, such as rotating
them regularly, storing them securely, and deleting them when they are no longer needed.
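
A sketch of the override and the subsequent key creation; the project and service account names are placeholders, and the rule can equally be set in the Console:

# Turn off enforcement of the constraint for this project only.
gcloud resource-manager org-policies disable-enforce \
  constraints/iam.disableServiceAccountKeyCreation \
  --project=my-project

# Create the key that the third-party application needs.
gcloud iam service-accounts keys create key.json \
  --iam-account=third-party-app@my-project.iam.gserviceaccount.com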

QUESTION: 106
Your team is writing a postmortem after an incident on your external-facing application. Your
team wants to improve the postmortem policy to include triggers that indicate whether an
incident requires a postmortem. Based on Site Reliability Engineering (SRE) practices, what
triggers should be defined in the postmortem policy?
Choose 2 answers

A. An external stakeholder asks for a postmortem


B. Data is lost due to an incident
C. An internal stakeholder requests a postmortem
D. The monitoring system detects that one of the instances for your application has failed
E. The CD pipeline detects an issue and rolls back a problematic release.

Answer(s): A, B

Explanation:
The best options for defining triggers that indicate whether an incident requires a postmortem
based on Site Reliability Engineering (SRE) practices are an external stakeholder asks for a
postmortem and data is lost due to an incident. An external stakeholder is someone who is
affected by or has an interest in the service, such as a customer or a partner. If an external
stakeholder asks for a postmortem, it means that they are concerned about the impact or root
cause of the incident, and they expect an explanation and remediation from the service
provider. Therefore, this should trigger a postmortem to address their concerns and improve
their satisfaction. Data loss is a serious consequence of an incident that can affect the integrity
and reliability of the service. If data is lost due to an incident, it means that there was a failure in
the backup or recovery mechanisms, or that there was a corruption or deletion of data.
Therefore, this should trigger a postmortem to investigate the cause and impact of the data loss,
and to prevent it from happening again.

QUESTION: 107
You are implementing a CI/CD pipeline for your application in your company's multi-cloud
environment. Your application is deployed by using custom Compute Engine images and the
equivalent in other cloud providers. You need to implement a solution that will enable you to
build and deploy the images to your current environment and is adaptable to future changes.
Which solution stack should you use?

A. Cloud Build with Packer


B. Cloud Build with Google Cloud Deploy
C. Google Kubernetes Engine with Google Cloud Deploy
D. Cloud Build with kpt

Answer(s): A

Explanation:
Cloud Build is a fully managed continuous integration and continuous delivery (CI/CD) service
that helps you automate your builds, tests, and deployments. Google Cloud Deploy is a service
that automates the deployment of your applications to Google Kubernetes Engine (GKE).

Together, Cloud Build and Google Cloud Deploy can be used to build and deploy your
application's custom Compute Engine images to your current environment and to other cloud
providers in the future.
Here are the steps involved in using Cloud Build and Google Cloud Deploy to implement a
CI/CD pipeline for your application:
Create a Cloud Build trigger that fires whenever a change is made to your application's code. In
the Cloud Build trigger, configure Cloud Build to build your application's Docker image. Create a
Google Cloud Deploy configuration file that specifies how to deploy your application's Docker
image to GKE.
In Google Cloud Deploy, create a deployment that uses your configuration file. Once you have
created the Cloud Build trigger and Google Cloud Deploy configuration file, any changes made
to your application's code will trigger Cloud Build to build a new Docker image. Google Cloud
Deploy will then deploy the new Docker image to GKE. This solution stack is adaptable to future
changes because it uses a cloud-agnostic approach. Cloud Build can be used to build Docker
images for any cloud provider, and Google Cloud Deploy can be used to deploy Docker images
to any Kubernetes cluster. The other solution stacks are not as adaptable to future changes. For
example, solution stack A (Cloud Build with Packer) is limited to building Docker images for
Compute Engine. Solution stack C (Google Kubernetes Engine with Google Cloud Deploy) is
limited to deploying Docker images to GKE. Solution stack D (Cloud Build with kpt) is a newer
solution that is not yet as mature as Cloud Build and Google Cloud Deploy.
Overall, the best solution stack for implementing a CI/CD pipeline for your application in a multi-
cloud environment is Cloud Build with Google Cloud Deploy. This solution stack is fully
managed, cloud-agnostic, and adaptable to future changes.
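
A minimal sketch of such a pipeline; the Packer builder image and template file name are assumptions, since the community Packer builder for Cloud Build is typically built into your own project first:

cat > cloudbuild.yaml <<'EOF'
steps:
# Run Packer against a template that defines a Compute Engine builder and,
# as needed, builders for other cloud providers from the same source.
- name: 'gcr.io/$PROJECT_ID/packer'
  args: ['build', 'image.pkr.hcl']
EOF

gcloud builds submit --config=cloudbuild.yaml .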

QUESTION: 108


Your application's performance in Google Cloud has degraded since the last release. You
suspect that downstream dependencies might be causing some requests to take longer to
complete. You need to investigate the issue with your application to determine the cause. What
should you do?

A. Configure Error Reporting in your application


B. Configure Google Cloud Managed Service for Prometheus in your application
C. Configure Cloud Profiler in your application
D. Configure Cloud Trace in your application

Answer(s): D

Explanation:
The best option for investigating the issue with your application's performance in Google Cloud
is to configure Cloud Trace in your application. Cloud Trace is a service that allows you to
collect and analyze latency data from your application. You can use Cloud Trace to trace
requests across different components of your application, such as downstream dependencies,
and identify where they take longer to complete. You can also use Cloud Trace to compare
latency data across different versions of your application, and detect any performance
degradation or improvement. By using Cloud Trace, you can diagnose and troubleshoot
performance issues with your application in Google Cloud.

QUESTION: 109
You are creating a CI/CD pipeline in Cloud Build to build an application container image. The
application code is stored in GitHub. Your company requires that production image builds are
only run against the main branch and that the change control team approves all pushes to the
main branch. You want the image build to be as automated as possible. What should you do?
Choose 2 answers

A. Create a trigger on the Cloud Build job. Set the repository event setting to 'Pull request'
B. Add the owners file to the Included files filter on the trigger
C. Create a trigger on the Cloud Build job. Set the repository event setting to 'Push to a branch'
D. Configure a branch protection rule for the main branch on the repository
E. Enable the Approval option on the trigger

Answer(s): C, D

Explanation:
The best options for creating a CI/CD pipeline in Cloud Build to build an application container
image and ensuring that production image builds are only run against the main branch and that
the change control team approves all pushes to the main branch are to create a trigger on the
Cloud Build job, set the repository event setting to Push to a branch, and configure a branch
protection rule for the main branch on the repository. A trigger is a resource that starts a build
when an event occurs, such as a code change. By creating a trigger on the Cloud Build job and
setting the repository event setting to Push to a branch, you can ensure that the image build is
only run when code is pushed to a specific branch, such as the main branch. A branch
protection rule is a rule that enforces certain policies on a branch, such as requiring reviews,
status checks, or approvals before merging code. By configuring a branch protection rule for the
main branch on the repository, you can ensure that the change control team approves all
pushes to the main branch.
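
For example, a push-to-branch trigger restricted to main can be created as follows (the repository details are placeholders); the branch protection rule itself is configured in GitHub:

# Build production images only on pushes to the main branch.
gcloud builds triggers create github \
  --repo-owner=my-org --repo-name=my-app \
  --branch-pattern='^main$' \
  --build-config=cloudbuild.yaml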


QUESTION: 110
You built a serverless application by using Cloud Run and deployed the application to your
production environment. You want to identify the resource utilization of the application for cost
optimization. What should you do?

A. Use Cloud Trace with distributed tracing to monitor the resource utilization of the application
B. Use Cloud Profiler with Ops Agent to monitor the CPU and memory utilization of the
application
C. Use Cloud Monitoring to monitor the container CPU and memory utilization of the application
D. Use Cloud Ops to create logs-based metrics to monitor the resource utilization of the
application

Answer(s): C

Explanation:
The best option for giving developers the ability to test the latest revisions of the service before
the service is exposed to customers is to run the gcloud run deploy booking-engine --no-traffic --
tag dev command and use the https://dev----booking-engine-abcdef.a.run.app URL for testing.
The gcloud run deploy command is a command that deploys a new revision of your service or
updates an existing service. By using the --no-traffic flag, you can prevent any traffic from being
sent to the new revision. By using the --tag flag, you can assign a tag to the new revision, such
as dev. This way, you can create a new revision of your service without affecting your
customers. You can also use the tag- based URL (e.g., https://dev----booking-engine-
abcdef.a.run.app) to access and test the new revision.

QUESTION: 111
Your company is using HTTPS requests to trigger a public Cloud Run-hosted service accessible
at the https://booking-engine-abcdef.a.run.app URL. You need to give developers the ability to
test the latest revisions of the service before the service is exposed to customers. What should
you do?

A. Run the gcloud run deploy booking-engine --no-traffic --tag dev command. Use the
https://dev---booking-engine-abcdef.a.run.app URL for testing
B. Run the gcloud run services update-traffic booking-engine --to-revisions LATEST=1 command.
Use the https://booking-engine-abcdef.a.run.app URL for testing
C. Pass the curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" auth token. Use
the https://booking-engine-abcdef.a.run.app URL to test privately
D. Grant the roles/run.invoker role to the developers testing the booking-engine service. Use the
https://booking-engine-abcdef.private.run.app URL for testing

Answer(s): A

Explanation:
The best option for giving developers the ability to test the latest revisions of the service before
the service is exposed to customers is to run the gcloud run deploy booking-engine --no-traffic
--tag dev command and use the https://dev---booking-engine-abcdef.a.run.app URL for testing.
The gcloud run deploy command deploys a new revision of your service. The --no-traffic flag
prevents any production traffic from being sent to the new revision, and the --tag flag assigns a
tag, such as dev, to it. A tagged revision gets its own dedicated URL (for example,
https://dev---booking-engine-abcdef.a.run.app), so developers can access and test the new
revision directly without affecting customers.
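
A sketch of the deploy and, once testing passes, an optional gradual rollout; the image URL and traffic percentage are placeholders:

# Deploy the new revision without sending it any production traffic;
# it is reachable only at the dev tag URL.
gcloud run deploy booking-engine --image=IMAGE_URL --no-traffic --tag=dev

# After testing, optionally shift a fraction of traffic to the tagged revision.
gcloud run services update-traffic booking-engine --to-tags=dev=10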

QUESTION: 112
You are configuring connectivity across Google Kubernetes Engine (GKE) clusters in different
VPCs. You notice that the nodes in Cluster A are unable to access the nodes in Cluster B. You
suspect that the workload access issue is due to the network configuration. You need to
troubleshoot the issue, but you do not have execute access to workloads and nodes. You want to
identify the layer at which the network connectivity is broken. What should you do?

A. Install a toolbox container on the node in Cluster A. Confirm that the routes to Cluster B are
configured appropriately
B. Use Network Connectivity Center to perform a Connectivity Test from Cluster A to Cluster B
C. Use a debug container to run the traceroute command from Cluster A to Cluster B and from
Cluster B to Cluster A. Identify the common failure point
D. Enable VPC Flow Logs in both VPCs and monitor packet drops

Answer(s): B

Explanation:
The best option for troubleshooting the issue without having execute access to workloads and
nodes is to use Network Connectivity Center to perform a Connectivity Test from Cluster A to
Cluster B. Network Connectivity Center is a service that allows you to create, manage, and
monitor network connectivity across Google Cloud, hybrid, and multi-cloud environments. You
can use Network Connectivity Center to perform a Connectivity Test, which is a feature that
allows you to test the reachability and latency between two endpoints, such as GKE clusters,
VM instances, or IP addresses. By using Network Connectivity Center to perform a Connectivity
Test from Cluster A to Cluster B, you can identify the layer at which the network connectivity is
broken, such as the firewall, routing, or load balancing.
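
A hedged sketch of creating such a test with the underlying Network Management API; the IP addresses, port, and network names are placeholders:

# Test reachability from a Cluster A node to a Cluster B node.
gcloud network-management connectivity-tests create gke-a-to-b \
  --source-ip-address=10.0.1.10 \
  --source-network=projects/proj-a/global/networks/vpc-a \
  --destination-ip-address=10.1.1.10 \
  --destination-network=projects/proj-b/global/networks/vpc-b \
  --protocol=TCP --destination-port=10250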

QUESTION: 113
You manage an application that runs in Google Kubernetes Engine (GKE) and uses the
blue/green deployment methodology. Extracts of the Kubernetes manifests are shown below.

[Manifest extracts omitted: the Deployments app-blue and app-green, and the Service app-svc
that routes traffic to the application Pods]

The Deployment app-green was updated to use the new version of the application. During post-
deployment monitoring, you notice that the majority of user requests are failing. You did not
observe this behavior in the testing environment. You need to mitigate the incident impact on
users and enable the developers to troubleshoot the issue. What should you do?

A. Update the Deployment app-blue to use the new version of the application
B. Update the Deployment app-green to use the previous version of the application
C. Change the selector on the Service app-svc to app: my-app
D. Change the selector on the Service app-svc to app: my-app, version: blue

Answer(s): D

Explanation:
The best option for mitigating the incident impact on users and enabling the developers to
troubleshoot the issue is to change the selector on the Service app-svc to app: my-app, version:
blue. A Service is a resource that defines how to access a set of Pods. A selector is a field that
specifies which Pods are selected by the Service. By changing the selector on the Service
app-svc to app: my-app, version: blue, you ensure that the Service only routes traffic to the Pods
that have both labels app: my-app and version: blue. These Pods belong to the Deployment
app-blue, which uses the previous version of the application. This way, you mitigate the incident
impact on users by switching back to the working version of the application, and you enable the
developers to troubleshoot the issue with the new version of the application in the Deployment
app-green without affecting users.
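
Assuming the labels described above, the rollback itself is a one-line selector change:

# Point the Service back at the blue (previous) Pods only.
kubectl patch service app-svc \
  -p '{"spec":{"selector":{"app":"my-app","version":"blue"}}}'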

QUESTION: 114
You are running a web application deployed to a Compute Engine managed instance group. Ops
Agent is installed on all instances. You recently noticed suspicious activity from a specific IP
address. You need to configure Cloud Monitoring to view the number of requests from that
specific IP address with minimal operational overhead.
What should you do?

A. Configure the Ops Agent with a logging receiver. Create a logs-based metric
B. Create a script to scrape the web server log Export the IP address request metrics to the
Cloud Monitoring API
C. Update the application to export the IP address request metrics to the Cloud Monitoring API
D. Configure the Ops Agent with a metrics receiver

Answer(s): A

Explanation:
The best option for configuring Cloud Monitoring to view the number of requests from a specific
IP address with minimal operational overhead is to configure the Ops Agent with a logging
receiver and create a logs-based metric. The Ops Agent is an agent that collects system metrics
and logs from your VM instances and sends them to Cloud Monitoring and Cloud Logging. A
logging receiver is a configuration that specifies which logs are collected by the Ops Agent and
how they are processed. You can use a logging receiver to collect web server logs from your
VM instances and send them to Cloud Logging. A logs-based metric is a metric that is extracted
from log entries in Cloud Logging. You can use a logs-based metric to count the number of
requests from a specific IP address by using a filter expression. You can then use Cloud
Monitoring to view and analyze the logs-based metric.
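
Once the Ops Agent ships the web server logs, a counter metric filtered on the suspicious address can be created with one command; the log name and IP address are placeholders:

# Count log entries whose HTTP request came from the suspicious IP.
gcloud logging metrics create suspicious_ip_requests \
  --description="Requests from the suspicious IP address" \
  --log-filter='logName:"nginx_access" AND httpRequest.remoteIp="203.0.113.7"'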


QUESTION: 115
Your organization is using Helm to package containerized applications. Your applications
reference both public and private charts. Your security team flagged that using a public Helm
repository as a dependency is a risk. You want to manage all charts uniformly, with native
access control and VPC Service Controls. What should you do?

A. Store public and private charts in OCI format by using Artifact Registry
B. Store public and private charts by using GitHub Enterprise with Google Workspace as the
identity provider
C. Store public and private charts by using a Git repository. Configure Cloud Build to synchronize
the contents of the repository into a Cloud Storage bucket. Connect Helm to the bucket by using
https://[bucket].storage.googleapis.com/[helmchart] as the Helm repository
D. Configure a Helm chart repository server to run in Google Kubernetes Engine (GKE) with a
Cloud Storage bucket as the storage backend

Answer(s): A

Explanation:
The best option for managing all charts uniformly, with native access control and VPC Service
Controls is to store public and private charts in OCI format by using Artifact Registry. Artifact
Registry is a service that allows you to store and manage container images and other artifacts in
Google Cloud. Artifact Registry supports OCI format, which is an open standard for storing
container images and other artifacts such as Helm charts. You can use Artifact Registry to store
public and private charts in OCI format and manage them uniformly. You can also use Artifact
Registry's native access control features, such as IAM policies and VPC Service Controls, to
secure your charts and control who can access them.
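
With Helm 3.8 or later, charts can be pushed to and pulled from an Artifact Registry repository in OCI format; the region, project, and repository names below are placeholders:

# Authenticate Helm to Artifact Registry, then push a packaged chart.
gcloud auth print-access-token | helm registry login \
  -u oauth2accesstoken --password-stdin https://us-docker.pkg.dev

helm push my-chart-0.1.0.tgz oci://us-docker.pkg.dev/my-project/helm-charts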

QUESTION: 116
You use Terraform to manage an application deployed to a Google Cloud environment. The
application runs on instances deployed by a managed instance group. The Terraform code is
deployed by using a CI/CD pipeline. When you change the machine type on the instance
template used by the managed instance group, the pipeline fails at the terraform apply stage
with the following error message:

You need to update the instance template and minimize disruption to the application and the
number of pipeline runs. What should you do?

A. Delete the managed instance group, and recreate it after updating the instance template
B. Add a new instance template, update the managed instance group to use the new instance
template, and delete the old instance template
C. Remove the managed instance group from the Terraform state file, update the instance
template, and reimport the managed instance group
D. Set the create_before_destroy meta-argument to true in the lifecycle block on the instance
template


Answer(s): D

Explanation:
The best option for updating the instance template and minimizing disruption to the application
and the number of pipeline runs is to set the create_before_destroy meta-argument to true in
the lifecycle block on the instance template. The create_before_destroy meta-argument is a
Terraform feature that specifies that a new resource should be created before destroying an
existing one during an update. This way, you can avoid downtime and errors when updating a
resource that is in use by another resource, such as an instance template that is used by a
managed instance group. By setting the create_before_destroy meta-argument to true in the
lifecycle block on the instance template, you can ensure that Terraform creates a new instance
template with the updated machine type, updates the managed instance group to use the new
instance template, and then deletes the old instance template.
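
The relevant Terraform fragment, written to a file here for illustration; the resource names are placeholders, and name_prefix avoids a naming collision when the replacement template is created before the old one is destroyed:

cat > instance_template.tf <<'EOF'
resource "google_compute_instance_template" "app" {
  name_prefix  = "app-template-"
  machine_type = "e2-standard-4"

  # Create the replacement template before destroying the old one, so the
  # managed instance group is never left pointing at a deleted template.
  lifecycle {
    create_before_destroy = true
  }

  # ... disks, network interfaces, and other settings ...
}
EOF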

QUESTION: 117
You are performing a semi-annual capacity planning exercise for your flagship service. You
expect a service user growth rate of 10% month-over-month for the next six months. Your
service is fully containerized and runs on a Google Kubernetes Engine (GKE) standard cluster
across three zones with cluster autoscaling enabled. You currently consume about 30% of your
total deployed CPU capacity, and you require resilience against the failure of a zone. You want
to ensure that your users experience minimal negative impact as a result of this growth or as a
result of zone failure while you avoid unnecessary costs. How should you prepare to handle the
predicted growth?

A. Verify the maximum node pool size, enable a Horizontal Pod Autoscaler, and then perform a
load test to verify your expected resource needs
B. Because you deployed the service on GKE and are using a cluster autoscaler, your GKE
cluster will scale automatically regardless of growth rate
C. Because you are only using 30% of deployed CPU capacity, there is significant headroom
and you do not need to add any additional capacity for this rate of growth
D. Proactively add 80% more node capacity to account for six months of 10% growth rate, and
then perform a load test to ensure that you have enough capacity

Answer(s): A

Explanation:
The best option for preparing to handle the predicted growth is to verify the maximum node pool
size, enable a Horizontal Pod Autoscaler, and then perform a load test to verify your expected
resource needs. The maximum node pool size is a parameter that specifies the maximum
number of nodes that can be added to a node pool by the cluster autoscaler. You should verify
that the maximum node pool size is sufficient to accommodate your expected growth rate and
avoid hitting any quota limits. The Horizontal Pod Autoscaler is a feature that automatically
adjusts the number of Pods in a deployment or replica set based on observed CPU utilization or
custom metrics. You should enable a Horizontal Pod Autoscaler for your application to ensure
that it runs enough Pods to handle the load. A load test is a test that simulates high user traffic
and measures the performance and reliability of your application. You should perform a load
test to verify your expected resource needs and identify any bottlenecks or issues.
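
Verifying the node pool's autoscaling ceiling is a quick first check (the cluster and pool names are placeholders):

# Confirm the node pool can grow far enough for the projected load.
gcloud container node-pools describe default-pool \
  --cluster=my-cluster --region=us-central1 \
  --format='value(autoscaling.minNodeCount,autoscaling.maxNodeCount)'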

QUESTION: 118
Your company operates in a highly regulated domain that requires you to store all organization
logs for seven years. You want to minimize logging infrastructure complexity by using managed
services. You need to avoid any future loss of log capture or stored logs due to misconfiguration
or human error. What should you do?

A. Use Cloud Logging to configure an aggregated sink at the organization level to export all logs
into a BigQuery dataset
B. Use Cloud Logging to configure an aggregated sink at the organization level to export all logs
into Cloud Storage with a seven-year retention policy and Bucket Lock
C. Use Cloud Logging to configure an export sink at each project level to export all logs into a
BigQuery dataset
D. Use Cloud Logging to configure an export sink at each project level to export all logs into
Cloud Storage with a seven-year retention policy and Bucket Lock

Answer(s): B

Explanation:
The best option for storing all organization logs for seven years and avoiding any future loss of
log capture or stored logs due to misconfiguration or human error is to use Cloud Logging to
configure an aggregated sink at the organization level to export all logs into Cloud Storage with
a seven-year retention policy and Bucket Lock. Cloud Logging is a service that allows you to
collect and manage logs from your Google Cloud resources and applications. An aggregated
sink is a sink that collects logs from multiple sources, such as projects, folders, or organizations.
You can use Cloud Logging to configure an aggregated sink at the organization level to export
all logs into Cloud Storage, which is a service that allows you to store and access data in
Google Cloud. A retention policy is a policy that specifies how long objects in a bucket are
retained before they are deleted. Bucket Lock is a feature that allows you to lock a retention
policy on a bucket and prevent it from being reduced or removed.

You can use Cloud Storage with a seven-year retention policy and Bucket Lock to ensure that
your logs are stored for seven years and protected from accidental or malicious deletion.
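
A sketch of the destination side; the bucket name and organization ID are placeholders, and locking the retention policy is irreversible:

# Seven-year retention on the log archive bucket.
gsutil retention set 7y gs://org-logs-archive

# Lock the policy so it can never be reduced or removed.
gsutil retention lock gs://org-logs-archive

# Organization-level aggregated sink feeding the bucket.
gcloud logging sinks create org-logs-sink \
  storage.googleapis.com/org-logs-archive \
  --organization=123456789012 --include-children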

QUESTION: 119
You are building the CI/CD pipeline for an application deployed to Google Kubernetes Engine
(GKE). The application is deployed by using a Kubernetes Deployment, Service, and Ingress.
The application team asked you to deploy the application by using the blue/green deployment
methodology. You need to implement the rollback actions. What should you do?

A. Run the kubectl rollout undo command


B. Delete the new container image, and delete the running Pods
C. Update the Kubernetes Service to point to the previous Kubernetes Deployment
D. Scale the new Kubernetes Deployment to zero

Answer(s): C

Explanation:
The best option for implementing the rollback actions is to update the Kubernetes Service to
point to the previous Kubernetes Deployment. A Kubernetes Service is a resource that defines
how to access a set of Pods. A Kubernetes Deployment is a resource that manages the creation
and update of Pods. By using the blue/green deployment methodology, you can create two
Deployments, one for the current version (blue) and one for the new version (green), and use a
Service to switch traffic between them. If you need to roll back, you can update the Service to
point to the previous Deployment (blue) and stop sending traffic to the new Deployment (green).

QUESTION: 120
You are building and running client applications in Cloud Run and Cloud Functions. Your client
requires that all logs must be available for one year so that the client can import the logs into
their logging service. You must minimize required code changes. What should you do?

A. Update all images in Cloud Run and all functions in Cloud Functions to send logs to both
Cloud Logging and the client's logging service. Ensure that all the ports required to send logs are
open in the VPC firewall
B. Create a Pub/Sub topic, subscription, and logging sink. Configure the logging sink to send all
logs into the topic. Give your client access to the topic to retrieve the logs
C. Create a storage bucket and appropriate VPC firewall rules. Update all images in Cloud Run
and all functions in Cloud Functions to send logs to a file within the storage bucket
D. Create a logs bucket and logging sink. Set the retention on the logs bucket to 365 days.
Configure the logging sink to send logs to the bucket. Give your client access to the bucket to
retrieve the logs

Answer(s): D

Explanation:
The best option for storing all logs for one year and minimizing required code changes is to
create a logs bucket and logging sink, set the retention on the logs bucket to 365 days,
configure the logging sink to send logs to the bucket, and give your client access to the bucket
to retrieve the logs. A logs bucket is a Cloud Storage bucket that is used to store logs from
Cloud Logging. A logging sink is a resource that defines where log entries are sent, such as a
logs bucket, BigQuery dataset, or Pub/Sub topic. You can create a logs bucket and logging sink
in Cloud Logging and set the retention on the logs bucket to 365 days. This way, you can
ensure that all logs are stored for one year and protected from deletion. You can also configure
the logging sink to send logs from Cloud Run and Cloud Functions to the logs bucket without
any code changes. You can then give your client access to the logs bucket by using IAM
policies or signed URLs.
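
A minimal sketch with placeholder names:

# Log bucket with 365-day retention.
gcloud logging buckets create client-logs \
  --location=global --retention-days=365

# Route this project's logs into the bucket.
gcloud logging sinks create client-logs-sink \
  logging.googleapis.com/projects/my-project/locations/global/buckets/client-logs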

QUESTION: 121
As part of your company's initiative to shift left on security, the InfoSec team is asking all teams
to implement guard rails on all the Google Kubernetes Engine (GKE) clusters to only allow the
deployment of trusted and approved images. You need to determine how to satisfy the InfoSec
team's goal of shifting left on security.
What should you do?

A. Deploy Falco or Twistlock on GKE to monitor for vulnerabilities on your running Pods
B. Configure Identity and Access Management (IAM) policies to create a least privilege model
on your GKE clusters
C. Use Binary Authorization to attest images during your CI/CD pipeline
D. Enable Container Analysis in Artifact Registry, and check for common vulnerabilities and
exposures (CVEs) in your container images

Answer(s): C

Explanation:
The best option for implementing guard rails on all GKE clusters to only allow the deployment of
trusted and approved images is to use Binary Authorization to attest images during your CI/CD
pipeline. Binary Authorization is a feature that allows you to enforce signature-based validation
when deploying container images. You can use Binary Authorization to create policies that
specify which images are allowed or denied in your GKE clusters. You can also use Binary
Authorization to attest images during your CI/CD pipeline by using tools such as Container
Analysis or third-party integrations. An attestation is a digital signature that certifies that an
image meets certain criteria, such as passing vulnerability scans or code reviews. By using
Binary Authorization to attest images during your CI/CD pipeline, you can ensure that only
trusted and approved images are deployed to your GKE clusters.
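
In the pipeline, the attestation step might look like the following hedged sketch; the attestor, KMS key, and image digest are placeholders:

# Sign and attach an attestation certifying that the image passed the pipeline.
gcloud container binauthz attestations sign-and-create \
  --artifact-url=us-docker.pkg.dev/my-project/apps/my-app@sha256:IMAGE_DIGEST \
  --attestor=projects/my-project/attestors/ci-attestor \
  --keyversion=projects/my-project/locations/global/keyRings/binauthz/cryptoKeys/signer/cryptoKeyVersions/1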

QUESTION: 122
You have an application that runs in Google Kubernetes Engine (GKE). The application consists
of several microservices that are deployed to GKE by using Deployments and Services. One of
the microservices is experiencing an issue where a Pod returns 403 errors after the Pod has
been running for more than five hours. Your development team is working on a solution, but the
issue will not be resolved for a month. You need to ensure continued operations until the
microservice is fixed. You want to follow Google-recommended practices and use the fewest
number of steps. What should you do?

A. Create a cron job to terminate any Pods that have been running for more than five hours
B. Add an HTTP liveness probe to the microservice's deployment
C. Monitor the Pods and terminate any Pods that have been running for more than five hours
D. Configure an alert to notify you whenever a Pod returns 403 errors

Answer(s): B

Explanation:
The best option for ensuring continued operations until the microservice is fixed is to add an
HTTP liveness probe to the microservice's deployment. An HTTP liveness probe is a type of
probe that checks if a Pod is alive by sending an HTTP request and expecting a success
response code. If the probe fails, Kubernetes restarts the Pod. You can add an HTTP liveness
probe to your microservice's deployment by using a livenessProbe field in your Pod spec. This
way, you can ensure that any Pod that returns 403 errors after running for more than five hours
is restarted automatically and resumes normal operations.
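
A hedged sketch of the probe, applied here as a heredoc; the port, path, and timings are illustrative:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-microservice
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-microservice
  template:
    metadata:
      labels:
        app: my-microservice
    spec:
      containers:
      - name: app
        image: us-docker.pkg.dev/my-project/apps/my-microservice:latest
        ports:
        - containerPort: 8080
        # Restart the container once the health endpoint stops returning 2xx.
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 30
          failureThreshold: 3
EOF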

QUESTION: 123
You want to share a Cloud Monitoring custom dashboard with a partner team. What should you
do?

A. Provide the partner team with the dashboard URL to enable the partner team to create a
copy of the dashboard
B. Export the metrics to BigQuery Use Looker Studio to create a dashboard, and share the
dashboard with the partner team
C. Copy the Monitoring Query Language (MQL) query from the dashboard, and send the MQL
query to the partner team
D. Download the JSON definition of the dashboard, and send the JSON file to the partner team

Answer(s): A


Explanation:
The best option for sharing a Cloud Monitoring custom dashboard with a partner team is to
provide the partner team with the dashboard URL to enable the partner team to create a copy of
the dashboard. A Cloud Monitoring custom dashboard is a dashboard that allows you to create
and customize charts and widgets to display metrics, logs, and traces from your Google Cloud
resources and applications. You can share a custom dashboard with a partner team by
providing them with the dashboard URL, which is a link that allows them to view the dashboard
in their browser. The partner team can then create a copy of the dashboard in their own project
by using the Copy Dashboard option. This way, they can access and modify the dashboard
without affecting the original one.

QUESTION: 124
You are building an application that runs on Cloud Run. The application needs to access a third-
party API by using an API key. You need to determine a secure way to store and use the API
key in your application by following Google-recommended practices. What should you do?

A. Save the API key in Secret Manager as a secret. Reference the secret as an environment
variable in the Cloud Run application
B. Save the API key in Secret Manager as a secret key. Mount the secret key under the
/sys/api_key directory, and decrypt the key in the Cloud Run application
C. Save the API key in Cloud Key Management Service (Cloud KMS) as a key. Reference the
key as an environment variable in the Cloud Run application
D. Encrypt the API key by using Cloud Key Management Service (Cloud KMS), and pass the key
to Cloud Run as an environment variable. Decrypt and use the key in Cloud Run

Answer(s): A

Explanation:
The best option for storing and using the API key in your application by following Google-
recommended practices is to save the API key in Secret Manager as a secret and reference the
secret as an environment variable in the Cloud Run application. Secret Manager is a service
that allows you to store and manage sensitive data, such as API keys, passwords, and
certificates, in Google Cloud. A secret is a resource that represents a logical secret, such as an
API key. You can save the API key in Secret Manager as a secret and use IAM policies to
control who can access it. You can then expose the secret to the Cloud Run service as an
environment variable, for example with the --set-secrets flag of gcloud run deploy. This way, you
can securely store and use the API key in your application without exposing it in your code or
configuration files.
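
For example (the secret, service, and image names are placeholders):

# Store the key once in Secret Manager.
echo -n "the-api-key-value" | gcloud secrets create third-party-api-key --data-file=-

# Inject it into the service as the API_KEY environment variable.
gcloud run deploy my-app \
  --image=us-docker.pkg.dev/my-project/apps/my-app:latest \
  --set-secrets=API_KEY=third-party-api-key:latest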

QUESTION: 125
You are currently planning how to display Cloud Monitoring metrics for your organization's
Google Cloud projects. Your organization has three folders and six projects:

[Folder and project hierarchy diagram omitted; the projects include app-one-dev, app-one-staging,
and app-one-prod]

You want to configure Cloud Monitoring dashboards to only display metrics from the projects
within one folder. You need to ensure that the dashboards do not display metrics from projects in
the other folders. You want to follow Google-recommended practices. What should you do?

A. Create a single new scoping project


B. Create new scoping projects for each folder
C. Use the current app-one-prod project as the scoping project
D. Use the current app-one-dev, app-one-staging and app-one-prod projects as the scoping
project for each folder

Answer(s): B

Explanation:
The best option for configuring Cloud Monitoring dashboards to only display metrics from the
projects within one folder is to create new scoping projects for each folder. A scoping project
hosts a metrics scope, which defines the set of projects whose metrics Cloud Monitoring can
view together. By creating a dedicated scoping project for each folder and adding only that
folder's projects to its metrics scope, you ensure that each scope contains only the resources
within that folder. You can then build dashboards in each scoping project so that they display
metrics only from the corresponding folder. Google recommends using a dedicated project,
rather than an existing workload project such as app-one-prod, to host a multi-project metrics
scope.

QUESTION: 126
Your company's security team needs to have read-only access to Data Access audit logs in the
_Required bucket. You want to provide your security team with the necessary permissions
following the principle of least privilege and Google-recommended practices.
What should you do?

A. Assign the roles/logging.viewer role to each member of the security team
B. Assign the roles/logging.viewer role to a group with all the security team members
C. Assign the roles/logging.privateLogViewer role to each member of the security team
D. Assign the roles/logging.privateLogViewer role to a group with all the security team members


Answer(s): D

Explanation:
The best option for providing your security team with the necessary permissions following the
principle of least privilege and Google-recommended practices is to assign the
roles/logging.privateLogViewer role to a group with all the security team members. The
roles/logging.privateLogViewer role is a predefined role that grants read-only access to Data
Access audit logs and other private logs in Cloud Logging. A group is a collection of users that
can be assigned roles and permissions as a single unit. You can assign the
roles/logging.privateLogViewer role to a group with all the security team members by using IAM
policies. This way, you can provide your security team with the minimum level of access they
need to view Data Access audit logs in the _Required bucket.
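
For example (the group address and project are placeholders):

# One binding grants the whole security team read access to private logs.
gcloud projects add-iam-policy-binding my-project \
  --member='group:security-team@example.com' \
  --role='roles/logging.privateLogViewer'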

QUESTION: 127
Your team is building a service that performs compute-heavy processing on batches of data. The
data is processed faster based on the speed and number of CPUs on the machine. These
batches of data vary in size and may arrive at any time from multiple third-party sources. You
need to ensure that third parties are able to upload their data securely. You want to minimize
costs while ensuring that the data is processed as quickly as possible. What should you do?

A. · Provide a secure file transfer protocol (SFTP) server on a Compute Engine instance so that
third parties can upload batches of data, and provide appropriate credentials to the server ·
Create a Cloud Function with a google.storage.object.finalize Cloud Storage trigger. Write code
so that the function can scale up a Compute Engine autoscaling managed instance group · Use
an image pre-loaded with the data processing software that terminates the instances when
processing completes
B. · Provide a Cloud Storage bucket so that third parties can upload batches of data, and
provide appropriate Identity and Access Management (IAM) access to the bucket · Use a
standard Google Kubernetes Engine (GKE) cluster and maintain two services: one that
processes the batches of data and one that monitors Cloud Storage for new batches of data ·
Stop the processing service when there are no batches of data to process
C. · Provide a Cloud Storage bucket so that third parties can upload batches of data, and
provide appropriate Identity and Access Management (IAM) access to the bucket · Create a
Cloud Function with a google.storage.object.finalize Cloud Storage trigger. Write code so that
the function can scale up a Compute Engine autoscaling managed instance group · Use an
image pre-loaded with the data processing software that terminates the instances when
processing completes
D. · Provide a Cloud Storage bucket so that third parties can upload batches of data, and
provide appropriate Identity and Access Management (IAM) access to the bucket · Use Cloud
Monitoring to detect new batches of data in the bucket and trigger a Cloud Function that
processes the data
· Set the Cloud Function to use the largest CPU possible to minimize the runtime of the
processing

Answer(s): C

Explanation:
The best option for ensuring that third parties are able to upload their data securely and
minimizing costs while ensuring that the data is processed as quickly as possible is to provide a
Cloud Storage bucket so that third parties can upload batches of data, and provide appropriate
Identity and Access Management (IAM) access to the bucket; create a Cloud Function with a
google.storage.object.finalize Cloud Storage trigger; write code so that the function can scale up
a Compute Engine autoscaling managed instance group; use an image pre-loaded with the data
processing software that terminates the instances when processing completes. A Cloud Storage
bucket is a resource that allows you to store and access data in Google Cloud. You can provide
a Cloud Storage bucket so that third parties can upload batches of data securely and
conveniently. You can also provide appropriate IAM access to the bucket by using roles and
policies to control who can read or write data to the bucket. A Cloud Function is a serverless
function that executes code in response to an event, such as a change in a Cloud Storage
bucket. A google.storage.object.finalize trigger is a type of trigger that fires when a new object is
created or an existing object is overwritten in a Cloud Storage bucket. You can create a Cloud
Function with a google.storage.object.finalize trigger so that the function runs whenever a new
batch of data is uploaded to the bucket. You can write code so that the function can scale up a
Compute Engine autoscaling managed instance group, which is a group of VM instances that
automatically adjusts its size based on load or custom metrics. You can use an image pre-
loaded with the data processing software that terminates the instances when processing
completes, which means that the instances only run when there is data to process and stop
when they are done. This way, you can minimize costs while ensuring that the data is
processed as quickly as possible.
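
The trigger side of this design might be deployed as follows; the function name, runtime, and bucket are placeholders, and the function body that resizes the managed instance group is not shown:

# Run the scaling function whenever a new batch object is finalized.
gcloud functions deploy scale-processing-group \
  --runtime=python311 \
  --trigger-event=google.storage.object.finalize \
  --trigger-resource=third-party-uploads \
  --entry-point=on_new_batch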

QUESTION: 128
You are reviewing your deployment pipeline in Google Cloud Deploy. You must reduce toil in the
pipeline, and you want to minimize the amount of time it takes to complete an end-to-end
deployment. What should you do?
Choose 2 answers

A. Create a trigger to notify the required team to complete the next step when manual
intervention is required
B. Divide the automation steps into smaller tasks
C. Use a script to automate the creation of the deployment pipeline in Google Cloud Deploy
D. Add more engineers to finish the manual steps.
E. Automate promotion approvals from the development environment to the test environment

Answer(s): A, E

Explanation:
The best options for reducing toil in the pipeline and minimizing the amount of time it takes to
complete an end-to-end deployment are to create a trigger to notify the required team to
complete the next step when manual intervention is required and to automate promotion
approvals from the development environment to the test environment. A trigger is a resource
that initiates a deployment when an event occurs, such as a code change, a schedule, or a
manual request. You can create a trigger to notify the required team to complete the next step
when manual intervention is required by using Cloud Build or Cloud Functions. This way, you
can reduce the waiting time and human errors in the pipeline. A promotion approval is a process
that allows you to approve or reject a deployment from one environment to another, such as
from development to test. You can automate promotion approvals from the development
environment to the test environment by using Google Cloud Deploy or Cloud Build. This way,
you can speed up the deployment process and avoid manual steps.
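As a hedged sketch of the automated promotion (pipeline, target, and service account names are hypothetical), Google Cloud Deploy supports an Automation resource with a promoteReleaseRule:

apiVersion: deploy.cloud.google.com/v1
kind: Automation
metadata:
  name: my-pipeline/promote-to-test
description: Promote releases from dev to test without manual toil
suspended: false
serviceAccount: deploy-sa@my-project.iam.gserviceaccount.com
selector:
- target:
    id: dev
rules:
- promoteReleaseRule:
    name: promote-to-test
    wait: 10m                # assumed soak time before promotion
    toTargetId: test

The resource is registered with gcloud deploy apply --file=automation.yaml --region=us-central1.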

QUESTION: 129
You work for a global organization and are running a monolithic application on Compute Engine.
You need to select the machine type for the application to use that optimizes CPU utilization by
using the fewest number of steps. You want to use historical system metrics to identify the
machine type for the application to use. You want to follow Google-recommended practices.
What should you do?

A. Use the Recommender API and apply the suggested recommendations


B. Create an Agent Policy to automatically install Ops Agent in all VMs
C. Install the Ops Agent in a fleet of VMs by using the gcloud CLI
D. Review the Cloud Monitoring dashboard for the VM and choose the machine type with the
lowest CPU utilization

Answer(s): A

Explanation:
The best option for selecting the machine type for the application to use that optimizes CPU
utilization by using the fewest number of steps is to use the Recommender API and apply the
suggested recommendations. The Recommender API is a service that provides
recommendations for optimizing your Google Cloud resources, such as Compute Engine
instances, disks, and firewalls. You can use the Recommender API to get recommendations for
changing the machine type of your Compute Engine instances based on historical system
metrics, such as CPU utilization. You can also apply the suggested recommendations by using
the Recommender API or Cloud Console. This way, you can optimize CPU utilization by using
the most suitable machine type for your application with minimal effort.
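As a hedged sketch (project ID and zone are placeholders), the machine-type recommendations derived from historical utilization can be listed in one step:

gcloud recommender recommendations list \
    --project=my-project \
    --location=us-central1-a \
    --recommender=google.compute.instance.MachineTypeRecommender

Each returned recommendation can then be applied from the console or through the same API.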

QUESTION: 130
You are configuring Cloud Logging for a new application that runs on a Compute Engine
instance with a public IP address. A user-managed service account is attached to the instance.
You confirmed that the necessary agents are running on the instance but you cannot see any
log entries from the instance in Cloud Logging. You want to resolve the issue by following
Google-recommended practices.
What should you do?

A. Add the Logs Writer role to the service account.


B. Enable Private Google Access on the subnet that the instance is in.
C. Update the instance to use the default Compute Engine service account.
D. Export the service account key and configure the agents to use the key.

Answer(s): A

Explanation:
The correct answer is A. Add the Logs Writer role to the service account.
To use Cloud Logging, the service account attached to the Compute Engine instance must have
the necessary permissions to write log entries. The Logs Writer role (roles/logging.logWriter)
provides this permission. You can grant this role to the user-managed service account at the
project, folder, or organization level.
Private Google Access is not required for Cloud Logging, as it allows instances without external
IP addresses to access Google APIs and services. The default Compute Engine service
account already has the Logs Writer role, but it is not a recommended practice to use it for user
applications. Exporting the service account key and configuring the agents to use the key is not
a secure way of authenticating the service account, as it exposes the key to potential
compromise.
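A minimal sketch of the grant (project ID and service account email are placeholders):

gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:app-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/logging.logWriter"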

Reference:
1: Access control with IAM | Cloud Logging | Google Cloud
2: Private Google Access overview | VPC | Google Cloud

3: Service accounts | Compute Engine Documentation | Google Cloud


4: Best practices for securing service accounts | IAM Documentation | Google Cloud

QUESTION: 131
Your organization is starting to containerize with Google Cloud. You need a fully managed
storage solution for container images and Helm charts. You need to identify a storage solution
that has native integration into existing Google Cloud services, including Google Kubernetes
Engine (GKE), Cloud Run, VPC Service Controls, and Identity and Access Management (IAM).
What should you do?

A. Use Docker to configure a Cloud Storage driver pointed at the bucket owned by your
organization.
B. Configure Container Registry as an OCI-based container registry for container images.
C. Configure Artifact Registry as an OCI-based container registry for both Helm charts and
container images.
D. Configure an open source container registry server to run in GKE with a restrictive role-based
access control (RBAC) configuration.

Answer(s): C
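As a brief illustration of answer C (repository name, location, and chart file are placeholders), one Artifact Registry repository can hold both container images and Helm charts as OCI artifacts:

gcloud artifacts repositories create my-oci-repo \
    --repository-format=docker \
    --location=us-central1

helm push mychart-0.1.0.tgz oci://us-central1-docker.pkg.dev/my-project/my-oci-repo

The repository then integrates natively with GKE, Cloud Run, IAM, and VPC Service Controls.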

QUESTION: 132
You are developing reusable infrastructure as code modules. Each module contains integration
tests that launch the module in a test project. You are using GitHub for source control. You need
to continuously test your feature branch and ensure that all code is tested before changes are
accepted. You need to implement a solution to automate the integration tests.
What should you do?

A. Use a Jenkins server for CI/CD pipelines. Periodically run all tests in the feature branch.
B. Use Cloud Build to run the tests. Trigger all tests to run after a pull request is merged.
C. Ask the pull request reviewers to run the integration tests before approving the code.
D. Use Cloud Build to run tests in a specific folder. Trigger Cloud Build for every GitHub pull
request.

Answer(s): D

Explanation:
Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure.
Cloud Build can import source code from Google Cloud Storage, Cloud Source Repositories,
GitHub, or Bitbucket, execute a build to your specifications, and produce artifacts such as
Docker containers or Java archives. Cloud Build can also run integration tests as part of your build steps. You can
use Cloud Build to run tests in a specific folder by specifying the path to the folder in the dir field
of your build step. For example, if you have a folder named tests that contains your integration
tests, you can use the following build step to run them:
steps:
- name: 'gcr.io/cloud-builders/go'
  args: ['test', '-v']
  dir: 'tests'
You can use Cloud Build to trigger builds for every GitHub pull request by using the Cloud Build
GitHub app. The app allows you to automatically build on Git pushes and pull requests and view
your build results on GitHub and Google Cloud console. You can configure the app to run builds
on specific branches, tags, or paths. For example, if you want to run builds on pull requests that
target the master branch, you can use the following trigger configuration:
name: 'pull-request-trigger'
github:
  name: 'my-repo'
  owner: 'my-org'
  pullRequest:
    branch: '^master$'
includedFiles:
- '**'

Using Cloud Build to run tests in a specific folder and trigger builds for every GitHub pull request
is a good way to continuously test your feature branch and ensure that all code is tested before
changes are accepted. This way, you can catch any errors or bugs early and prevent them from
affecting the main branch.
Using a Jenkins server for CI/CD pipelines is not a bad option, but it would require more setup
and maintenance than using Cloud Build, which is fully managed by Google Cloud. Periodically
running all tests in the feature branch is not as efficient as running tests for every pull request,
as it may delay the feedback loop and increase the risk of conflicts or failures. Using Cloud Build
to run the tests after a pull request is merged is not a good practice, as it may introduce errors
or bugs into the main branch that could have been prevented by testing before merging.
Asking the pull request reviewers to run the integration tests before approving the code is not a
reliable way of ensuring code quality, as it depends on human intervention and may be prone to
errors or oversights.

Reference:
1: Overview | Cloud Build Documentation | Google Cloud
2: Running integration tests | Cloud Build Documentation | Google Cloud
3: Build configuration overview | Cloud Build Documentation | Google Cloud
4: Building repositories from GitHub | Cloud Build Documentation | Google Cloud
5: Creating GitHub app triggers | Cloud Build Documentation | Google Cloud

QUESTION: 133
You are deploying an application to Cloud Run. The application requires a password to start.
Your organization requires that all passwords are rotated every 24 hours, and your application
must have the latest password. You need to deploy the application with no downtime.
What should you do?

A. Store the password in Secret Manager and send the secret to the application by using
environment variables.
B. Store the password in Secret Manager and mount the secret as a volume within the
application.
C. Use Cloud Build to add your password into the application container at build time. Ensure
that Artifact Registry is secured from public access.
D. Store the password directly in the code. Use Cloud Build to rebuild and deploy the application
each time the password changes.

Answer(s): B

Explanation:
The correct answer is B. Store the password in Secret Manager and mount the secret as a
volume within the application.
Secret Manager is a service that allows you to securely store and manage sensitive data such
as passwords, API keys, certificates, and tokens. You can use Secret Manager to rotate your
secrets automatically or manually, and access them from your Cloud Run applications. There
are two ways to use secrets from Secret Manager in Cloud Run:
As environment variables: You can set environment variables that point to secrets in Secret
Manager. Cloud Run will resolve the secrets at runtime and inject them into the environment of
your application. However, this method has some limitations, such as:
The environment variables are cached for up to 10 minutes, so you may not get the latest
version of the secret immediately.
The environment variables are visible in plain text in the Cloud Console and the Cloud SDK,
which may expose sensitive information.
The environment variables are limited to 4 KB of data, which may not be enough for some secrets.
As file system volumes: You can mount secrets from Secret Manager as files in a volume within your application. Cloud Run will create a tmpfs volume and write the secrets as files in it. This method has some advantages, such as:
The files are updated every 30 seconds, so you can get the latest version of the secret faster.
The files are not visible in the Cloud Console or the Cloud SDK, which provides better security.
The files can store up to 64 KB of data, which allows for larger secrets.
Therefore, for your use case, it is better to use the second method and mount the secret as a file system volume within your application. This way, you can ensure that your application has the latest password, and you can deploy it with no downtime.
To mount a secret as a file system volume in Cloud Run, you can use the following command:
gcloud beta run deploy SERVICE --image IMAGE_URL \
    --update-secrets=/path/to/file=secretName:version

where:
SERVICE is the name of your Cloud Run service.
IMAGE_URL is the URL of your container image.
/path/to/file is the path where you want to mount the secret file in your application.
secretName is the name of your secret in Secret Manager.
version is the version of your secret. You can use latest to get the most recent version.

You can also use the Cloud Console to mount secrets as file system volumes. For more details,
see Mounting secrets from Secret Manager.

Reference:
1: Overview | Secret Manager Documentation | Google Cloud
2: Using secrets as environment variables | Cloud Run Documentation | Google Cloud
3: Mounting secrets from Secret Manager | Cloud Run Documentation | Google Cloud

QUESTION: 134
You are designing a system with three different environments: development, quality assurance
(QA), and production.

Each environment will be deployed with Terraform and has a Google Kubernetes Engine (GKE)
cluster created so that application teams can deploy their applications. Anthos Config
Management will be used and templated to deploy infrastructure level resources in each GKE
cluster. All users (for example, infrastructure operators and application owners) will use GitOps.
How should you structure your source control repositories for both Infrastructure as Code (IaC)
and application code?

A. Cloud Infrastructure (Terraform) repository is shared: different directories are different environments
GKE Infrastructure (Anthos Config Management Kustomize manifests) repository is shared: different overlay directories are different environments
Application (app source code) repositories are separated: different branches are different features
B. Cloud Infrastructure (Terraform) repository is shared: different directories are different environments
GKE Infrastructure (Anthos Config Management Kustomize manifests) repositories are separated: different branches are different environments
Application (app source code) repositories are separated: different branches are different features
C. Cloud Infrastructure (Terraform) repository is shared: different branches are different environments
GKE Infrastructure (Anthos Config Management Kustomize manifests) repository is shared: different overlay directories are different environments
Application (app source code) repository is shared: different directories are different features
D. Cloud Infrastructure (Terraform) repositories are separated: different branches are different environments
GKE Infrastructure (Anthos Config Management Kustomize manifests) repositories are separated: different overlay directories are different environments
Application (app source code) repositories are separated: different branches are different features

Answer(s): B

Explanation:
The correct answer is B. Cloud Infrastructure (Terraform) repository is shared: different
directories are different environments. GKE Infrastructure (Anthos Config Management
Kustomize manifests)

repositories are separated: different branches are different environments. Application (app
source code) repositories are separated: different branches are different features. This answer
follows the best practices for using Terraform and Anthos Config Management with GitOps, as
described in the following sources:
For Terraform, it is recommended to use a single repository for all environments, and use
directories to separate them. This way, you can reuse the same Terraform modules and
configurations across environments, and avoid code duplication and drift. You can also use
Terraform workspaces to isolate the state files for each environment.
For Anthos Config Management, it is recommended to use separate repositories for each
environment, and use branches to separate the clusters within each environment. This way, you
can enforce different policies and configurations for each environment, and use pull requests to
promote changes across environments. You can also use Kustomize to create overlays for each
cluster that apply specific patches or customizations.

For application code, it is recommended to use separate repositories for each application, and
use branches to separate the features or bug fixes for each application. This way, you can
isolate the development and testing of each application, and use pull requests to merge
changes into the main branch. You can also use tags or labels to trigger deployments to
different environments.
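As an illustrative sketch of the shared Terraform repository described for answer B (directory names are hypothetical):

terraform-infra/                  # single shared repository
  modules/
    gke-cluster/                  # reusable modules
  environments/
    development/main.tf           # one directory per environment
    qa/main.tf
    production/main.tf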

Reference:
1: Best practices for using Terraform | Google Cloud
2: Terraform Recommended Practices - Part 1 | Terraform - HashiCorp Learn
3: Deploy Anthos on GKE with Terraform part 1: GitOps with Config Sync | Google Cloud Blog
4: Using Kustomize with Anthos Config Management | Anthos Config Management
Documentation | Google Cloud
5: Deploy Anthos on GKE with Terraform part 3: Continuous Delivery with Cloud Build | Google
Cloud Blog
: GitOps-style continuous delivery with Cloud Build | Cloud Build Documentation | Google Cloud

QUESTION: 135
Your Cloud Run application writes unstructured logs as text strings to Cloud Logging. You want
to convert the unstructured logs to JSON-based structured logs.
What should you do?

A. Install a Fluent Bit sidecar container, and use a JSON parser.


B. Install the log agent in the Cloud Run container image, and use the log agent to forward logs
to Cloud Logging.
C. Configure the log agent to convert log text payload to JSON payload.
D. Modify the application to use Cloud Logging software development kit (SDK), and send log
entries with a jsonPayload field.

Answer(s): D

Explanation:
The correct answer is D. Modify the application to use Cloud Logging software development kit
(SDK), and send log entries with a jsonPayload field.

Cloud Logging SDKs are libraries that allow you to write structured logs from your Cloud Run
application. You can use the SDKs to create log entries with a jsonPayload field, which contains
a JSON object with the properties of your log entry. The jsonPayload field allows you to use
advanced features of Cloud Logging, such as filtering, querying, and exporting logs based on
the properties of your log entry.
To use Cloud Logging SDKs, you need to install the SDK for your programming language, and
then use the SDK methods to create and send log entries to Cloud Logging. For example, if you
are using Node.js, you can use the following code to write a structured log entry with a
jsonPayload field:
// Imports the Google Cloud client library
const {Logging} = require('@google-cloud/logging');

// Creates a client
const logging = new Logging();

// Selects the log to write to
const log = logging.log('my-log');

// The data to write to the log
const text = 'Hello, world!';

const metadata = {
  // Set the Cloud Run service name and revision as labels
  labels: {
    service_name: process.env.K_SERVICE || 'unknown',
    revision_name: process.env.K_REVISION || 'unknown',
  },
  // Set the log entry payload type and value
  jsonPayload: {
    message: text,
    timestamp: new Date(),
  },
};

// Prepares a log entry
const entry = log.entry(metadata);

// Writes the log entry
await log.write(entry);

console.log(`Logged: ${text}`);
Using Cloud Logging SDKs is the best way to convert unstructured logs to structured logs, as it
provides more flexibility and control over the format and content of your log entries. Using a
Fluent Bit sidecar container is not a good option, as it adds complexity and overhead to your
Cloud Run application. Fluent Bit is a lightweight log processor and forwarder that can be used
to collect and parse logs from various sources and send them to different destinations.
However, Cloud Run does not support sidecar containers, so you would need to run Fluent Bit
as part of your main container image. This would require modifying your Dockerfile and
configuring Fluent Bit to read logs from supported locations and parse them as JSON. This is
more cumbersome and less reliable than using Cloud Logging SDKs.
Using the log agent in the Cloud Run container image is not possible, as the log agent is not
supported on Cloud Run. The log agent is a service that runs on Compute Engine or Google
Kubernetes Engine instances and collects logs from various applications and system
components. However, Cloud Run does not allow you to install or run any agents on its
underlying infrastructure, as it is a fully managed service that abstracts away the details of the
underlying platform.

Reference:
1: Writing structured logs | Cloud Run Documentation | Google Cloud
2: Write structured logs | Cloud Run Documentation | Google Cloud
3: Fluent Bit - Fast and Lightweight Log Processor & Forwarder : Logging Best Practices for
Serverless Applications - Google Codelabs

: About the logging agent | Cloud Logging Documentation | Google Cloud : Cloud Run FAQ |
Google Cloud

QUESTION: 136
You are the Operations Lead for an ongoing incident with one of your services. The service
usually runs at around 70% capacity. You notice that one node is returning 5xx errors for all
requests. There has also been a noticeable increase in support cases from customers. You
need to remove the offending node from the load balancer pool so that you can isolate and
investigate the node. You want to follow Google-recommended practices to manage the incident
and reduce the impact on users.
What should you do?

A. 1. Communicate your intent to the incident team.
2. Perform a load analysis to determine if the remaining nodes can handle the increase in traffic offloaded from the removed node, and scale appropriately.
3. When any new nodes report healthy, drain traffic from the unhealthy node, and remove the unhealthy node from service.
B. 1. Communicate your intent to the incident team.
2. Add a new node to the pool, and wait for the new node to report as healthy.
3. When traffic is being served on the new node, drain traffic from the unhealthy node, and remove the old node from service.
C. 1. Drain traffic from the unhealthy node and remove the node from service.
2. Monitor traffic to ensure that the error is resolved and that the other nodes in the pool are handling the traffic appropriately.
3. Scale the pool as necessary to handle the new load.
4. Communicate your actions to the incident team.
D. 1. Drain traffic from the unhealthy node and remove the old node from service.
2. Add a new node to the pool, wait for the new node to report as healthy, and then serve traffic to the new node.
3. Monitor traffic to ensure that the pool is healthy and is handling traffic appropriately.
4. Communicate your actions to the incident team.

Answer(s): A

Explanation:
The correct answer is A. Communicate your intent to the incident team. Perform a load analysis
to determine if the remaining nodes can handle the increase in traffic offloaded from the
removed node, and scale appropriately.
When any new nodes report healthy, drain traffic from the unhealthy node, and remove the
unhealthy node from service.
This answer follows the Google-recommended practices for incident management, as described
in the Chapter 9 - Incident Response, Google SRE Book. According to this source, some of the
best practices are:
Maintain a clear line of command. Designate clearly defined roles. Keep a working record of
debugging and mitigation as you go. Declare incidents early and often. Communicate your
intent before taking any action that might affect the service or the incident response. This helps
to avoid confusion, duplication of work, or unintended consequences. Perform a load analysis
before removing a node from the load balancer pool, as this might affect the capacity and
performance of the service. Scale the pool as necessary to handle the expected load. Drain
traffic from the unhealthy node before removing it from service, as this helps to avoid dropping
requests or causing errors for users.
Answer A follows these best practices by communicating the intent to the incident team,
performing a load analysis and scaling the pool, and draining traffic from the unhealthy node
before removing it. Answer B does not follow the best practice of performing a load analysis
before adding or removing nodes, as this might cause overloading or underutilization of
resources. Answer C does not follow the best practice of communicating the intent before taking
any action, as this might cause confusion or conflict with other responders. Answer D does not
follow the best practice of draining traffic from the unhealthy node before removing it, as this
might cause errors for users.

Reference:

1: Chapter 9 - Incident Response, Google SRE Book

QUESTION: 137
You recently migrated an ecommerce application to Google Cloud. You now need to prepare
the application for the upcoming peak traffic season. You want to follow Google-recommended
practices.
What should you do first to prepare for the busy season?

A. Migrate the application to Cloud Run, and use autoscaling.


B. Load test the application to profile its performance for scaling.
C. Create a Terraform configuration for the application's underlying infrastructure to quickly
deploy to additional regions.
D. Pre-provision the additional compute power that was used last season, and expect growth.

Answer(s): B

Explanation:
The first thing you should do to prepare your ecommerce application for the upcoming peak
traffic season is to load test the application to profile its performance for scaling. Load testing is
a process of simulating high traffic or user demand on your application and measuring how it
responds. Load testing can help you identify any bottlenecks, errors, or performance issues that
might affect your application during the busy season. Load testing can also help you determine
the optimal scaling strategy for your application, such as horizontal scaling (adding more
instances) or vertical scaling (adding more resources to each instance)2.
There are different tools and methods for load testing your ecommerce application on Google
Cloud, depending on the type and complexity of your application. For example, you can use
Cloud Load Balancing to distribute traffic across multiple instances of your application, and use
Cloud Monitoring to measure the latency, throughput, and error rate of your application. You can
also use Cloud Functions or Cloud Run to create serverless load generators that can simulate
user requests and send them to your application. Alternatively, you can use third-party tools
such as Apache JMeter or Locust to create and run load tests on your application. By load
testing your ecommerce application before the peak traffic season, you can ensure that your
application is ready to handle the expected load and provide a good user experience. You can
also use the results of your load tests to plan and implement other steps to prepare your
application for the busy season, such as migrating to a more scalable platform, creating a
Terraform configuration for deploying to additional regions, or pre-provisioning additional
compute power.

Reference:
1: Load Testing 101: How To Test Website Performance | BlazeMeter
2: Scaling applications | Google Cloud
3: Load testing using Google Cloud | Solutions | Google Cloud
4: Serverless load testing using Cloud Functions | Solutions | Google Cloud

QUESTION: 138
Your company runs applications in Google Kubernetes Engine (GKE) that are deployed
following a GitOps methodology.

Application developers frequently create cloud resources to support their applications. You want
to give developers the ability to manage infrastructure as code, while ensuring that you follow
Google-recommended practices. You need to ensure that infrastructure as code reconciles
periodically to avoid configuration drift.


What should you do?

A. Install and configure Config Connector in Google Kubernetes Engine (GKE).


B. Configure Cloud Build with a Terraform builder to execute plan and apply commands.
C. Create a Pod resource with a Terraform docker image to execute terraform plan and
terraform apply commands.
D. Create a Job resource with a Terraform docker image to execute terraforrm plan and
terraform apply commands.

Answer(s): A

Explanation:
The best option to give developers the ability to manage infrastructure as code, while ensuring
that you follow Google-recommended practices, is to install and configure Config Connector in
Google Kubernetes Engine (GKE).
Config Connector is a Kubernetes add-on that allows you to manage Google Cloud resources
through Kubernetes. You can use Config Connector to create, update, and delete Google Cloud
resources using Kubernetes manifests. Config Connector also reconciles the state of the
Google Cloud resources with the desired state defined in the manifests, ensuring that there is
no configuration drift.
Config Connector follows the GitOps methodology, as it allows you to store your infrastructure
configuration in a Git repository, and use tools such as Anthos Config Management or Cloud
Source Repositories to sync the configuration to your GKE cluster. This way, you can use Git as
the source of truth for your infrastructure, and enable reviewable and version-controlled
workflows. Config Connector can be installed and configured in GKE using either the Google
Cloud Console or the gcloud command-line tool. You need to enable the Config Connector add-
on for your GKE cluster, and create a Google Cloud service account with the necessary
permissions to manage the Google Cloud resources. You also need to create a Kubernetes
namespace for each Google Cloud project that you want to manage with Config Connector.
By using Config Connector in GKE, you can give developers the ability to manage infrastructure
as code, while ensuring that you follow Google-recommended practices. You can also benefit
from the features and advantages of Kubernetes, such as declarative configuration,
observability, and portability.
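As a hedged sketch of what a developer would commit to Git (bucket name and namespace are hypothetical), a Config Connector manifest for a Cloud Storage bucket looks like this:

apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  name: my-app-artifacts
  namespace: my-project        # namespace mapped to the Google Cloud project
spec:
  location: US
  uniformBucketLevelAccess: true

Config Connector continuously reconciles the actual bucket against this manifest, reverting any configuration drift.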

Reference:
1: Overview | Artifact Registry Documentation | Google Cloud
2: Deploy Anthos on GKE with Terraform part 1: GitOps with Config Sync | Google Cloud Blog
3: Installing Config Connector | Config Connector Documentation | Google Cloud
4: Why use Config Connector? | Config Connector Documentation | Google Cloud

QUESTION: 139
Your company runs services by using Google Kubernetes Engine (GKE). The GKE clusters in
the development environment run applications with verbose logging enabled. Developers view
logs by using the kubect1 logs command and do not use Cloud Logging. Applications do not
have a uniform logging structure defined. You need to minimize the costs associated with
application logging while still collecting GKE operational logs.
What should you do?

A. Run the gcloud container clusters update --logging=SYSTEM command for the development cluster.

B. Run the gcloud container clusters update --logging=WORKLOAD command for the development cluster.
C. Run the gcloud logging sinks update _Default --disabled command in the project associated with the development environment.
D. Add the severity>=DEBUG AND resource.type="k8s_container" exclusion filter to the _Default logging sink in the project associated with the development environment.

Answer(s): A
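As a minimal sketch (cluster name and zone are placeholders), limiting the development cluster to system logs keeps the GKE operational logs while dropping the verbose application logs that developers already read with kubectl logs:

gcloud container clusters update dev-cluster \
    --zone=us-central1-a \
    --logging=SYSTEM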

QUESTION: 140
You are the Site Reliability Engineer responsible for managing your company's data services
and products. You regularly navigate operational challenges, such as unpredictable data
volume and high cost, with your company's data ingestion processes. You recently learned that
a new data ingestion product will be developed in Google Cloud. You need to collaborate with
the product development team to provide operational input on the new product.
What should you do?

A. Deploy the prototype product in a test environment, run a load test, and share the results with
the product development team.
B. When the initial product version passes the quality assurance phase and compliance
assessments, deploy the product to a staging environment. Share error logs and performance
metrics with the product development team.
C. When the new product is used by at least one internal customer in production, share error
logs and monitoring metrics with the product development team.
D. Review the design of the product with the product development team to provide feedback
early in the design phase.

Answer(s): D

Explanation:
The correct answer is D. Review the design of the product with the product development team
to provide feedback early in the design phase.

According to the Google Cloud DevOps best practices, a Site Reliability Engineer (SRE) should
collaborate with the product development team from the beginning of the product lifecycle, not
just after the product is deployed or tested. This way, the SRE can provide operational input on
the product design, such as scalability, reliability, security, and cost efficiency. The SRE can
also help define service level objectives (SLOs) and service level indicators (SLIs) for the
product, as well as monitoring and alerting strategies. By collaborating early and often, the SRE
and the product development team can ensure that the product meets the operational
requirements and expectations of the customers.

Reference:
Preparing for Google Cloud Certification: Cloud DevOps Engineer Professional Certificate,
Course 1:
Site Reliability Engineering and DevOps, Week 1: Introduction to SRE and DevOps.

QUESTION: 141
You are designing a deployment technique for your applications on Google Cloud. As part of
your deployment planning, you want to use live traffic to gather performance metrics for new
versions of your applications. You need to test against the full production load before your
applications are launched.


What should you do?

A. Use A/B testing with blue/green deployment.


B. Use shadow testing with continuous deployment.
C. Use canary testing with continuous deployment.
D. Use canary testing with rolling updates deployment.

Answer(s): B

Explanation:
The correct answer is B. Use shadow testing with continuous deployment.

Shadow testing is a deployment technique that involves routing a copy of the live traffic to a new
version of the application, without affecting the production environment. This way, you can
gather performance metrics and compare them with the current version, without exposing the
new version to the users. Shadow testing can help you test against the full production load and
identify any issues or bottlenecks before launching the new version. You can use continuous
deployment to automate the process of deploying the new version after it passes the shadow
testing.
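One common way to implement shadow testing on GKE is traffic mirroring in a service mesh. A minimal sketch using an Istio VirtualService (service and subset names are hypothetical, and a mesh such as Cloud Service Mesh is assumed to be installed):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
  - my-app
  http:
  - route:
    - destination:
        host: my-app
        subset: v1          # current version keeps serving all user traffic
      weight: 100
    mirror:
      host: my-app
      subset: v2            # new version receives a fire-and-forget copy
    mirrorPercentage:
      value: 100.0          # mirror the full production load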

Reference:
Application deployment and testing strategies, Testing strategies, Shadow test pattern.

QUESTION: 142
As a Site Reliability Engineer, you support an application written in Go that runs on Google
Kubernetes Engine (GKE) in production. After releasing a new version of the application, you
notice the application runs for about 15 minutes and then restarts. You decide to add Cloud
Profiler to your application and now notice that the heap usage grows constantly until the
application restarts.
What should you do?

A. Add high memory compute nodes to the cluster.


B. Increase the memory limit in the application deployment.
C. Add Cloud Trace to the application, and redeploy.
D. Increase the CPU limit in the application deployment.

Answer(s): B

Explanation:
The correct answer is B. Increase the memory limit in the application deployment.

The application is experiencing a memory leak, which means that it is allocating memory that is
not freed or reused. This causes the heap usage to grow constantly until it reaches the memory
limit of the pod, which triggers a restart by Kubernetes. Increasing the memory limit in the
application deployment can help mitigate the problem by allowing the application to run longer
before reaching the limit. However, this is not a permanent solution, as the memory leak will still
occur and eventually exhaust the available memory. The best solution is to identify and fix the
source of the memory leak in the application code, using tools like Cloud Profiler and pprof.
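A minimal sketch of the Deployment change (names, image path, and sizes are illustrative; the raised limit only buys time while the leak is located with Cloud Profiler):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: go-app
  template:
    metadata:
      labels:
        app: go-app
    spec:
      containers:
      - name: go-app
        image: us-docker.pkg.dev/my-project/repo/go-app:latest
        resources:
          requests:
            memory: "512Mi"
          limits:
            memory: "1Gi"   # raised limit delays OOM-driven restarts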

Reference:

Using Cloud Profiler with Go, Troubleshooting memory leaks. Profiling Go Programs, Heap
profiles.

QUESTION: 143
You need to create a Cloud Monitoring SLO for a service that will be published soon. You want
to verify that requests to the service will be addressed in fewer than 300 ms at least 90% of the
time per calendar month. You need to identify the metric and evaluation method to use.
What should you do?

A. Select a latency metric for a request-based method of evaluation.


B. Select a latency metric for a window-based method of evaluation.
C. Select an availability metric for a request-based method of evaluation.
D. Select an availability metric for a window-based method of evaluation.

Answer(s): A

Explanation:
The correct answer is A. Select a latency metric for a request-based method of evaluation.

A latency metric measures how responsive your service is to users. For example, you can use
the cloud.googleapis.com/http/server/response_latencies metric to measure the latency of
HTTP requests to your service. A request-based method of evaluation counts the number of
successful requests that meet a certain criterion, such as being below a latency threshold, and
compares it to the number of all requests. For example, you can define an SLI as the ratio of
requests with latency below 300 ms to all requests. A request-based method of evaluation is
suitable for measuring performance over time, such as per calendar month. You can set an SLO
for the SLI to be at least 90%, which means that you expect 90% of the requests to have latency
below 300 ms in a month.
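A hedged sketch of such an SLO as a Cloud Monitoring ServiceLevelObjective request body (the latency metric filter is illustrative; any latency distribution metric for the service works):

{
  "displayName": "90% of requests under 300 ms per calendar month",
  "goal": 0.9,
  "calendarPeriod": "MONTH",
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter": "metric.type=\"loadbalancing.googleapis.com/https/backend_latencies\"",
        "range": { "max": 300 }
      }
    }
  }
}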

Reference:
Creating an SLO | Operations Suite | Google Cloud, Choosing a metric, Latency metric.
Concepts in service monitoring | Operations Suite | Google Cloud, Service-level indicators,
Request-based SLIs. Learn how to set SLOs - SRE tips | Google Cloud Blog, Setting SLOs.

QUESTION: 144
Your company runs applications in Google Kubernetes Engine (GKE). Several applications rely
on ephemeral volumes. You noticed some applications were unstable due to the DiskPressure
node condition on the worker nodes. You need to identify which Pods are causing the issue, but
you do not have execute access to workloads and nodes.
What should you do?

A. Check the node/ephemeral_storage/used_bytes metric by using Metrics Explorer.


B. Check the metric by using Metrics Explorer.
C. Locate all the Pods with emptyDir volumes. Use the df -h command to measure volume disk usage.
D. Locate all the Pods with emptyDir volumes. Use the du -sh * command to measure volume
disk usage.

Answer(s): A

Explanation:
The correct answer is A. Check the node/ephemeral_storage/used_bytes metric by using
Metrics Explorer.

The node/ephemeral_storage/used_bytes metric reports the total amount of ephemeral storage
used by Pods on each node. You can use Metrics Explorer to query and visualize this metric
and filter it by node name, namespace, or Pod name. This way, you can identify which Pods are
consuming the most ephemeral storage and causing disk pressure on the nodes. You do not
need to have execute access to the workloads or nodes to use Metrics Explorer. The other
options are incorrect because they require execute access to the workloads or nodes, which
you do not have. The df -h and du -sh * commands are Linux commands that can measure disk
usage, but you need to run them inside the Pods or on the nodes, which is not possible in your
scenario.
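A hedged sketch of the Metrics Explorer filter (standard Monitoring filter syntax; the full metric type name is assumed from the GKE system metrics list):

metric.type="kubernetes.io/node/ephemeral_storage/used_bytes" resource.type="k8s_node"

Grouping the chart by the node_name resource label shows which nodes are under disk pressure and which workloads' Pods are scheduled there.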

Reference:
Monitoring metrics for Kubernetes system components, Node metrics,
node/ephemeral_storage/used_bytes. Using Metrics Explorer, Querying metrics. How do I find
out disk space utilization information using Linux command line?, df command. How to check
disk space in Linux from the command line, du command.

QUESTION: 145
You are configuring your CI/CD pipeline natively on Google Cloud. You want builds in a pre-
production Google Kubernetes Engine (GKE) environment to be automatically load-tested
before being promoted to the production GKE environment. You need to ensure that only builds
that have passed this test are deployed to production. You want to follow Google-recommended
practices. How should you configure this pipeline with Binary Authorization?

A. Create an attestation for the builds that pass the load test by requiring the lead quality
assurance engineer to sign the attestation by using a key stored in Cloud Key Management
Service (Cloud KMS).
B. Create an attestation for the builds that pass the load test by using a private key stored in
Cloud Key Management Service (Cloud KMS) authenticated through Workload Identity.
C. Create an attestation for the builds that pass the load test by using a private key stored in
Cloud Key Management Service (Cloud KMS) with a service account JSON key stored as a
Kubernetes Secret.
D. Create an attestation for the builds that pass the load test by requiring the lead quality
assurance engineer to sign the attestation by using their personal private key.

Answer(s): B

Explanation:
The correct answer is B. Create an attestation for the builds that pass the load test by using a
private key stored in Cloud Key Management Service (Cloud KMS) authenticated through
Workload Identity.

According to the Google Cloud documentation, Binary Authorization is a deploy-time security


control that ensures only trusted container images are deployed on Google Kubernetes Engine
(GKE) or Cloud Run. Binary Authorization uses attestations to certify that a specific image has
completed a previous stage in the CI/CD pipeline, such as passing a load test. Attestations are
signed by private keys that are associated with attestors, which are entities that verify the
attestations. To follow Google-recommended practices, you should store your private keys in
Cloud Key Management Service (Cloud KMS), which is a secure and scalable service for
managing cryptographic keys. You should also use Workload Identity, which is a feature that
allows Kubernetes service accounts to act as Google service accounts, to authenticate to Cloud
KMS and sign attestations without having to manage or expose service account keys.
The other options are incorrect because they do not follow Google-recommended practices.
Option A and option D require human intervention to sign the attestations, which is not scalable
or automated. Option C exposes the service account JSON key as a Kubernetes Secret, which
is less secure than using Workload Identity.
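A hedged sketch of the attestation step the pipeline would run after a successful load test (attestor, key ring, and image digest are placeholders):

gcloud container binauthz attestations sign-and-create \
    --artifact-url="us-docker.pkg.dev/my-project/repo/app@sha256:DIGEST" \
    --attestor="projects/my-project/attestors/load-test-attestor" \
    --keyversion="projects/my-project/locations/global/keyRings/binauthz/cryptoKeys/load-test/cryptoKeyVersions/1"

With Workload Identity, the Kubernetes service account running this step impersonates a Google service account that holds the Cloud KMS signing role, so no JSON key is ever exported.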

Reference:
Creating an attestor. Cloud Key Management Service Documentation: Overview. Attestations overview. Using Workload Identity with Binary Authorization. Binary Authorization.

QUESTION: 146
You recently noticed that one of your services has exceeded the error budget for the current
rolling window period. Your company's product team is about to launch a new feature. You want
to follow Site Reliability Engineering (SRE) practices.

What should you do?

A. Notify the team that their error budget is used up. Negotiate with the team for a launch freeze
or tolerate a slightly worse user experience.
B. Look through other metrics related to the product and find SLOs with remaining error budget.
Reallocate the error budgets and allow the feature launch.
C. Escalate the situation and request additional error budget.
D. Notify the team about the lack of error budget and ensure that all their tests are successful so
the launch will not further risk the error budget.

Answer(s): A

Explanation:
The correct answer is A. Notify the team that their error budget is used up. Negotiate with the team for a launch freeze or tolerate a slightly worse user experience.

According to the Site Reliability Engineering (SRE) practices, an error budget is the amount of
unreliability that a service can tolerate without harming user satisfaction. An error budget is
derived from the service-level objectives (SLOs), which are the measurable goals for the service
quality.
When a service exceeds its error budget, it means that it has violated its SLOs and may have
negatively impacted the users. In this case, the SRE team should notify the product team that
their error budget is used up and negotiate with them for a launch freeze or a lower SLO. A
launch freeze means that no new features are deployed until the service reliability is restored. A
lower SLO means that the product team accepts a slightly worse user experience in exchange
for launching new features. Both options require a trade-off between reliability and innovation,
and should be agreed upon by both teams.
The other options are incorrect because they do not follow the SRE practices. Option B is
incorrect because it violates the principle of error budget autonomy, which means that each
service should have its own error budget and SLOs, and should not borrow or reallocate them
from other services. Option C is incorrect because it does not address the root cause of the
error budget overspend, and may create unrealistic expectations for the service reliability.
Option D is incorrect because it does not prevent the possibility of introducing new errors or
bugs with the feature launch, which may further degrade the service quality and user
satisfaction.

Reference:
Error Budgets. Service Level Objectives. Error Budget Policies. Error Budget Autonomy.

QUESTION: 147
You are monitoring a service that uses n2-standard-2 Compute Engine instances that serve
large files. Users have reported that downloads are slow. Your Cloud Monitoring dashboard
shows that your VMs are running at peak network throughput. You want to improve the network
throughput performance.
What should you do?

A. Deploy a Cloud NAT gateway and attach the gateway to the subnet of the VMs.
B. Add additional network interface controllers (NICs) to your VMs.
C. Change the machine type for your VMs to n2-standard-8.
D. Deploy the Ops Agent to export additional monitoring metrics.

Answer(s): C

Explanation:
The correct answer is C. Change the machine type for your VMs to n2-standard-8.

According to the Google Cloud documentation, the network throughput performance of a
Compute Engine VM depends on its machine type. The n2-standard-2 machine type has a
maximum egress bandwidth of 4 Gbps, which can be a bottleneck for serving large files. By
changing the machine type to n2-standard-8, you can increase the maximum egress bandwidth
to 16 Gbps, which can improve the network throughput performance and reduce the download
time for users. You also need to enable per VM Tier_1 networking performance, which is a
feature that allows VMs to achieve higher network performance than the default settings.
The other options are incorrect because they do not improve the network throughput
performance of your VMs. Option A is incorrect because Cloud NAT is a service that allows
private IP addresses to access the internet, but it does not increase the network bandwidth or
speed. Option B is incorrect because adding additional network interfaces (NICs) or IP
addresses per NIC does not increase ingress or egress bandwidth for a VM1. Option D is
incorrect because deploying the Ops Agent can help you monitor and troubleshoot your VMs,
but it does not affect the network throughput performance.
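A minimal sketch of the change (instance name and zone are placeholders; the VM must be stopped before its machine type can be changed):

gcloud compute instances stop large-file-server --zone=us-central1-a
gcloud compute instances set-machine-type large-file-server \
    --zone=us-central1-a --machine-type=n2-standard-8
gcloud compute instances start large-file-server --zone=us-central1-a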

Reference:
Cloud NAT overview. Network bandwidth: Bandwidth summary. Installing the Ops Agent. Configure per VM Tier_1 networking performance.

QUESTION: 148
You are designing a new Google Cloud organization for a client. Your client is concerned with
the risks associated with long-lived credentials created in Google Cloud. You need to design a
solution to completely eliminate the risks associated with the use of JSON service account keys
while minimizing operational overhead.
What should you do?

A. Use custom versions of predefined roles to exclude all iam.serviceAccountKeys.* service account role permissions.
B. Apply the constraints/iam.disableServiceAccountKeyCreation constraint to the organization.
C. Apply the constraints/iam.disableServiceAccountKeyUpload constraint to the organization.
D. Grant the roles/iam.serviceAccountKeyAdmin IAM role to organization administrators only.

Answer(s): B

Explanation:
The correct answer is B. Apply the constraints/iam.disableServiceAccountKeyCreation constraint to the organization.

According to the Google Cloud documentation, the
constraints/iam.disableServiceAccountKeyCreation constraint is an organization policy
constraint that prevents the creation of user-managed service account keys. User-managed
service account keys are long-lived credentials that can be downloaded as JSON or P12 files
and used to authenticate as a service account. These keys pose severe security risks if they are
leaked, stolen, or misused by unauthorized entities. By applying this constraint to the
organization, you can completely eliminate the risks associated with the use of JSON service
account keys and enforce a more secure alternative for authentication, such as Workload
Identity or short-lived access tokens. This also minimizes operational overhead by avoiding the
need to manage, rotate, or revoke user-managed service account keys.
The other options are incorrect because they do not completely eliminate the risks associated
with the use of JSON service account keys. Option A is incorrect because it only restricts the
IAM permissions to create, list, get, delete, or sign service account keys, but it does not prevent
existing keys from being used or leaked. Option C is incorrect because it only disables the
upload of user- managed service account keys, but it does not prevent the creation or download
of such keys. Option D is incorrect because it only limits the IAM role that can create and
manage service account keys, but it does not prevent the keys from being distributed or
exposed to unauthorized entities.
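A minimal sketch of applying the constraint (organization ID is a placeholder):

gcloud resource-manager org-policies enable-enforce \
    iam.disableServiceAccountKeyCreation \
    --organization=123456789012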

Reference:
Disable user-managed service account key creation. Service accounts: User-managed service accounts. Help keep your Google Cloud service account keys safe. Stop Downloading Google Cloud Service Account Keys! Service Account Keys. Disable user-managed service account key upload. Granting roles to service accounts.

QUESTION: 149
You have deployed a fleet of Compute Engine instances in Google Cloud. You need to ensure
that monitoring metrics and logs for the instances are visible in Cloud Logging and Cloud
Monitoring by your company's operations and cyber security teams. You need to grant the
required roles for the Compute Engine service account by using Identity and Access
Management (IAM) while following the principle of least privilege.

What should you do?

A. Grant the logging.editor and monitoring.metricWriter roles to the Compute Engine service accounts.
B. Grant the logging.admin and monitoring.editor roles to the Compute Engine service accounts.
C. Grant the logging.logWriter and monitoring.editor roles to the Compute Engine service accounts.
D. Grant the logging.logWriter and monitoring.metricWriter roles to the Compute Engine service accounts.

Answer(s): D

Explanation:
The correct answer is D. Grant the logging.logWriter and monitoring.metricWriter roles to the Compute Engine service accounts.

According to the Google Cloud documentation, the Compute Engine service account is a
Google-managed service account that is automatically created when you enable the Compute
Engine API1. This service account is used by default to run your Compute Engine instances and
access other Google Cloud services on your behalf. To ensure that monitoring metrics and logs
for the instances are visible in Cloud Logging and Cloud Monitoring, you need to grant the
following IAM roles to the Compute Engine service account23:
The logging.logWriter role allows the service account to write log entries to Cloud Logging. The
monitoring.metricWriter role allows the service account to write custom metrics to Cloud
Monitoring.
These roles grant the minimum permissions that are needed for logging and monitoring,
following the principle of least privilege. The other roles are either unnecessary or too broad for
this purpose. For example, the logging.editor role grants permissions to create and update logs,
log sinks, and log exclusions, which are not required for writing log entries. The logging.admin
role grants permissions to delete logs, log sinks, and log exclusions, which are not required for
writing log entries and may pose a security risk if misused. The monitoring.editor role grants
permissions to create and update alerting policies, uptime checks, notification channels,
dashboards, and groups, which are not required for writing custom metrics.

Reference:
Service accounts. Setting up Stackdriver Logging for Compute Engine. Setting up Stackdriver Monitoring for Compute Engine. Predefined roles.

QUESTION: 150
You are investigating issues in your production application that runs on Google Kubernetes
Engine (GKE). You determined that the source of the issue is a recently updated container
image, although the exact change in code was not identified. The deployment is currently
pointing to the latest tag. You need to update your cluster to run a version of the container that
functions as intended.
What should you do?

A. Create a new tag called stable that points to the previously working container, and change the deployment to point to the new tag.


B. Apply the latest tag to the previous container image, and do a rolling update on the
deployment.
C. Build a new container from a previous Git tag, and do a rolling update on the deployment to
the new container.
D. Alter the deployment to point to the sha256 digest of the previously working container.

Answer(s): D
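Pinning by digest makes the rollback immutable, unlike a mutable tag. A minimal sketch (deployment, container, and image path are placeholders):

kubectl set image deployment/my-app \
    my-app=us-docker.pkg.dev/my-project/repo/my-app@sha256:PREVIOUS_WORKING_DIGEST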

QUESTION: 151
You are leading a DevOps project for your organization. The DevOps team is responsible for
managing the service infrastructure and being on-call for incidents. The Software Development
team is responsible for writing, submitting, and reviewing code. Neither team has any published
SLOs. You want to design a new joint-ownership model for a service between the DevOps team
and the Software Development team.
Which responsibilities should be assigned to each team in the new joint-ownership model?

A. Option A
B. Option B

C. Option C
D. Option D

Answer(s): D

Explanation:
The correct answer is D. Option D.

According to the DevOps best practices, a joint-ownership model for a service between the
DevOps team and the Software Development team should follow these principles:
The DevOps team and the Software Development team should share the responsibility and
collaboration for managing the service infrastructure, performing code reviews, and adopting
and sharing SLOs for the service.
The DevOps team and the Software Development team should have end-to-end ownership of
the service, from design to development to deployment to operation to maintenance. The
DevOps team and the Software Development team should use common tools and processes to
facilitate communication, coordination, and feedback.
The DevOps team and the Software Development team should align their goals and incentives
with the business outcomes and customer satisfaction.
Option D is the only option that reflects these principles. Option D assigns both teams the
responsibilities of managing the service infrastructure, performing code reviews, and adopting
and sharing SLOs for the service. Option D also implies that both teams have end-to-end
ownership of the service, as they are involved in every stage of the service lifecycle. Option D
also encourages both teams to use common tools and processes, such as GitLab, to
collaborate and communicate effectively. Option D also aligns both teams with the business
outcomes and customer satisfaction, as they use SLOs to measure and improve the service
quality. The other options are incorrect because they do not follow the DevOps best practices.
Option A is incorrect because it assigns only the DevOps team the responsibility of managing
the service infrastructure, which creates a silo between the two teams and reduces their
collaboration. Option A also does not assign any responsibility for adopting and sharing SLOs
for the service, which means that both teams lack a common metric for measuring and
improving the service quality. Option B is incorrect because it assigns only the Software
Development team the responsibility of performing code reviews, which creates a gap between
the two teams and reduces their feedback. Option B also does not assign any responsibility for
adopting and sharing SLOs for the service, which means that both teams lack a common metric
for measuring and improving the service quality. Option C is incorrect because it assigns both
teams the same responsibilities as option A and option B, which combines their drawbacks.

Reference:
5 key organizational models for DevOps teams | GitLab. Building a Culture of Full-Service
Ownership - DevOps.com. GitLab.

QUESTION: 152
You are deploying a Cloud Build job that deploys Terraform code when a Git branch is updated.
While testing, you noticed that the job fails. You see the following error in the build logs:

Initializing the backend...

Error: Failed to get existing workspaces: querying Cloud Storage failed: googleapi: Error
You need to resolve the issue by following Google-recommended practices.
What should you do?

A. Change the Terraform code to use local state.
B. Create a storage bucket with the name specified in the Terraform configuration.
C. Grant the roles/owner Identity and Access Management (IAM) role to the Cloud Build service
account on the project.
D. Grant the roles/storage.objectAdmin Identity and Access Management (IAM) role to the
Cloud Build service account on the state file bucket.

Answer(s): D

Explanation:
The correct answer is D: Grant the roles/storage.objectAdmin Identity and Access Management
(IAM) role to the Cloud Build service account on the state file bucket.

According to the Google Cloud documentation, Cloud Build is a service that executes your
builds on Google Cloud Platform infrastructure. Cloud Build uses a service account to execute
your build steps and access resources, such as Cloud Storage buckets. Terraform is an open-
source tool that allows you to define and provision infrastructure as code. Terraform uses a
state file to store and track the state of your infrastructure. You can configure Terraform to use a
Cloud Storage bucket as a backend to store and share the state file across multiple users or
environments. The error message indicates that Cloud Build failed to access the Cloud Storage
bucket that contains the Terraform state file. This is likely because the Cloud Build service
account does not have the necessary permissions to read and write objects in the bucket. To
resolve this issue, you need to grant the roles/storage.objectAdmin IAM role to the Cloud Build
service account on the state file bucket. This role allows the service account to create, delete,
and manage objects in the bucket. You can use the gcloud command-line tool or the Google
Cloud Console to grant this role. The other options are incorrect because they do not follow
Google-recommended practices. Option A is incorrect because it changes the Terraform code
to use local state, which is not recommended for production or collaborative environments, as it
can cause conflicts, data loss, or inconsistency.

Option B is incorrect because it creates a new storage bucket with the name specified in the
Terraform configuration, but it does not grant any permissions to the Cloud Build service
account on the new bucket. Option C is incorrect because it grants the roles/owner IAM role to
the Cloud Build service account on the project, which is too broad and violates the principle of
least privilege. The roles/owner role grants full access to all resources in the project, which can
pose a security risk if misused or compromised.
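
For illustration, the grant can be made with a single command; the bucket name is a
placeholder, and PROJECT_NUMBER stands for your project's number (the default Cloud Build
service account is PROJECT_NUMBER@cloudbuild.gserviceaccount.com):

gcloud storage buckets add-iam-policy-binding gs://my-terraform-state \
  --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"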

Reference:
Cloud Build Documentation, Overview. Service accounts. Terraform by HashiCorp. State.
Google Cloud Storage Backend. Predefined roles. Granting roles to service accounts for
specific resources. Local Backend. Understanding roles.

QUESTION: 153
Your company processes IoT data at scale by using Pub/Sub, App Engine standard
environment, and an application written in Go. You noticed that the performance inconsistently
degrades at peak load. You could not reproduce this issue on your workstation. You need to
continuously monitor the application in production to identify slow paths in the code. You want to
minimize performance impact and management overhead.
What should you do?

A. Install a continuous profiling tool into Compute Engine. Configure the application to send
profiling data to the tool.
B. Periodically run the go tool pprof command against the application instance. Analyze the
results by using flame graphs.
C. Configure Cloud Profiler, and initialize the cloud.google.com/go/profiler library in the
application.
D. Use Cloud Monitoring to assess the App Engine CPU utilization metric.

Answer(s): C

Explanation:
The correct answer is C: Configure Cloud Profiler, and initialize the cloud.google.com/go/profiler
library in the application.

According to the Google Cloud documentation, Cloud Profiler is a statistical, low-overhead
profiler that continuously gathers CPU usage and memory-allocation information from your
production applications. Cloud Profiler can help you identify slow paths in your code and
optimize the performance of your applications. Cloud Profiler supports applications written in Go
that run on App Engine standard environment. To use Cloud Profiler, you need to configure it in
your Google Cloud project and initialize the cloud.google.com/go/profiler library in your
application code. You can then use the Cloud Profiler interface to analyze the profiling data and
visualize the results by using flame graphs. Cloud Profiler has minimal performance impact and
management overhead, as it only samples a small fraction of the application activity and does
not require any additional infrastructure or agents.
The other options are incorrect because they do not meet the requirements of minimizing
performance impact and management overhead. Option A is incorrect because it requires
installing a continuous profiling tool into Compute Engine, which is an additional infrastructure
that needs to be managed and maintained. Option B is incorrect because it requires periodically
running the go tool pprof command against the application instance, which is a manual and
disruptive process that can affect the application performance. Option D is incorrect because it
only uses Cloud Monitoring to assess the App Engine CPU utilization metric, which is not
enough to identify slow paths in the code or optimize the application performance.

Reference:
Cloud Profiler documentation, Overview. Profiling Go applications: Supported environments;
Using Cloud Profiler. Analyzing data.

QUESTION: 154
You need to define SLOs for a high-traffic web application. Customers are currently happy with
the application performance and availability. Based on current measurements, the 90th percentile
of latency is 160 ms and the 95th percentile of latency is 300 ms over a 28-day window.
What latency SLO should you publish?

A. 90th percentile - 150 ms
95th percentile - 290 ms
B. 90th percentile - 160 ms
95th percentile - 300 ms
C. 90th percentile - 190 ms
95th percentile - 330 ms
D. 90th percentile - 300 ms
95th percentile - 450 ms

Answer(s): B

Explanation:
A latency SLO is a service level objective that specifies a target level of responsiveness for a
web application. It can be expressed as a percentile of latency over a time window, such as the
90th percentile of latency over 28 days. A percentile of latency is the maximum amount of time
that a given percentage of requests take to complete. For example, the 90th percentile of
latency is the maximum amount of time that 90% of requests take to complete. To define a
latency SLO, you need to consider the following factors: the expectations and satisfaction of
your customers, because the SLO should reflect the level of performance that your customers
are happy with and willing to pay for; the current and historical measurements of your latency,
because the SLO should be based on data and realistic for your web application; and the
trade-offs and costs of improving your latency, because the SLO should balance the benefits of
faster response times with the costs of engineering work, infrastructure, and complexity.
Based on these factors, the best option for defining a latency SLO for your web
application is option B. Option B sets the latency SLO to match the current measurement of
your latency, which means that you are meeting the expectations and satisfaction of your
customers. Option B also sets a realistic and achievable target for your web application, which
means that you do not need to invest extra resources or effort to improve your latency. Option B
also aligns with the best practice of setting conservative SLOs, which means that you have
some buffer or margin for error in case your latency fluctuates or degrades.

QUESTION: 155
You need to enforce several constraint templates across your Google Kubernetes Engine (GKE)
clusters. The constraints include policy parameters, such as restricting the Kubernetes API. You
must ensure that the policy parameters are stored in a GitHub repository and automatically
applied when changes occur.
What should you do?

A. Set up a GitHub action to trigger Cloud Build when there is a parameter change. In Cloud
Build, run a gcloud CLI command to apply the change.
B. When there is a change in GitHub, use a web hook to send a request to Anthos Service
Mesh, and apply the change.
C. Configure Anthos Config Management with the GitHub repository.
When there is a change in the repository, use Anthos Config Management to apply the change.
D. Configure Config Connector with the GitHub repository.
When there is a change in the repository, use Config Connector to apply the change.

Answer(s): C

Explanation:
The correct answer is C: Configure Anthos Config Management with the GitHub repository.
When there is a change in the repository, use Anthos Config Management to apply the change.

https://Xcerts.com 86
PROFESSIONAL-CLOUD-DEVOPS-ENGINEER

According to the web search results, Anthos Config Management is a service that lets you
manage the configuration of your Google Kubernetes Engine (GKE) clusters from a single
source of truth, such as a GitHub repository. Anthos Config Management can enforce several
constraint templates across your GKE clusters by using Policy Controller, which is a feature that
integrates the Open Policy Agent (OPA) Constraint Framework into Anthos Config
Management. Policy Controller can apply constraints that include policy parameters, such as
restricting the Kubernetes API. To use Anthos Config Management and Policy Controller, you
need to configure them with your GitHub repository and enable the sync mode.
When there is a change in the repository, Anthos Config Management will automatically sync
and apply the change to your GKE clusters. The other options are incorrect because they do not
use Anthos Config Management and Policy Controller. Option A is incorrect because it uses a
GitHub action to trigger Cloud Build, which is a service that executes your builds on Google
Cloud Platform infrastructure. Cloud Build can run a gcloud CLI command to apply the change,
but it does not use Anthos Config Management or Policy Controller. Option B is incorrect
because it uses a web hook to send a request to Anthos Service Mesh, which is a service that
provides a uniform way to connect, secure, monitor, and manage microservices on GKE
clusters. Anthos Service Mesh can apply the change, but it does not use Anthos Config
Management or Policy Controller. Option D is incorrect because it uses Config Connector,
which is a service that lets you manage Google Cloud resources through Kubernetes
configuration. Config Connector can apply the change, but it does not use Anthos Config
Management or Policy Controller.
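
As an illustrative sketch, a parameterized Policy Controller constraint stored in the synced
GitHub repository might look like the following, using the K8sAllowedRepos template from the
constraint template library (the constraint name and allowed repository path are assumptions):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-image-repos
spec:
  parameters:
    repos:
      - "gcr.io/my-project/"

Once this file is committed to the repository, Config Sync applies the change to every
registered cluster automatically.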

Reference:
Anthos Config Management documentation, Overview. Policy Controller. Constraint template
library. Installing Anthos Config Management. Syncing configurations. Cloud Build
documentation, Overview. Anthos Service Mesh documentation, Overview. Config Connector
documentation, Overview.

QUESTION: 156
Your company recently migrated to Google Cloud. You need to design a fast, reliable, and
repeatable solution for your company to provision new projects and basic resources in Google
Cloud.
What should you do?

A. Use the Google Cloud console to create projects.
B. Write a script by using the gcloud CLI that passes the appropriate parameters from the
request.
Save the script in a Git repository.
C. Write a Terraform module and save it in your source control repository. Copy and run the
apply command to create the new project.
D. Use the Terraform repositories from the Cloud Foundation Toolkit. Apply the code with
appropriate parameters to create the Google Cloud project and related resources.

Answer(s): D

Explanation:
Terraform is an open-source tool that allows you to define and provision infrastructure as code.
Terraform can be used to create and manage Google Cloud resources, such as projects,
networks, and services. The Cloud Foundation Toolkit is a set of open-source Terraform
modules and tools that provide best practices and guidance for deploying Google Cloud
infrastructure. The Cloud Foundation Toolkit includes Terraform repositories for creating Google
Cloud projects and related resources, such as IAM policies, APIs, service accounts, and billing.
By using the Terraform repositories from the Cloud Foundation Toolkit, you can design a fast,
reliable, and repeatable solution for your company to provision new projects and basic
resources in Google Cloud. You can also customize the Terraform code to suit your specific
needs and preferences.
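
A minimal sketch of calling the Cloud Foundation Toolkit's project-factory module (the version
pin, organization ID, billing account, and project name are placeholder assumptions):

module "new_project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0" # illustrative version pin

  name            = "my-new-project"
  org_id          = "123456789012"
  billing_account = "AAAAAA-BBBBBB-CCCCCC"
}

Running terraform plan and terraform apply against this configuration then provisions the
project and its basic resources repeatably.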

QUESTION: 157
You are configuring a CI pipeline. The build step for your CI pipeline integration testing requires
access to APIs inside your private VPC network. Your security team requires that you do not
expose API traffic publicly. You need to implement a solution that minimizes management
overhead.
What should you do?

A. Use Cloud Build private pools to connect to the private VPC.
B. Use Spinnaker for Google Cloud to connect to the private VPC.
C. Use Cloud Build as a pipeline runner. Configure Internal HTTP(S) Load Balancing for API
access.
D. Use Cloud Build as a pipeline runner. Configure External HTTP(S) Load Balancing with a
Google Cloud Armor policy for API access.

Answer(s): A

Explanation:
Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure.
Cloud Build can be used as a pipeline runner for your CI pipeline, which is a process that
automates the integration and testing of your code. Cloud Build private pools are private,
dedicated pools of workers that offer greater customization over the build environment, including
the ability to access resources in a private VPC network. A VPC network is a virtual network that
provides connectivity for your Google Cloud resources and services. By using Cloud Build
private pools, you can implement a solution that minimizes management overhead, as Cloud
Build private pools are hosted and fully-managed by Cloud Build and scale up and down to
zero, with no infrastructure to set up, upgrade, or scale. You can also implement a solution that
meets your security requirement, as Cloud Build private pools use network peering to connect
into your private VPC network and do not expose API traffic publicly.
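
A sketch of creating such a pool (the pool name, region, and network path are placeholders,
and the VPC is assumed to already be peered with the service producer network through
Service Networking):

gcloud builds worker-pools create my-private-pool \
  --region=us-central1 \
  --peered-network=projects/my-project/global/networks/my-vpc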

QUESTION: 158
Your organization stores all application logs from multiple Google Cloud projects in a central
Cloud Logging project. Your security team wants to enforce a rule that each project team can
only view their respective logs, and only the operations team can view all the logs. You need to
design a solution that meets the security team's requirements, while minimizing costs.
What should you do?

A. Export logs to BigQuery tables for each project team. Grant project teams access to their
tables.
Grant logs writer access to the operations team in the central logging project.
B. Create log views for each project team, and only show each project team their application
logs.
Grant the operations team access to the _AllLogs view in the central logging project.
C. Grant each project team access to the _Default view in the central logging project.
Grant logging viewer access to the operations team in the central logging project.
D. Create Identity and Access Management (IAM) roles for each project team and restrict
access to the _Default log view in their individual Google Cloud project. Grant viewer access to
the operations team in the central logging project.

Answer(s): B

Explanation:
Create log views for each project team, and only show each project team their application logs.
Grant the operations team access to the _AllLogs view in the central logging project. This
approach aligns with the Google Cloud's recommended methodologies for Professional Cloud
DevOps Engineers. Log views allow you to create and manage access control at a finer
granularity for your logs. By creating a separate log view for each project team, you can ensure
that they only have access to their respective logs. The operations team, on the other hand, can
be granted access to the _AllLogs view in the central logging project, allowing them to view all
logs as required. This solution not only meets the security team's requirements but also
minimizes costs as it leverages built-in features of Google Cloud's logging and does not require
exporting logs to another service like BigQuery (as suggested in option A), which could incur
additional costs.
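
A hedged sketch of creating one such view (the bucket, location, view name, and filter are
placeholders):

gcloud logging views create team-a-view \
  --bucket=central-logs \
  --location=global \
  --log-filter='SOURCE("projects/team-a-project")'

Each project team is then granted the roles/logging.viewAccessor role on its view, while the
operations team is granted access to the _AllLogs view.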

QUESTION: 159
Your CTO has asked you to implement a postmortem policy on every incident for internal use.
You want to define what a good postmortem is to ensure that the policy is successful at your
company.
What should you do?

Choose 2 answers

A. Ensure that all postmortems include what caused the incident, identify the person or team
responsible for causing the incident, and how to prevent a future occurrence of the incident.
B. Ensure that all postmortems include what caused the incident, how the incident could have
been worse, and how to prevent a future occurrence of the incident.
C. Ensure that all postmortems include the severity of the incident, how to prevent a future
occurrence of the incident, and what caused the incident without naming internal system
components.
D. Ensure that all postmortems include how the incident was resolved and what caused the
incident without naming customer information.
E. Ensure that all postmortems include all incident participants in postmortem authoring and
share postmortems as widely as possible.

Answer(s): B, E

Explanation:
The correct answers are B and E.
A good postmortem should include what caused the incident, how the incident could have been
worse, and how to prevent a future occurrence of the incident. This helps to identify the root
cause of the problem, the impact of the incident, and the actions to take to mitigate or eliminate
the risk of recurrence.
A good postmortem should also include all incident participants in postmortem authoring and
share postmortems as widely as possible. This helps to foster a culture of learning and
collaboration, as well as to increase the visibility and accountability of the incident response
process. Answer A is incorrect because it assigns blame to a person or team, which goes
against the principle of blameless postmortems. Blameless postmortems focus on finding
solutions rather than pointing fingers, and encourage honest and constructive feedback without
fear of punishment. Answer C is incorrect because it omits how the incident could have been
worse, which is an important factor to consider when evaluating the severity and impact of the
incident. It also avoids naming internal system components, which makes it harder to
understand the technical details and root cause of the problem.

Answer D is incorrect because it omits how to prevent a future occurrence of the incident, which
is the main goal of a postmortem. It also avoids naming customer information, which may be
relevant for understanding the impact and scope of the incident.

QUESTION: 160
Your company uses Jenkins running on Google Cloud VM instances for CI/CD. You need to extend the
functionality to use infrastructure as code automation by using Terraform. You must ensure that
the Terraform Jenkins instance is authorized to create Google Cloud resources. You want to
follow Google-recommended practices. What should you do?

A. Add the auth application-default command as a step in Jenkins before running the Terraform
commands.
B. Create a dedicated service account for the Terraform instance. Download and copy the
secret key value to the GOOGLE environment variable on the Jenkins server.
C. Confirm that the Jenkins VM instance has an attached service account with the appropriate
Identity and Access Management (IAM) permissions.
D. Use the Terraform module so that Secret Manager can retrieve credentials.

Answer(s): C

Explanation:
The correct answer is C.
Confirming that the Jenkins VM instance has an attached service account with the appropriate
Identity and Access Management (IAM) permissions is the best way to ensure that the
Terraform Jenkins instance is authorized to create Google Cloud resources. This follows the
Google-recommended practice of using service accounts to authenticate and authorize
applications running on Google Cloud. Service accounts are associated with private keys that
can be used to generate access tokens for Google Cloud APIs. By attaching a service account
to the Jenkins VM instance, Terraform can use the Application Default Credentials (ADC)
strategy to automatically find and use the service account credentials.

Answer A is incorrect because the auth application-default command is used to obtain user
credentials, not service account credentials. User credentials are not recommended for
applications running on Google Cloud, as they are less secure and less scalable than service
account credentials.

Answer B is incorrect because it involves downloading and copying the secret key value of the
service account, which is not a secure or reliable way of managing credentials. The secret key
value should be kept private and not exposed to any other system or user. Moreover, setting
the GOOGLE environment variable on the Jenkins server is not a valid way of providing
credentials to Terraform. Terraform expects the credentials to be either in a file pointed to by
the GOOGLE_APPLICATION_CREDENTIALS environment variable, or in a provider block with
the credentials argument.

Answer D is incorrect because it involves using the Terraform module for Secret Manager,
which is a service that stores and manages sensitive data such as API keys, passwords, and
certificates. While Secret Manager can be used to store and retrieve credentials, it is not
necessary or sufficient for authorizing the Terraform Jenkins instance. The Terraform Jenkins
instance still needs a service account with the appropriate IAM permissions to access Secret
Manager and other Google Cloud resources.
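
For illustration (the instance, zone, and service account names are placeholders; note that a
VM must be stopped before its attached service account can be changed):

gcloud compute instances stop jenkins-vm --zone=us-central1-a
gcloud compute instances set-service-account jenkins-vm \
  --zone=us-central1-a \
  --service-account=terraform@my-project.iam.gserviceaccount.com \
  --scopes=cloud-platform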

QUESTION: 161
You are analyzing Java applications in production. All applications have Cloud Profiler and
Cloud Trace installed and configured by default. You want to determine which applications need
performance tuning.
What should you do?

Choose 2 answers

A. Examine the wall-clock time and the CPU time of the application. If the difference is
substantial, increase the CPU resource allocation.
B. Examine the wall-clock time and the CPU time of the application. If the difference is
substantial, increase the memory resource allocation.
C. Examine the wall-clock time and the CPU time of the application. If the difference is
substantial, increase the local disk storage allocation.
D. Examine the latency time, the wall-clock time, and the CPU time of the application. If the
latency time is slowly burning down the error budget, and the difference between wall-clock time
and CPU time is minimal, mark the application for optimization.
E. Examine the heap usage of the application. If the usage is low, mark the application for
optimization.

Answer(s): A, D

Explanation:
The correct answers are A and D.
Examine the wall-clock time and the CPU time of the application. If the difference is substantial,
increase the CPU resource allocation. This is a good way to determine if the application is CPU-
bound, meaning that it spends more time waiting for the CPU than performing actual
computation.

Increasing the CPU resource allocation can improve the performance of CPU-bound
applications. Examine the latency time, the wall-clock time, and the CPU time of the application.
If the latency time is slowly burning down the error budget, and the difference between wall-
clock time and CPU time is minimal, mark the application for optimization. This is a good way to
determine if the application is I/O-bound, meaning that it spends more time waiting for
input/output operations than performing actual computation. Increasing the CPU resource
allocation will not help I/O-bound applications, and they may need optimization to reduce the
number or duration of I/O operations. Answer B is incorrect because increasing the memory
resource allocation will not help if the application is CPU-bound or I/O-bound. Memory allocation
affects how much data the application can store and access in memory, but it does not affect
how fast the application can process that data. Answer C is incorrect because increasing the
local disk storage allocation will not help if the application is CPU-bound or I/O-bound. Disk
storage affects how much data the application can store and access on disk, but it does not
affect how fast the application can process that data. Answer E is incorrect because examining
the heap usage of the application will not help to determine if the application needs performance
tuning. Heap usage affects how much memory the application allocates for dynamic objects, but
it does not affect how fast the application can process those objects. Moreover, low heap usage
does not necessarily mean that the application is inefficient or unoptimized.

QUESTION: 162
You deployed an application into a large Standard Google Kubernetes Engine (GKE) cluster.
The application is stateless and multiple pods run at the same time. Your application receives
inconsistent traffic. You need to ensure that the user experience remains consistent regardless
of changes in traffic, and that the resource usage of the cluster is optimized.

What should you do?

A. Configure a cron job to scale the deployment on a schedule.
B. Configure a Horizontal Pod Autoscaler.
C. Configure a Vertical Pod Autoscaler.
D. Configure cluster autoscaling on the node pool.

Answer(s): B
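
Explanation:
A Horizontal Pod Autoscaler adds and removes replicas as load changes, which keeps the user
experience consistent under inconsistent traffic while optimizing the cluster's resource usage. A
minimal sketch (the deployment name and thresholds are placeholders):

kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10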
