Google Cloud went offline, taking with it YouTube, Snapchat, Gmail, and a number of other web services
By Sugandha Lahoti - June 3, 2019 - 5:04 am

Update: The article has been updated to include Google’s response on Sunday’s service
disruption.

Over the weekend, Google Cloud suffered a major outage that took down a number of Google
services, including YouTube, G Suite, and Gmail. It also affected services that depend on
Google, such as Snapchat, Nest, Discord, and Shopify. The problem was first reported by East
Coast users in the U.S. around 3 PM ET / 12 PM PT, and the company resolved it after
more than four hours. According to Downdetector, users in the UK, France, Austria, Spain,
and Brazil also reported being affected by the outage.

In a statement posted on its Google Cloud Platform status dashboard, the company said it was
experiencing a multi-region issue with the Google Compute Engine. “We are experiencing high
levels of network congestion in the eastern USA, affecting multiple services in Google Cloud,
GSuite, and YouTube. Users may see a slow performance or intermittent errors. We believe we
have identified the root cause of the congestion and expect to return to normal service
shortly,” the company said.

The issue was resolved four hours after Google acknowledged the downtime. “The network
congestion issue in the eastern USA, affecting Google Cloud, G Suite, and YouTube has
been resolved for all affected users as of 4:00 pm US/Pacific,” the company said in a
statement.
“We will conduct an internal investigation of this issue and make appropriate improvements
to our systems to help prevent or minimize future recurrence. We will provide a detailed
report of this incident once we have completed our internal investigation. This detailed
report will contain information regarding SLA credits.”

This outage caused major disruption. Not only did it affect some of the apps most used by
netizens (YouTube and Snapchat), people also reported that they were unable to use their
Nest-controlled devices, for example to turn on their AC or open their “smart” locks to let
people into the house.

Even Shopify experienced problems because of the Google outage, which prevented some
stores (both brick-and-mortar and online) from processing credit card payments for hours.

Due to @googlecloud outage, @Shopify has been down all afternoon. Shops running on the
platform may have collectively lost $millions already, due to lost sales and ad-spend.
pic.twitter.com/NUwZg3lMDA

— Larry Weru (@LarryWeru) June 2, 2019

The complete dependence of some of the world’s most popular applications on a single
backend in the hands of one company is startling. It is also surprising how many companies
rely on just one hosting service. At the very least, they should think about setting up a
contingency plan in case the services go down again.
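
One simple form such a contingency plan could take is failover: try the primary provider first and fall back to a second one when it is unreachable. The sketch below is only an illustration under assumptions. The function `call_with_failover`, the backend names, and the stand-in handlers are hypothetical and do not describe how Shopify or any other affected service is built.

```python
def call_with_failover(backends, request):
    """Try each (name, handler) pair in order.

    Returns (name, result) from the first backend that succeeds.
    Raises RuntimeError if every backend raises ConnectionError.
    """
    errors = []
    for name, handler in backends:
        try:
            return name, handler(request)
        except ConnectionError as exc:
            # Remember why this backend failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for two independent hosting providers.
def primary(request):
    raise ConnectionError("network congestion")


def secondary(request):
    return f"processed {request}"


backends = [("primary-cloud", primary), ("secondary-cloud", secondary)]
name, result = call_with_failover(backends, "payment-123")
print(name, result)
```

In practice the harder part is keeping data and configuration in sync across providers, which this sketch leaves out.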

Another point raised was that the unexpected Google Cloud outage is proof that
cloud-based gaming isn’t ready for mass audiences yet. At this year’s Game Developers
Conference (GDC), Google marked its entry into the game industry with Stadia, its new
cloud-based platform for streaming games. It will launch later this year in select
countries, including the U.S., Canada, the U.K., and Europe.

Google later explained the cause in its response: “In essence, the root cause of Sunday’s
disruption was a configuration change that was intended for a small number of servers in a
single region. The configuration was incorrectly applied to a larger number of servers across
several neighboring regions, and it caused those regions to stop using more than half of their
available network capacity. The network traffic to/from those regions then tried to fit into
the remaining network capacity, but it did not. The network became congested, and our
networking systems correctly triaged the traffic overload and dropped larger, less
latency-sensitive traffic in order to preserve smaller latency-sensitive traffic flows, much as
urgent packages may be couriered by bicycle through even the worst traffic jam.”
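
The triage described in Google's explanation can be illustrated with a small sketch. This is a hypothetical model, not Google's actual networking logic: the `Flow` type, the `triage` function, and all capacity numbers are assumptions made for illustration. It admits latency-sensitive flows first, smaller flows before larger ones, and drops whatever no longer fits once capacity is cut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str
    size: int                # bandwidth demand, arbitrary units
    latency_sensitive: bool


def triage(flows, capacity):
    """Split flows into (admitted, dropped) for the given capacity.

    Latency-sensitive flows are considered first, then smaller flows
    before larger ones. A flow that does not fit in the remaining
    capacity is dropped.
    """
    ordered = sorted(flows, key=lambda f: (not f.latency_sensitive, f.size))
    admitted, dropped = [], []
    remaining = capacity
    for flow in ordered:
        if flow.size <= remaining:
            admitted.append(flow)
            remaining -= flow.size
        else:
            dropped.append(flow)
    return admitted, dropped


flows = [
    Flow("video-upload", 50, False),
    Flow("dns-lookup", 1, True),
    Flow("api-call", 5, True),
    Flow("bulk-backup", 40, False),
]

# Hypothetical numbers: full capacity is 100, cut to 40 by a bad config push.
admitted, dropped = triage(flows, capacity=40)
print("admitted:", [f.name for f in admitted])
print("dropped:", [f.name for f in dropped])
```

With the full capacity of 100 every flow in this example fits. At 40, only the two small latency-sensitive flows get through, which mirrors the bicycle-courier analogy in the statement.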

Next, Google’s engineering teams are conducting a thorough post-mortem to understand all
the contributing factors to both the network capacity loss and the slow restoration.