
2. Compute Power for Analytics and ML Workloads

Introduction
Google trains its machine learning models on its large network of data centers, then deploys smaller, trained versions of those models to run on your phone's hardware, for video predictions for example. You can use pre-trained AI building blocks to take advantage of Google's AI work. For example, if you produce film trailers and want to quickly detect labels and objects in thousands of trailers to build a film recommendation system, you might call the Cloud Video Intelligence API rather than creating and training your own custom model.
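As a rough illustration, here is a minimal sketch of label detection on one trailer using the Cloud Video Intelligence Python client. The bucket path is a placeholder, and response fields may vary slightly across client-library versions.

```python
# Minimal sketch: label detection on a single trailer with the
# Cloud Video Intelligence API (google-cloud-videointelligence).
# The gs:// path below is a placeholder for your own bucket.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "input_uri": "gs://your-bucket/trailers/trailer-001.mp4",
        "features": [videointelligence.Feature.LABEL_DETECTION],
    }
)
# Annotation runs asynchronously; block until it completes.
result = operation.result(timeout=300)

# Print each label detected over whole-video segments.
for label in result.annotation_results[0].segment_label_annotations:
    confidence = label.segments[0].confidence
    print(f"{label.entity.description}: {confidence:.0%}")
```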
There are other pre-trained models available too, covering language and speech among other domains. For Google, running so many sophisticated ML models over large structured and unstructured datasets required a huge investment in computing power. Google has in fact been doing cloud-scale computing for its own products for over 10 years, and has now made that computing power accessible to you via Google Cloud.
Historically, such compute problems could be dealt with by Moore's Law. Moore's Law described a trend in computing hardware: the observation that computational power doubled at a roughly fixed interval. Computing capacity grew so rapidly for years that you could simply wait for it to catch up to the scale of your problem. But although computing power was still increasing rapidly even as recently as eight years ago, growth has slowed significantly over the past few years as manufacturers run up against fundamental physical limits. Performance in computing has reached a plateau.
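To make the doubling concrete, here is a small sketch of the implied growth curve, assuming for illustration a doubling period of roughly two years:

```python
# Illustrative only: project relative compute capacity under an
# idealized Moore's Law with an assumed ~2-year doubling period.
DOUBLING_PERIOD_YEARS = 2.0

def relative_capacity(years_elapsed: float) -> float:
    """Capacity relative to year 0, doubling every DOUBLING_PERIOD_YEARS."""
    return 2.0 ** (years_elapsed / DOUBLING_PERIOD_YEARS)

for year in (0, 2, 4, 10, 20):
    print(f"year {year:2d}: {relative_capacity(year):,.0f}x")
# year 10 -> 32x, year 20 -> 1,024x: the "just wait for hardware"
# effect, which no longer holds now that the curve has flattened.
```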


Figure 1. Moore's Law shows that computational power has reached a plateau (daCosta & Santa, 2014)

ML models for better power management


Recently, Google has developed new forms of hardware specifically for machine learning. The Tensor Processing Unit, or TPU, is an application-specific integrated circuit (ASIC) designed for ML: it has more memory than conventional CPUs or GPUs, and a faster processor for ML workloads. Google has been working on the TPU for many years and has made it available to other companies via Google Cloud for very large and difficult machine learning problems.
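As a sketch of what that access looks like in practice, the snippet below connects TensorFlow 2.x to a Cloud TPU and builds a TPU distribution strategy. It assumes the code runs somewhere a TPU is reachable (for example, a TPU VM); in other setups the resolver may need an explicit TPU name.

```python
# Minimal sketch: attach TensorFlow 2.x to a Cloud TPU.
# Assumes a TPU is reachable from this environment (e.g. a TPU VM);
# elsewhere, TPUClusterResolver may need an explicit tpu= argument.
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# A TPUStrategy replicates training across the TPU's cores.
strategy = tf.distribute.TPUStrategy(resolver)
print("TPU cores available:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Any Keras model built inside this scope is placed on the TPU.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
```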
Google engineers have also trained a machine learning model to optimize data center cooling better than their existing systems could. The model they deployed decreased the energy used for cooling by 40% and improved the overall power usage effectiveness (PUE) of the site by 15%. I find this example particularly inspiring because it is a machine learning model, trained on a data center's specialized ML hardware, telling the data center how to run that same specialized ML hardware more efficiently. Powerful stuff.

Figure 2. The models can predict PUE with 99.6 percent accuracy (Google LLC, 2014). (Source: Google Blog)
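The published approach behind that blog post used a neural network over a set of data center operating parameters. As a loose, hypothetical illustration of the idea only, the sketch below fits a simple regressor to predict PUE from a few made-up sensor features; the feature names and data are invented, not Google's.

```python
# Hypothetical sketch of the idea behind PUE prediction: learn a
# mapping from data center sensor readings to PUE. Features and
# data here are invented for illustration; Google's actual model
# was a neural network over many real operating parameters.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1_000
# Fake sensors: IT load (MW), outside temp (C), pump speed (%).
X = np.column_stack([
    rng.uniform(5, 20, n),
    rng.uniform(-5, 35, n),
    rng.uniform(40, 100, n),
])
# Fake target: PUE rises with outside temperature, falls with load.
y = 1.10 + 0.004 * X[:, 1] - 0.002 * X[:, 0] + rng.normal(0, 0.01, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))
```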


References
daCosta, F. & Santa, C., 2014. Rethinking the Internet of Things. s.l.:s.n.
Google Cloud LLC, 2020. Google Cloud. [Online] Available at: https://cloud.google.com [Accessed 2020].
Google LLC, 2014. Better data centers through machine learning. [Online] Available at: https://googleblog.blogspot.com/2014/05/better-data-centers-through-machine.html [Accessed 2020].
Statista, 2020. Internet of Things (IoT) active device connections installed base worldwide from 2015 to 2025*. [Online] Available at: https://www.statista.com/statistics/1101442/iot-number-of-connected-devices-worldwide/ [Accessed 2020].
