The blog post is ordered by mistake severity. This means the mistakes
where people usually waste the most money come first.
GPU
This blog post assumes that you will use a GPU for deep learning. If you
are building or upgrading your system for deep learning, it is not sensible
to leave out the GPU. The GPU is the heart of deep learning
applications – the improvement in processing speed is simply too large to
ignore.
Another problem to watch out for, especially if you buy multiple RTX
cards, is cooling. If you want to stick GPUs into PCIe slots that are next
to each other, you should make sure that you get GPUs with a blower-
style fan. Otherwise you might run into temperature issues, and your
GPUs will be slower (by about 30%) and die faster.
[Figure: Suspect line-up. Can you identify the hardware part which is at
fault for bad performance? One of these GPUs? Or maybe it is the fault
of the CPU after all?]
RAM
The main mistake with RAM is to buy RAM with too high a clock rate.
The second mistake is to buy too little RAM to have a smooth
prototyping experience.
RAM Size
RAM size does not affect deep learning performance. However, it might
hinder you from executing your GPU code comfortably (without
swapping to disk). You should have enough RAM to comfortably work
with your GPU. This means you should have at least the amount of RAM
that matches your biggest GPU. For example, if you have a Titan RTX
with 24 GB of memory you should have at least 24 GB of RAM. However,
if you have more GPUs you do not necessarily need more RAM.
The problem with this “match largest GPU memory in RAM” strategy is
that you might still fall short of RAM if you are processing large datasets.
The best strategy here is to match your GPU's memory, and if you feel
that you do not have enough RAM, just buy some more.
CPU
The main mistake people make is to pay too much attention to the PCIe
lanes of a CPU. You should not care much about PCIe lanes. Instead,
just look up whether your CPU and motherboard combination supports
the number of GPUs that you want to run. The second most common
mistake is to get a CPU which is more powerful than needed.
Going from 4 to 16 PCIe lanes gives you a performance increase
of roughly 3.2%. However, if you use PyTorch's data loader with pinned
memory, you gain exactly 0% performance. So do not waste your money
on PCIe lanes if you are using a single GPU!
When you select CPU PCIe lanes and motherboard PCIe lanes make
sure that you select a combination which supports the desired number
of GPUs. If you buy a motherboard that supports 2 GPUs, and you want
to have 2 GPUs eventually, make sure that you buy a CPU that supports
2 GPUs, but do not necessarily look at PCIe lanes.
PCIe Lanes and Multi-GPU Parallelism
Are PCIe lanes important if you train networks on multiple GPUs with
data parallelism? I have published a paper on this at ICLR2016, and I can
tell you if you have 96 GPUs then PCIe lanes are really important.
However, if you have 4 or fewer GPUs this does not matter much. If you
parallelize across 2-3 GPUs, I would not care at all about PCIe lanes.
With 4 GPUs, I would make sure that I get support for 8 PCIe lanes
per GPU (32 PCIe lanes in total). Since almost nobody runs a system with
more than 4 GPUs, as a rule of thumb: do not spend extra money to get
more PCIe lanes per GPU — it does not matter!
By far the most useful application for your CPU is data preprocessing.
There are two different common data processing strategies which have
different CPU needs.
The first strategy preprocesses while you train:
Loop:
1. Load mini-batch
2. Preprocess mini-batch
3. Train on mini-batch
The second strategy preprocesses the full dataset once, up front:
1. Preprocess data
2. Loop:
1. Load preprocessed mini-batch
2. Train on mini-batch
For the first strategy, a good CPU with many cores can boost
performance significantly. For the second strategy, you do not need a
very good CPU. For the first strategy, I recommend a minimum of 4
threads per GPU — that is usually two cores per GPU. I have not done
hard tests for this, but you should gain about 0-5% additional
performance per additional core/GPU.
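The difference between the two strategies can be sketched in plain Python. The `preprocess` and `train_step` functions below are illustrative stand-ins (not from the post): in strategy 1, worker threads overlap preprocessing of upcoming mini-batches with training on the current one; in strategy 2, everything is preprocessed once and the training loop just consumes it.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def preprocess(batch):
    # Stand-in for CPU-bound augmentation (resize, crop, normalize).
    return [x * 2 for x in batch]

def train_step(batch):
    # Stand-in for the GPU forward/backward pass.
    time.sleep(0.001)
    return sum(batch)

batches = [[i, i + 1] for i in range(10)]

# Strategy 1: preprocess mini-batches on the fly; the thread pool works
# ahead on later batches while the main thread trains on earlier ones.
with ThreadPoolExecutor(max_workers=2) as pool:
    losses1 = [train_step(b) for b in pool.map(preprocess, batches)]

# Strategy 2: preprocess the whole dataset once, then just load and train.
prepped = [preprocess(b) for b in batches]
losses2 = [train_step(b) for b in prepped]

assert losses1 == losses2  # both strategies see identical data
```

In a real pipeline, strategy 1 is what PyTorch's data loader does with multiple worker processes, which is why a CPU with more cores helps there and matters much less for strategy 2.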
While this reasoning seems sensible, the CPU shows 100% usage when I
run deep learning programs, so what is the issue here? I did some CPU
core clock underclocking experiments to find out.
[Figure: CPU underclocking on MNIST and ImageNet. Performance is
measured as time taken for 200 epochs on MNIST or a quarter epoch on
ImageNet at different CPU core clock rates, with the maximum clock rate
taken as the baseline for each CPU. For comparison: upgrading from a
GTX 680 to a GTX Titan is about +15% performance; from GTX Titan to
GTX 980 another +20%; GPU overclocking yields about +5% performance
for any GPU.]
Note that these experiments were run on dated hardware; however, the
results should still look similar for modern CPUs/GPUs.
Hard drive/SSD
The hard drive is not usually a bottleneck for deep learning. However, if
you do stupid things it will hurt you: if you read your data from disk
when it is needed (blocking wait), then a 100 MB/s hard drive will
cost you about 185 milliseconds for an ImageNet mini-batch of size 32 —
ouch! However, if you asynchronously fetch the data before it is used
(for example, torchvision loaders), then you will have loaded the mini-
batch in those 185 milliseconds while the compute time for most deep
neural networks on ImageNet is about 200 milliseconds. Thus you will
not face any performance penalty, since you load the next mini-batch
while the current one is still being computed.
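As a sanity check on the 185 ms figure: assuming an average compressed ImageNet image of roughly 0.58 MB (my assumption; the post does not state a per-image size), a mini-batch of 32 images read at 100 MB/s works out to about the quoted number:

```python
# Back-of-envelope check of the 185 ms mini-batch load time.
mb_per_image = 0.58      # assumed average compressed ImageNet JPEG size
batch_size = 32
hdd_mb_per_s = 100.0     # sequential read speed of the hard drive

batch_mb = mb_per_image * batch_size        # ~18.6 MB per mini-batch
load_ms = batch_mb / hdd_mb_per_s * 1000    # blocking wait in milliseconds
print(f"{load_ms:.0f} ms")
```

An SSD at ~500 MB/s would cut the same blocking wait by a factor of five, which is why the load is easy to hide behind the ~200 ms compute step once fetching is asynchronous.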
Power supply unit (PSU)
You can calculate the required wattage by adding up the watts of your
CPU and GPUs, with an additional 10% of watts for other components
and as a buffer for power spikes. For example, if you have 4 GPUs with
250 watts TDP each and a CPU with 150 watts TDP, then you will need a
PSU with a minimum of 4×250 + 150 + 100 = 1250 watts. I would usually
add another 10% just to be sure everything works out, which in this case
would result in a total of 1375 watts. I would round up in this case and
get a 1400 watts PSU.
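The sizing rule can be written as a small helper. Note this sketch treats both safety margins as flat 10% multipliers, which lands slightly above the hand arithmetic in the text but rounds up to the same 1400 W recommendation:

```python
import math

def psu_watts(gpu_tdps, cpu_tdp, buffer=0.10):
    """Minimum PSU wattage: sum of component TDPs, plus ~10% for other
    components and power spikes, plus another ~10% safety margin,
    rounded up to the next 50 W (a typical stock PSU size)."""
    base = sum(gpu_tdps) + cpu_tdp
    watts = base * (1 + buffer)   # other components + spike buffer
    watts *= (1 + buffer)         # extra margin to be sure
    return math.ceil(watts / 50) * 50

# The example from the text: four 250 W GPUs and a 150 W CPU.
print(psu_watts([250] * 4, 150))
```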
One important part to be aware of is that even if a PSU has the required
wattage, it might not have enough PCIe 8-pin or 6-pin connectors. Make
sure you have enough connectors on the PSU to support all your GPUs!
Using a couple of GPUs around the clock will significantly increase your
carbon footprint and it will overshadow transportation (mainly airplane)
and other factors that contribute to your footprint. If you want to be
responsible, please consider going carbon neutral like the NYU Machine
Learning for Language Group (ML2) — it is easy to do, cheap, and should
be standard for deep learning researchers.
Cooling
Modern GPUs will increase their speed – and thus power consumption –
up to their maximum when they run an algorithm, but as soon as the
GPU hits a temperature barrier – often 80 °C – the GPU will decrease
its speed so that the temperature threshold is not breached. This
enables the best performance while keeping your GPU safe from
overheating.
Since NVIDIA GPUs are first and foremost gaming GPUs, they are
optimized for Windows. You can change the fan schedule with a few
clicks in Windows, but not so in Linux, and as most deep learning
libraries are written for Linux this is a problem.
The only option under Linux is to set a configuration for your Xorg
server (Ubuntu) where you set the option "coolbits". This works very
well for a single GPU, but if you have multiple GPUs where some of them
are headless, i.e. they have no monitor attached, you have to
emulate a monitor, which is hard and hacky. I tried it for a long time and
had frustrating hours with a live boot CD to recover my graphics settings
– I could never get it running properly on headless GPUs.
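For reference, the single-GPU case needs only a small addition to the Xorg device section (a sketch, assuming the proprietary NVIDIA driver; the value "4" is the coolbits flag that enables manual fan control, and the identifier is just an example name):

```
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    Option     "Coolbits" "4"
EndSection
```

Alternatively, `nvidia-xconfig --cool-bits=4` can write an equivalent configuration for you.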
The most important point of consideration if you run 3-4 GPUs on air
cooling is to pay attention to the fan design. The “blower” fan design
pushes the air out to the back of the case so that fresh, cooler air is
pushed into the GPU. Non-blower fans suck in air from the vicinity of the
GPU and cool the GPU. However, if you have multiple GPUs next to each
other then there is no cool air around and GPUs with non-blower fans
will heat up more and more until they throttle themselves down to reach
cooler temperatures. Avoid non-blower fans in 3-4 GPU setups at all
costs.
Conclusion: Cooling
So in the end it is simple: for 1 GPU, air cooling is best. For multiple
GPUs, you should get blower-style air cooling and accept a tiny
performance penalty (10-15%), or you pay extra for water cooling, which
is also more difficult to set up correctly but incurs no performance
penalty. Air and water cooling are both reasonable choices in certain
situations. I would, however, recommend air cooling for simplicity in
general — get a blower-style GPU if you run multiple GPUs. If you want
to use water cooling, try to find all-in-one (AIO) water cooling solutions
for GPUs.
Motherboard
Your motherboard should have enough PCIe ports to support the
number of GPUs you want to run (usually limited to four GPUs, even if
you have more PCIe slots); remember that most GPUs have a width of
two PCIe slots, so buy a motherboard that has enough space between
PCIe slots if you intend to use multiple GPUs. Make sure your
motherboard not only has the PCIe slots, but actually supports the GPU
setup that you want to run. You can usually find information on this if
you search for your motherboard of choice on Newegg and look at the
PCIe section on the specification page.
Computer Case
When you select a case, you should make sure that it supports full length
GPUs that sit on top of your motherboard. Most cases support full length
GPUs, but you should be suspicious if you buy a small case. Check its
dimensions and specifications; you can also try a google image search of
that model and see if you find pictures with GPUs in them.
If you use custom water cooling, make sure your case has enough space
for the radiators. This is especially true if you use water cooling for your
GPUs. The radiator for each GPU will need some space — make sure your
setup actually fits into the case.
Monitors
I first thought it would be silly to write about monitors, but they make
such a huge difference and are so important that I just have to write
about them.
[Figure: Typical monitor layout when I do deep learning. Left: papers,
Google searches, gmail, stackoverflow. Middle: code. Right: output
windows, R, folders, system monitors, GPU monitors, to-do list, and
other small applications.]
Some words on building a PC
Many people are scared to build computers. The hardware components
are expensive and you do not want to do something wrong. But it is
really simple, as components that do not belong together do not fit
together. The motherboard manual is often very specific about how to
assemble everything, and there are tons of guides and step-by-step
videos to walk you through the process if you have no experience.
The great thing about building a computer is that once you have done it,
you know everything there is to know about building computers,
because all computers are built in the very same way – building a
computer becomes a life skill that you will be able to apply again and
again. So no reason to hold back!
Conclusion / TL;DR
GPU: RTX 2070 or RTX 2080 Ti. GTX 1070, GTX 1080, GTX 1070 Ti, and
GTX 1080 Ti from eBay are good too!
CPU: 1-2 cores per GPU, depending on how you preprocess data; > 2 GHz;
the CPU should support the number of GPUs that you want to run. PCIe
lanes do not matter.
RAM:
– Clock rates do not matter — buy the cheapest RAM.
– Buy at least enough CPU RAM to match the memory of your largest GPU.
– Buy more RAM only when needed.
– More RAM can be useful if you frequently work with large datasets.
Hard drive/SSD:
– Hard drive for data (>= 3TB)
– Use SSD for comfort and preprocessing small datasets.
PSU:
– Add up the watts of GPUs + CPU, then multiply the total by 110% for
the required wattage.
– Get a high efficiency rating if you run multiple GPUs.
– Make sure the PSU has enough PCIe connectors (6-pin and 8-pin).
Cooling:
– CPU: get standard CPU cooler or all-in-one (AIO) water cooling
solution
– GPU:
– Use air cooling
– Get GPUs with “blower-style” fans if you buy multiple GPUs
– Set coolbits flag in your Xorg config to control fan speeds
Motherboard:
– Get as many PCIe slots as you need for your (future) GPUs (one GPU
takes two slots; max 4 GPUs per system)
Monitors:
– An additional monitor might make you more productive than an
additional GPU.