A Full Hardware Guide To Deep Learning - Tim Dettmers
Over the years, I have built a total of seven different deep learning workstations and, despite careful research and reasoning, I made my fair share of mistakes in selecting hardware parts. In this guide, I want to share the experience that I have gained over the years so that you do not make the same mistakes that I did.
The blog post is ordered by mistake severity. This means the mistakes where people
usually waste the most money come first.
Contents
GPU
RAM
Needed RAM Clock Rate
RAM Size
CPU
CPU and PCI-Express
PCIe Lanes and Multi-GPU Parallelism
Needed CPU Cores
Needed CPU Clock Rate (Frequency)
Hard drive/SSD
Power supply unit (PSU)
CPU and GPU Cooling
Air Cooling GPUs
Water Cooling GPUs For Multiple GPUs
A Big Case for Cooling?
Conclusion Cooling
Motherboard
Computer Case
Monitors
Some words on building a PC
Conclusion / TL;DR
GPU
This blog post assumes that you will use a GPU for deep learning. If you are building
or upgrading your system for deep learning, it is not sensible to leave out the GPU.
The GPU is the heart of deep learning applications – the improvement in processing speed is simply too large to ignore.
I talked at length about GPU choice in my GPU recommendations blog post, and the
choice of your GPU is probably the most critical choice for your deep learning system.
There are three main mistakes that you can make when choosing a GPU: (1) bad
cost/performance, (2) not enough memory, (3) poor cooling.
For good cost/performance, I generally recommend an RTX 2070 or an RTX 2080 Ti. If
you use these cards you should use 16-bit models. Otherwise, GTX 1070, GTX 1080,
GTX 1070 Ti, and GTX 1080 Ti from eBay are fair choices and you can use these GPUs
with 32-bit (but not 16-bit).
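To make the 16-bit point concrete, here is a minimal mixed-precision training step in PyTorch using torch.cuda.amp (a sketch with a toy model and random data; this is one of several ways to run 16-bit models):

import torch
import torch.nn as nn

# Toy model and random data as placeholders for a real network and loader
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

x = torch.randn(32, 512).cuda()
y = torch.randint(0, 10, (32,)).cuda()

opt.zero_grad()
with torch.cuda.amp.autocast():  # runs the forward pass in float16 where safe
    loss = nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()

Activations stored in 16-bit take half the memory, which is where the "twice as big" figure below comes from.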
Be careful about the memory requirements when you pick your GPU. RTX cards, which can run in 16-bit, can train models that are twice as big with the same memory compared to GTX cards. As such, RTX cards have a memory advantage, and picking an RTX card and learning how to use 16-bit models effectively will carry you a long way. In general, the requirements for memory are roughly the following:
Another problem to watch out for, especially if you buy multiple RTX cards, is cooling. If you want to stick GPUs into PCIe slots that are next to each other, you should make sure that you get GPUs with a blower-style fan. Otherwise you might run into temperature issues, and your GPUs will be slower (by about 30%) and die faster.
[Figure: Suspect line-up. Can you identify the hardware part which is at fault for bad performance? One of these GPUs? Or maybe it is the fault of the CPU after all?]
RAM
The main mistake with RAM is to buy RAM with too high a clock rate. The second mistake is to buy too little RAM to have a smooth prototyping experience.

Needed RAM Clock Rate

It is important to know that RAM speed is pretty much irrelevant for fast CPU RAM -> GPU RAM transfers. This is because (1) if you use pinned memory, your mini-batches will be transferred to the GPU without involvement from the CPU, and (2) if you do not use pinned memory, the performance gain of fast vs. slow RAM is about 0-3% — spend your money elsewhere!
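In PyTorch, pinned memory is just a flag on the data loader; a minimal sketch (the random tensors stand in for a real dataset):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; pin_memory=True allocates batches in page-locked RAM
data = TensorDataset(torch.randn(1024, 3, 224, 224), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=32, pin_memory=True, num_workers=2)

for x, y in loader:
    # non_blocking=True lets the host-to-GPU copy overlap with GPU compute
    x = x.cuda(non_blocking=True)
    y = y.cuda(non_blocking=True)
    # ... forward/backward pass here ...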
RAM Size
RAM size does not affect deep learning performance. However, it might hinder you
from executing your GPU code comfortably (without swapping to disk). You should
have enough RAM to comfortably work with your GPU. This means you should have at
least the amount of RAM that matches your biggest GPU. For example, if you have a
Titan RTX with 24 GB of memory you should have at least 24 GB of RAM. However, if
you have more GPUs you do not necessarily need more RAM.
The problem with this “match largest GPU memory in RAM” strategy is that you might
still fall short of RAM if you are processing large datasets. The best strategy here is to
match your GPU and if you feel that you do not have enough RAM just buy some
more.
CPU
The main mistake that people make is to pay too much attention to the PCIe lanes of a CPU. You should not care much about PCIe lanes. Instead, just look up whether your CPU and motherboard combination supports the number of GPUs that you want to run. The second most common mistake is to get a CPU which is too powerful.
CPU and PCI-Express

Putting this together, for an ImageNet mini-batch of 32 images and a ResNet-152 we get the following timing:

Thus going from 4 to 16 PCIe lanes will give you a performance increase of roughly 3.2%. However, if you use PyTorch's data loader with pinned memory, you gain exactly 0% performance. So do not waste your money on PCIe lanes if you are using a single GPU!
When you select CPU PCIe lanes and motherboard PCIe lanes make sure that you
select a combination which supports the desired number of GPUs. If you buy a
motherboard that supports 2 GPUs, and you want to have 2 GPUs eventually, make
sure that you buy a CPU that supports 2 GPUs, but do not necessarily look at PCIe
lanes.
Needed CPU Cores

By far the most useful application for your CPU is data preprocessing. There are two different common data preprocessing strategies which have different CPU needs.

The first strategy is preprocessing while you train:

Loop:
1. Load mini-batch
2. Preprocess mini-batch
3. Train on mini-batch

The second strategy is preprocessing before any training:

1. Preprocess data
2. Loop:
   1. Load preprocessed mini-batch
   2. Train on mini-batch
For the first strategy, a good CPU with many cores can boost performance significantly.
For the second strategy, you do not need a very good CPU. For the first strategy, I
recommend a minimum of 4 threads per GPU — that is usually two cores per GPU. I
have not done hard tests for this, but you should gain about 0-5% additional
performance per additional core/GPU.
For the second strategy, I recommend a minimum of 2 threads per GPU — that is
usually one core per GPU. You will not see significant gains in performance when you
have more cores if you are using the second strategy.
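As an illustration, here is roughly what the two strategies look like in PyTorch (a sketch; the paths, transforms, and the preprocessed tensor file are hypothetical placeholders):

import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

# First strategy: preprocess each mini-batch on the fly while training.
# The transforms run in num_workers CPU processes, so extra cores help.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
on_the_fly = DataLoader(datasets.ImageFolder("data/train", transform=train_tf),
                        batch_size=32, num_workers=4, pin_memory=True)

# Second strategy: preprocess once, save the tensors, then only load them.
# Loading a ready-made tensor is cheap, so one core per GPU is usually enough.
x, y = torch.load("data/train_preprocessed.pt")
preloaded = DataLoader(TensorDataset(x, y), batch_size=32, num_workers=1,
                       pin_memory=True)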
Needed CPU Clock Rate (Frequency)

In the case of deep learning, there is very little computation to be done by the CPU: increment a few variables here, evaluate some Boolean expression there, make some function calls on the GPU or within the program – all of these depend on the CPU core clock rate.
While this reasoning seems sensible, there is the fact that the CPU is at 100% usage when I run deep learning programs, so what is the issue here? I did some CPU core clock rate underclocking experiments to find out.

Note that these experiments were run on hardware that is dated; however, the results should still be the same for modern CPUs/GPUs.
Hard drive/SSD
The hard drive is not usually a bottleneck for deep learning. However, if you do stupid things it will hurt you: if you read your data from disk when it is needed (blocking wait), then a 100 MB/s hard drive will cost you about 185 milliseconds for an ImageNet mini-batch of size 32 (a mini-batch of 32 ImageNet images is roughly 18.5 MB, and 18.5 MB at 100 MB/s takes about 185 ms) — ouch! However, if you asynchronously fetch the data before it is used (for example, with torchvision loaders), then you will have loaded the mini-batch in those 185 milliseconds while the compute time for most deep neural networks on ImageNet is about 200 milliseconds. Thus you will not face any performance penalty, since you load the next mini-batch while the current one is still computing.
However, I recommend an SSD for comfort and productivity: Programs start and
respond more quickly, and pre-processing with large files is quite a bit faster. If you buy
an NVMe SSD you will have an even smoother experience when compared to a
regular SSD.
Thus the ideal setup is to have a large and slow hard drive for datasets and an SSD for
productivity and comfort.
Power supply unit (PSU)

You can calculate the required wattage by adding up the wattage of your CPU and GPUs with an additional 10% of watts for other components and as a buffer for power spikes. For example, if you have 4 GPUs with 250 watts TDP each and a CPU with 150 watts TDP, then you will need a PSU with a minimum of 4×250 + 150 + 100 = 1250 watts. I would usually add another 10% just to be sure everything works out, which in this case would result in a total of 1375 watts. I would round up in this case and get a 1400-watt PSU.
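The same rule of thumb as a small helper function (a sketch of the calculation above, not an official formula):

def required_psu_watts(n_gpus, gpu_tdp, cpu_tdp):
    # Components plus ~100 W for the rest of the system and power spikes,
    # plus a 10% safety margin on top
    return (n_gpus * gpu_tdp + cpu_tdp + 100) * 1.10

print(required_psu_watts(4, 250, 150))  # 1375.0 -> round up to a 1400 watt PSU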
One important part to be aware of is that even if a PSU has the required wattage, it
might not have enough PCIe 8-pin or 6-pin connectors. Make sure you have enough
connectors on the PSU to support all your GPUs!
Another important thing is to buy a PSU with a high power efficiency rating – especially if you run many GPUs and will run them for a long time.
Running a 4 GPU system at full power (1000-1500 watts) to train a convolutional net for two weeks will amount to 300-500 kWh, which in Germany – with rather high power costs of 20 cents per kWh – will amount to 60-100€ ($66-111). If this price is for 100% efficiency, then training such a net with an 80%-efficient power supply would increase the costs by an additional 18-26€ – ouch! This is much less for a single GPU, but the point
still holds – spending a bit more money on an efficient power supply makes good
sense.
Using a couple of GPUs around the clock will significantly increase your carbon
footprint and it will overshadow transportation (mainly airplane) and other factors that
contribute to your footprint. If you want to be responsible, please consider
going carbon neutral like the NYU Machine Learning for Language Group (ML2) — it is
easy to do, cheap, and should be standard for deep learning researchers.
CPU and GPU Cooling

Modern GPUs will increase their speed – and thus power consumption – up to their
maximum when they run an algorithm, but as soon as the GPU hits a temperature
barrier – often 80 °C – the GPU will decrease the speed so that the temperature
threshold is not breached. This enables the best performance while keeping your GPU
safe from overheating.
However, typical pre-programmed fan schedules are badly designed for deep learning programs, so this temperature threshold is reached within seconds after starting a deep learning program. The result is decreased performance (0-10%), which can be significant for multiple GPUs (10-25%) where the GPUs heat each other up.
Since NVIDIA GPUs are first and foremost gaming GPUs, they are optimized for
Windows. You can change the fan schedule with a few clicks in Windows, but not so in
Linux, and as most deep learning libraries are written for Linux this is a problem.
The only option under Linux is to set a configuration for your Xorg server (Ubuntu) where you set the option "coolbits". This works very well for a single GPU, but if you have multiple GPUs where some of them are headless, i.e. they have no monitor attached to them, you have to emulate a monitor, which is hard and hacky. I tried it for a long time and had frustrating hours with a live boot CD to recover my graphics settings – I could never get it running properly on headless GPUs.
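For a GPU with a monitor attached, the relevant snippet in /etc/X11/xorg.conf looks roughly like this (you can generate it with "sudo nvidia-xconfig --cool-bits=4"; this is a sketch, and the exact bits you need can depend on the driver version):

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    Option     "Coolbits" "4"   # bit 2 (value 4) enables manual fan control
EndSection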
The most important point of consideration if you run 3-4 GPUs on air cooling is to pay
attention to the fan design. The “blower” fan design pushes the air out to the back of
the case so that fresh, cooler air is pushed into the GPU. Non-blower fans suck in air from the vicinity of the GPU and cool the GPU with it. However, if you have multiple GPUs next to each other then there is no cool air around, and GPUs with non-blower fans will heat up more and more until they throttle themselves down to reach cooler temperatures. Avoid non-blower fans in 3-4 GPU setups at all costs.
Conclusion Cooling
So in the end it is simple: for a single GPU, air cooling is best. For multiple GPUs, you should either get blower-style air cooling and accept a tiny performance penalty (10-15%), or pay extra for water cooling, which is more difficult to set up correctly but incurs no performance penalty. Air and water cooling are both reasonable choices in certain situations. In general, however, I recommend air cooling for simplicity — get a blower-style GPU if you run multiple GPUs. If you want to use water cooling, try to find all-in-one (AIO) water cooling solutions for GPUs.
Motherboard
Your motherboard should have enough PCIe ports to support the number of GPUs
you want to run (usually limited to four GPUs, even if you have more PCIe slots);
remember that most GPUs have a width of two PCIe slots, so buy a motherboard that
has enough space between PCIe slots if you intend to use multiple GPUs. Make sure
your motherboard not only has the PCIe slots, but actually supports the GPU setup
that you want to run. You can usually find this information if you search for your motherboard of choice on Newegg and look at the PCIe section of the specification page.
Computer Case
When you select a case, you should make sure that it supports full length GPUs that sit
on top of your motherboard. Most cases support full length GPUs, but you should be
suspicious if you buy a small case. Check its dimensions and specifications; you can
also try a google image search of that model and see if you find pictures with GPUs in
them.
If you use custom water cooling, make sure your case has enough space for the
radiators. This is especially true if you use water cooling for your GPUs. The radiator of
each GPU will need some space — make sure your setup actually fits into the case.
Monitors
I first thought it would be silly to write about monitors as well, but they make such a huge difference and are so important that I just have to write about them.

The money I spent on my three 27-inch monitors is probably the best money I have ever
spent. Productivity goes up by a lot when using multiple monitors. I feel desperately
crippled if I have to work with a single monitor. Do not short-change yourself on this
matter. What good is a fast deep learning system if you are not able to operate it in an
efficient manner?
Some words on building a PC

Many people are scared to build computers. The hardware components are expensive and you do not want to do something wrong. But it is really simple: components that do not belong together do not fit together. The motherboard manual is often very specific about how to assemble everything, and there are tons of guides and step-by-step videos which walk you through the process if you have no experience.

The great thing about building a computer is that once you have done it, you know everything there is to know about building a computer, because all computers are built in the very same way – so building a computer will become a life skill that you will be able to apply again and again. So no reason to hold back!
Conclusion / TL;DR
GPU: RTX 2070 or RTX 2080 Ti. GTX 1070, GTX 1080, GTX 1070 Ti, and GTX 1080 Ti
from eBay are good too!
CPU: 1-2 cores per GPU depending on how you preprocess data; > 2 GHz; the CPU should support the number of GPUs that you want to run. PCIe lanes do not matter.
RAM:
– Clock rates do not matter — buy the cheapest RAM.
– Buy at least as much CPU RAM to match the RAM of your largest GPU.
– Buy more RAM only when needed.
– More RAM can be useful if you frequently work with large datasets.
Hard drive/SSD:
– Hard drive for data (>= 3TB)
– Use SSD for comfort and preprocessing small datasets.
PSU:
– Add up watts of GPUs + CPU. Then multiply the total by 110% for required Wattage.
– Get a high efficiency rating if you use multiple GPUs.
– Make sure the PSU has enough PCIe connectors (6+8 pins).
Cooling:
– CPU: get standard CPU cooler or all-in-one (AIO) water cooling solution
– GPU:
– Use air cooling
– Get GPUs with “blower-style” fans if you buy multiple GPUs
– Set coolbits flag in your Xorg config to control fan speeds
Motherboard:
– Get as many PCIe slots as you need for your (future) GPUs (one GPU takes two slots;
max 4 GPUs per system)
Monitors:
– An additional monitor might make you more productive than an additional GPU.
Comments
Jay says
2021-07-29 at 20:31
Hey,
Thanks for that summary. You said that one should buy a GPU with at least 8GB
RAM but that RTX GPU RAM was twice as effective as GTX RAM. That brings me to
my question.
I have a choice between 2 laptops. Identical except one has a GeForce RTX 3060
6GB and costs $1400; while the other has a GeForce RTX 3070 8GB and costs
$2000.
I know the RTX 3060 will be slower but is 6GB acceptable? You implied it will be the
equivalent of a GeForce GTX 12GB RAM video card for RAM utilization.
Please advise as I’d really like to save the extra $600 in cost between the 2 laptops.
Given that video card add-ins for desktops for 3000 series RTX cards seem to start
at $1000 it seems to me I should bide my time with a good entry level laptop with
an RTX GPU that has much fairer prices until the video card price gouging is done
for.
Thanks!
zoey79 says
2021-06-09 at 09:16
Wonderful article. However, I am about to buy a new laptop. So what do you feel
about the idea of a gaming laptop for deep learning?
Gaming laptops are excellent for deep learning. Make sure to get a beefy GPU!
TK says
2021-10-24 at 18:22
I had a gaming laptop for deep learning. However, I think a desktop is still a better choice. Using a laptop for deep learning tends to overheat the laptop, and the battery appears to degrade much faster.
Moreover, the largest GPU memory in a laptop is 8 GB, but note that not all 8 GB can be allocated for deep learning, which may not be sufficient if you are trying a very deep network or a dual network. A mobile GPU is also less efficient than a desktop GPU. Computing speed (CPU etc.) can also be slower than a gaming desktop.
Chaitanya says
2021-02-01 at 23:15
Thank you Tim for the post, it was very helpful to understand the importance of
hardware components in deep learning.
I have been researching the hardware requirements to begin a deep learning project on my workstation for a couple of months, and finally read your article, which has answered a lot of my questions. I did realize the GPU on my machine will not be sufficient, so I wanted to get your thoughts on its replacement or adding a second one.
Please suggest if I can add any Nvidia 20xx series GPU to the below configuration.
Kriskr3 says
2021-02-01 at 15:12
Hello Tim,
I read your great article on GPU recommendations for deep learning; it was informative and would help anyone who is interested and serious about this field. I
found the article when I searched Google for ideas on a GPU upgrade, and after reading your responses to the posts I wanted to ask my question right here. I have an HP workstation with an Nvidia GeForce GTX 1050 (4GB), so I am looking to either replace it or add another. The power unit is 800 watts, dual CPU, two PCIe Gen 3 x16 slots, one PCIe Gen 3 x8, and three other Gen 2 slots. I believe at max I can add one GPU (maybe a low-wattage one) due to space and power limitations. I'm not sure if I can even add an Nvidia GeForce 20-series card alongside the existing one or if I need to replace it. I would appreciate it if you could share your view based on your experience.
Imahn says
2021-01-26 at 13:07
Dear Tim,
I have five short questions, I am really sorry!
(i) I am generally wondering: I am not 100% sure yet whether I should opt for 2
GPUs or 4 GPUs. I would of course first buy 1 GPU and then scale, but if I know a
priori that I plan to have only two GPUs, I could opt for a cheaper MB, CPU, cooler,
PSU, etc. Does one maybe need 2 GPUs to do some testing on hyperparameters of
papers that one reads, and 4 GPUs if one wants to build own neural networks (and
thus test even more ideas)? Do you have any brief thoughts on this, or a link we
could read?
I am asking because in the other post of yours, you were writing about the problem
of
4 x RTX 3090, but wouldn’t this PSU solve the problem? But you didn’t mention this
PSU, that’s why I am confused. (Apparently, this PSU only works under 220 V, so for
me, I couldn’t buy it, but wouldn’t it be great for US Americans?)
(iii) For a possible 4-GPU setup, do you think that an Intel Core i7-9800X with 8
cores is enough for the 4 GPUs at full utilization + the normal things that one does
(reading papers, having Zoom meetings, using LibreOffice, VirtualBox, etc.)? This
CPU would cost me 480 $, but more cores would even cost more. I generally
suspect that I will need a GPU rather than a CPU for ML. I know you recommend 1-2 cores per GPU, but with 4 GPUs, that would be 4-8 cores just for the GPUs, so I am honestly unsure.
(iv) This question is strongly related to (iii): Is it possible to use the Deep Learning
PC for normal home-office while the 4 GPUs are at full utilization?
(v) If I opted for 4 GPUs with blower-style fans, wouldn't my neighbors be able to hear
it? They have small babies and I am honestly worried that the noise at night would
be too much… Any thoughts would be appreciated.
Dmytro says
2021-01-21 at 02:44
Hi Tim!
I have this CPU: Intel® Pentium(R) CPU G4560 @ 3.50GHz × 4,
and I get an error when I try to load a model with TF 2.2 and higher:
Process finished with exit code 132 (interrupted by signal 4: SIGILL)
With TF 1.5 it works fine.
I read a lot and found that it is connected with the CPU; is that true?
I really need to understand what the trouble is.
Thanks for your attention! Have a nice day.
marco says
2021-04-01 at 07:23
Probably TF 1.5 could run on your CPU, but the newer version was not compiled for it or was not compatible with your Python version.
I have compiled TF on my CPU several times and you can do it too, don't worry.
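For what it is worth, a likely cause (my reading of the error, not confirmed in this thread) is that prebuilt TensorFlow wheels after 1.5 are compiled with AVX instructions, and a CPU without AVX crashes with exactly this SIGILL. A quick check on Linux:

# Prints whether the CPU advertises the AVX instruction set (Linux only)
with open("/proc/cpuinfo") as f:
    flags = next(line for line in f if line.startswith("flags")).split()
print("AVX supported:", "avx" in flags)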
Armand says
2021-01-11 at 11:43
Hi Tim,
I’m building a DL rig for a student organization and I’m wondering how to share it
with students. I want to be able to create VMs and erase/reconfigure them if
students mess up. I want to use it kind of like a personal AWS Cloud.
Do you have any leads I should follow or keywords I should search for?
Thanks!
Mira says
2021-01-11 at 02:37
Hi Tim, all,
We are about to buy (when available) an RTX 3090 for AI, PyTorch, and TensorFlow.
The computer where I planned to put the GPU in has an i7-3930K which runs only at PCIe 2.0. How much would PCIe 2.0 limit the performance of the computations?
I know the theoretical throughputs, but I have no idea about real performance.
Could you please give me some example of the performance penalty?
Thanks, Mira
Mira says
2021-01-18 at 03:30
Audi says
2021-01-07 at 09:29
Hi Tim,
The question:
1. For the CPU I am conflicted between the Ryzen 9 3900XT (12 cores), where people claim that per-core performance is better, and the Ryzen Threadripper 2950X (16 cores). For deep learning and ML, which one is better?
2. For the GPU I am also conflicted about going for either the MSI RTX 3070 (8 GB memory, 256-bit memory bus, and 5888 cores) or the Zotac RTX 3080. Your post recommended going for the RTX 3080 (10 GB memory, 320-bit memory bus, and 8704 cores); however, with my budget, I can only land one of these two, and Zotac is claimed to be a subpar brand for the RTX 3080. Or maybe I should wait for the upcoming RTX 3070 Ti (10 GB memory, 320-bit memory bus, and ~7424 cores) or the non-Ti RTX 3060 (12 GB memory, 192-bit memory bus, and ~3840 cores)?
3. As a Windows OS user for a long time, should I make this PC dual-OS Windows and Linux, or just install Linux? (I am not used to Linux and would like to use my PC for programs like Office and some Steam games.)
Thanks!
David says
2021-03-06 at 09:17
Hey Audi,
I'm currently struggling with a similar problem: either the Ryzen 9 3900X or the 5800X. Do you know which one is better for deep learning? Following the explanation given by Tim, I suppose that the 12 cores outperform the 8 cores of the 5800X?
Imahn says
2021-01-02 at 08:35
Hi Tim,
(i) I hope you are doing fine! I am currently searching for an appropriate motherboard to support 2-3 GPUs, and I honestly don't know how to read the specifications to decide …
So the way I understood some of your comments, it is not necessary to have PCIe 4.0 lanes; PCIe 3.0 lanes seem to do the job for machine learning. Now let's say that I want to have three GPUs: what does the specification need to say? Since I cannot find an RTX 3080 with a blower-style fan, I suspect that I need enough space between the GPUs so as not to run into cooling problems.
Is this good for 3 GPUs? To me, on the image, the 3 x PCIe 3.0 x1 slots look really small, so I guess only the 3.0 x16 slots could be used for GPUs.
(iii) The motherboard would come with SATA cables; would I need more cables to connect the motherboard to the PSU or the GPUs?
Thanks!
Hi Imahn,
what you want to look out for in the motherboard specifications is something that says x16/x16/x16, or the same with eights instead. This indicates that the motherboard supports three GPUs and each GPU gets 16 lanes. This is different from how many PCIe slots you have. The easiest way to check a motherboard is to go to Newegg.com, as it has the most information on hardware and the information is standardized. It seems your motherboard only supports one GPU.
Lauren says
2020-12-13 at 14:52
Thanks so much for your post!! I’m trying to build my first personal setup for deep
learning. I’m hoping to start with one RTX 3090, but then have space for up to
three if I wanted to expand in the future. Do you have any advice on this setup:
CPU: Intel Core i9-10850K Comet Lake 10-Core 3.6 GHz LGA 1200 125W
GPU: RTX 3090
MB: ASUS WS X299 SAGE LGA 2066 Intel X299
RAM: 128GB: Corsair LPX 8*16GB 3200
PS: CORSAIR AX1600i 1600W
Case: Fractal Design Define 7 XL
SSD: HP EX920 M.2 1TB PCIe NVMe NAND SSD
HD: Western Digital 4TB
My plan was to start with 1 GPU, and allow room to expand. Looks like this setup
should support up to three with room around each GPU for cooling? Do you think
I’d need an additional cooler? Any other advice or suggestions on this setup?
Hi Lauren, the build looks fine to me. Make sure that you can fit all three potential RTX 3090s in the case that you chose.
Hi Tim,
Thanks for this article, this is super helpful for a first time builder like me. I had a few
questions:
1) Do you think intel i9-10900F will be enough cores (it has 10 cores) for a dual RTX
3090 build? I know you recommend min 2 cores/GPU, but I asked Lambda Labs
and they recommended min 12 cores for dual RTX 3090 and so I got worried.
2) Also, I realize dual RTX 3090 build is probably impractical with this mobo. In that
case, do you think a RTX 3090 and RTX 3080 Ti (hopefully it comes out) would work
well with this setup?
Thanks!
BTW, I plan to get more system memory too once I get a 2nd GPU. I was
thinking 32 GB to start with 1 GPU, then buying more once I get my 2nd GPU.
Thanks!
Hi Brandon,
John B says
2021-01-19 at 19:23
Hello Brandon,
I am looking to build my first setup for DL and would like to know how your choice of parts is working out for you so far.
TOBIN says
2020-11-28 at 06:30
Hello Tim,
Thanks for the blog. I read it fully; I am building a deep learning machine and would like to have your expertise in building the right machine for my purpose.
I am a start-up. I am building this machine for my start-up, which does people tracking using a multi-camera setup and deep learning classification based on their actions.
I intend to run the multi-camera people tracking and deep learning classification on the GPU and use the output.
Once I get this project into the working phase, I will use this machine as a backend server for tracking and classification of a small group of around 20 to 30 people.
RAM: 1x CORSAIR VENGEANCE LPX 32GB (16GBX2) DDR4 DRAM 3000MHZ C16
MEMORY KIT (~ 140$)
Hello Tobin,
this looks good. One thing though: if you want to use the server as a backend for 20 to 30 models, it might be more manageable to have multiple smaller GPUs to spread the load. I am not sure how much memory you will need and what your budget is, but multiple RTX 2070, RTX 2080 Ti, RTX 3070, or RTX 3080 Ti cards might be a better choice than a single RTX 3090.
John B says
2021-01-19 at 19:24
Hello Tobin,
I am watching this space to get an idea of how to build my first setup for DL and would like to know how your choice of parts is working out for you so far.
Dominik P says
2020-11-25 at 16:44
Hi Tim,
Thanks for the great article! I’m a first-time PC builder, trying to build an ML
Workstation on a 2000-3000€ budget. This is my current plan:
The question would be: why get the PCIe 4.0 motherboard in the first place? For gaming you might get some advantages, but not really for anything deep learning related. So if you want to build a pure deep learning machine, I would maybe buy a cheaper motherboard. On the other hand, if you later get an NVMe SSD which supports full PCIe 4.0 speeds and a second GPU, it might be worth it if you run things which are very storage-intensive, such as deep learning with very large datasets. Otherwise, the build looks good!
Emmanuel says
2020-11-24 at 06:10
Hi Tim,
Kind regards
Emmanuel
Some deep learning models are so large that you cannot run them with 11 GB of memory (you might be able to do so with some complicated tricks). These models are usually big transformer models. If you run only computer vision, you can get quite far with 8-10 GB, but your networks might be a bit slow because you need to run them with a very small batch size.
Maciek says
2020-11-12 at 00:44
Hi,
Thanks for the post.
Did you try using Windows with WSL2 for DL? This could solve some of your problems (and create new ones).
As I understand it, GPUs and WSL2 do not have an easy time working together. However, both PyTorch and TensorFlow have pretty good native Windows support as I understand it, so one does not need to use WSL2.
I was wondering if you had any thoughts on an 8 GPU setup with dual root
architecture (4 GPUs attached to each CPU). My main focus is distributed training
across all 8 GPUs, but have concerns that the CPU/CPU interconnect may become
a bottleneck for communication between GPUs as some other sources have
suggested that dual root is a big no-no when trying to scale across 8 GPUs (for this
exact reason). However, the cards I am looking to get do not support P2P (2080 ti)
and will have to send data via the CPU anyway (for the cards attached to the same
CPU) so was wondering if you had experience on how problematic that extra hop
across CPUs will be for the cards that are not connected to the same CPU.
Many thanks
Usually, it is not that big of an issue and parallelization is still quite fast. If you
use the right software (integrated into most libraries) then the GPU memory will
Bob O says
2020-10-07 at 11:40
Hey Tim,
I finally bit the bullet and built a machine (with the exception of the scarce 3070 that will go in later). You mentioned a software post, but I could not find it. Do you have favorites that you would suggest for doing my software setup? I was going to start with Ubuntu because it does not appear I can access the GPU with the VMs I have from Windows.
I am looking for specific things like OpenCV in Python, Caffe, Keras, enabling and using my GPU… basically exactly what you have done for hardware, but the step following assembly!
Hey Bob,
unfortunately, I do not have a software guide. I would recommend Ubuntu
since using GPUs through a VM can be a pain (or not work at all, depending on
the motherboard). In terms of software, I would look into Anaconda3 on
Ubuntu which is a package manager for scientific computing. You can
download it freely and can install all the software that you mention without the
need for compiling anything. Compiling OpenCV for example can be a pain
whereas in anaconda you just execute “conda install -c anaconda opencv” and
you are done.
Good luck!
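As a rough sketch of what that setup can look like (the package names and versions here are illustrative; check the PyTorch website for the current install command):

conda create -n dl python=3.8
conda activate dl
conda install -c pytorch pytorch torchvision cudatoolkit=10.2
conda install -c anaconda opencv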
Hi Tim, thank you for sharing all your work with these hardware guides!
I don’t know if you have an update for this article somewhere else but there are
now several ways to control the fan curves for NVidia GPUs. The easiest way I’ve
found is to use GreenWithEnvy.
https://gitlab.com/leinardi/gwe
Thank you, Frank, I have not seen it before! This looks excellent, thank you for
sharing! Another package I know about is coolgpus which is designed for
servers where some of the NVIDIA options are not available because no
monitors are connected to the GPUs. So coolgpus is pretty good for servers,
but the gwe package looks a bit better than coolgpus for the desktop case.
Matt says
2020-10-03 at 11:25
Hey Tim!
Thanks for this post – really helpful! For my first PC build, I'm planning on using a
Ryzen 7 3700x CPU and RTX 2080 Super (will replace in the future with the new
GPUs). You talked about GPU cooling but what is your opinion on CPU cooling? Is
the stock cooler for my CPU not good enough and should I consider AIO solutions?
Thanks!
Hey Matt! Often, a stock cooler is okay for the CPU, although it can be a bit loud. Many people are now installing AIO solutions on their CPU for better and more silent cooling. However, it has been shown that a good air cooler is often just as good and even more silent than AIO water cooling solutions. The bottom line for deep learning, though, is mostly noise: if a bit of noise is okay, then go with stock; if you want a more silent setup, go with either an AIO or a good air cooler. In either case, if you train large models that saturate your GPU, your GPU will also be quite loud, so in that case a silent CPU cooler will not make the greatest difference. I personally prefer as silent a working environment as possible, and I always buy a dedicated CPU cooler.
Hi Tim:
Awesome content! I’m a retired software engineer looking to learn more about AI &
ML.
I have a few questions about H/W:
– Intel or AMD (I’m leaning towards AMD using an X570 MB)
– Best starter OS
and S/W:
– Best courseware
– Best learning samples.
Cheers…Bob
Thanks for sharing your knowledge.
Hi Bob!
– AMD CPUs are great; so an X570 MB is great.
– Use Ubuntu 20.04 + Anaconda + PyTorch. If you want to do deep learning
that is the way to go. You will have the least issues overall if you use that.
– fast.ai is by far the best course for deep learning for software engineers
– just google around for pytorch samples for the models that you learn about
in the fast.ai classes
Good luck!
Hi Tim:
Glad I found your site. I truly appreciate your help in advancing my
knowledge of AI & DL. Really appreciate your help.
Cheers…Bob
haykelvin says
2020-09-21 at 02:28
Hi Tim, thank you so much for this awesome article. It is very informative, and it is interesting to see those numbers (both theoretical and from actual testing) in the reasoning. I am new to the field and started playing with PyTorch recently; I am sure this will help me and lots of others make wise choices when selecting hardware in future ML builds.
I have read through the thread and saw your concern about AMD GPU software compatibility issues. I saw a few good deals on the Vega FE in my local second-hand market; the 16 GB of RAM looks sweet on paper. Do you think those can give me some good bang for the buck if I don't mind experimenting with them a bit? I also see cost-efficient upgradability if I want to get more of those on the future second-hand market. Or would you recommend just sticking with CUDA after all?
Our community could definitely use more AMD enthusiasts. Currently, AMD GPUs work for deep learning, but their performance is not as good and there might be some hidden issues here and there. So if you just want to get things running, I recommend NVIDIA + CUDA. If you want to contribute actively to the community, AMD and ROCm are great; this option helps to diffuse the NVIDIA monopoly over time, but you can expect a more frustrating experience.
joy says
2020-09-20 at 07:10
Hi, I need one recommendation: I have 8 GPUs to build a deep learning machine. Which motherboard (one which supports AMD 7000-series CPUs) would you recommend that supports 8 PCIe x16 slots and also has multiple M.2 SSD slots?
There is no regular server motherboard from desktop vendors that does that, I think. You need to go with specialized motherboards like those from Supermicro. I have too little experience with servers to recommend a particular motherboard. Usually, you just go with what you need and what is cheap, where support and warranty cover all issues. That way you will get it working and keep it working without any problems.
Hi Tim,
I wanted to start by saying that I loved reading your GPU and deep learning hardware guides; I learned a lot!
They still left me with a couple of questions (I'm pretty new when it comes to computer building and specs in general). I'm mainly interested in deep reinforcement learning, and I read that for DRL the CPU is much more important than it is in other fields of deep learning because of the need to handle the simulations. So I'm wondering if going with a Ryzen 5 2600 is enough, or if I should go with something which has more cores, a higher clock, and/or more supported memory. Also, with DRL, can I get away with a cheaper GPU like the RTX 2060 or the GTX 1070? I'm not really on a tight budget, but I'm looking to make it as cost-effective as possible while not being restrained too much by my PC.
I don't know if it matters, but I'm mostly trying to do reinforcement learning for financial markets trading.
Thank you!
Hi Christophe,
I think for deep reinforcement learning you want a CPU with lots of cores. The
Ryzen 5 2600 is a pretty solid counterpart for an RTX 2060. GTX 1070 could
also work, but I would prefer an RTX 2060 for DRL. You could also wait a bit for
the RTX 3060 and get a cheap threadripper to improve performance further.
However, that setup might be a bit more expensive and you have to wait
longer to get the parts.
Chris-Sij says
2020-09-15 at 14:35
I'm looking to build a home-based machine learning setup that will utilize transfer learning and classification and apply the findings comparatively to CTs. I'll have a plethora of data, but my actual data input size is incredibly small in single instances.
I'm looking to build a system that provides the most bang for my buck and have a desire to build a machine around a Titan Xp, if possible (or advised). There is potential for getting a second Titan for future work if the single one is not enough or up to the task later on. However, I'm unfamiliar with Nvidia-based setups when it comes to personal building, so I'd love some advice on what kind of other parts I should be looking to pick up. I'm most likely going to be pairing this single Titan with 32GB of RAM (2x 16GB sticks), but am pretty much stuck after that point. I'd appreciate any direction you could provide, as this is all new territory to me and I am trying to avoid cloud-computing services like AWS for the time being.
Have a look at my other blog post about GPUs. There I have some "barebone" setups for 2 GPUs which you can use as a guide for your build.
You really are something else! You have provided some exemplary resources for
me. Seeing the number of responses to comments you have is incredible. Thank
you very much for what you do!
Thank you
darklinux says
2020-09-09 at 00:15
I would definitely go with two RTX 2070 Supers. The memory on the GTX 1660 is just a bit small.
darklinux says
2020-09-09 at 18:41
If it would be two RTX 3070s for the price of 4x GTX 1660 then
definitely go with the 2x RTX 3070!
darklinux says
2020-09-10 at 16:32
Abhishek says
2020-08-24 at 22:54
Hi Tim
Your article is nice and informative; you clearly have great experience with server configurations. I have a small doubt!
What kind of server configuration would be required to do video analytics on 30-40 4MP CCTV cameras simultaneously? It is basically a boundary surveillance project. In the video analytics, the task would be to identify humans, animals, or birds. I am inclined to use Intel processors in general.
What if the number of cameras is 12? What configuration would you suggest?
Thank you
At 4MP and a framerate of 30 fps you have about 36 MB per second; taken times 40 cameras, that means about 1.4 GB/s. The main problem here is to store that data and pass it quickly to GPUs. An NVMe SSD RAID would be very helpful here. Otherwise, it depends on the network and the resolution that you want to process. 4MP is pretty large, and you definitely need to downsize the images. Downsized images with YOLO can be processed at about 200 FPS, which means you need about 6 GPUs to process the data efficiently. These figures are for RTX 20-series GPUs, so I imagine 4x RTX 30-series GPUs could work. If you reduce the frame rate to about 1/4, i.e. 8 fps per CCTV camera, you could process everything on a single GPU.
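Spelling out the arithmetic behind these figures (using the estimates from the reply above):

mb_per_cam = 36                  # rough MB/s for one 4MP stream at 30 fps
cams = 40
print(mb_per_cam * cams / 1000)  # about 1.44 GB/s to store and feed to GPUs

frames_per_s = 30 * cams         # 1200 frames per second to analyze
yolo_fps = 200                   # rough YOLO throughput on downsized images
print(frames_per_s / yolo_fps)   # 6.0 GPUs needed to keep up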
Xuan says
2020-08-12 at 23:00
Hi Tim,
Thank you for the detailed description of building a deep learning machine. I would like to request your suggestion on whether the config I am building will work out well or not.
I was wondering if I can add one more graphics card (2080 Ti) to the above config. Does the above motherboard support two 2080 Ti graphics cards?
The build looks good! The motherboard supports multiple GPUs, so that would be an option. If you only get a single GPU, you do not need a power supply of 1300W; probably 700-800W would be sufficient if you go for a single RTX 30-series GPU. With 2 GPUs, it makes sense to go for 1300W just to have a bit of extra room (also for a third GPU).
Alex says
2020-08-12 at 05:16
Hello Tim.
I'm looking for a good laptop to start with deep learning.
Can you advise me, please: is an HP Omen 15″ model with an Intel i7 processor, 16 GB RAM, and an Nvidia RTX 2070 GPU (8 GB) a good choice?
If it is not, what is a good laptop in your opinion?
I do not know much about laptops. There are many other things to consider
because laptops can be quite personal (battery life, weight etc.). In terms of
deep learning performance, i7, 16 GB RAM and RTX 2070 sounds very good for
a laptop. With that, you would definitely be able to do some pretty good deep
learning.
Keshav says
2020-08-10 at 04:40
Hi, I was planning to build a PC with the RTX 2060 Super. Should I wait for the 30xx series in terms of price and performance, or shall I go ahead and order it? Since the RTX 20xx series is getting discontinued, I need to make a decision soon.
Hi Tim,
“Be careful about the memory requirements when you pick your GPU. RTX cards,
which can run in 16-bits, can train models which are twice as big with the same
memory compared to GTX cards. As such RTX cards have a memory advantage
and picking RTX cards and learn how to use 16-bit models effectively will carry you
a long way.”
Does this mean that because of the lower precision the memory requirement is halved, and hence you can have a model which is twice as big for cards with the same RAM?
I've also read in other places about "models not fitting into memory"; what does this actually mean?
What are we "fitting into RAM"? Is it a combination of the model itself and the data? Or just the model? Or just the data? I thought using things like TF we load things in batches anyway. So why does this matter?
Could you clear this confusion up for me?
Thanks
The data usually takes up almost no memory since we, as you rightly pointed out, only load one batch into GPU memory. Otherwise, it depends on the model that you are working with. Convolutional networks are very small models with very large activations, while transformers are somewhere in between (weights, gradients, and activations are all large). Activations here refers to the data
Mira says
2020-07-01 at 07:28
Hello, I would like to build a computer with an AMD 3rd-gen CPU.
My demands are focused on PCIe lanes: it must have a GPU at x16 together with 2 LAN cards at x4 + x1 and two M.2 SSDs (at least one at full x4 speed).
My question is: would this work on a mainstream AM4 motherboard with PCIe lanes working at x16/x4/x1 (+ x4 for the SSD)?
I don't want to degrade it to x8/x8/x1.
I am not quite sure if that works. As I understand it, there are a lot of different combinations of Ryzen 3rd-gen CPUs together with PCIe 4.0 motherboards, but I think in any case one is only able to use 16 lanes for the GPU if you just use the M.2 SSDs. If you add 2 LAN cards, I think this will downgrade the GPU to 8 lanes. I could be wrong about this, but as I understand it, in many cases you still have "extra lanes", but these are not distributed equally across all slots, and using some components will draw lanes away from the GPU.
Mira says
2020-07-07 at 23:38
And there are others and similar ones; you can see that for a dual-GPU setup there are x16 + x4 lanes.
Odyssee says
2020-06-09 at 07:33
Thanks,
Sorry, I no longer have the code, I think; I never uploaded it to GitHub. I used a Linux tool that downclocks the core clock rate and benchmarked performance that way. For PCIe lanes, you can just use NVIDIA's CUDA sample library for benchmarking.
TK says
2020-05-24 at 20:51
What about a laptop that is equipped with an RTX 2070 Super Max-P? Would it be sufficient for deep learning? I understand that a mobile GPU is less efficient than a desktop GPU, but I'm working from two sites, so having a laptop is definitely a win for me.
You can do many things with that GPU, but not all models will fit into 8 GB of memory, and it will be about half as fast as a desktop GPU. If that is okay for you, it might be a good option.
Tim,
lazy_propogator says
2020-05-24 at 03:44
Hello Tim, hope you are doing well. Can you help me choose between the AMD Ryzen 7 3750H processor and an Intel i5 processor for the CPU? They are being coupled with the RTX 2060 and the GTX 1660 Ti, respectively. Would the AMD CPU be a bottleneck for the RTX? Are there any potential problems which can arise from the use of AMD CPUs in deep learning?
Mayur says
2020-05-21 at 09:14
Hi Tim,
Thank you for the detailed description of all the essentials in setting up a deep learning machine. I am currently building my DL machine and would like to request your suggestion on whether the config I am building will work out well or not.
The CPU is a bit overkill if you want to just do deep learning. If you want to also
do other things with the computer, it looks pretty good. This would be a well-balanced build for Kaggle competitions, for example.
Russell says
2020-05-19 at 20:33
Hi Tim,
https://au.pcpartpicker.com/list/JFhR8M
As the mobo has one x16 slot and one x8 slot, and hence the 2nd GPU is only getting PCIe 3.0 x8, should I get a different mobo that supports x16 for both?
Cheers,
Russ
Greg says
2020-05-02 at 12:13
Hi Tim,
thank you for all the information that you put in here.
However, I have a problem choosing a GPU for my motherboard, which is an ASRock Z270 Pro4. I am trying to upgrade my PC for software like ZBrush, Maya, and Substance
Painter.
The problem that I have is that on one website that I looked at, these GPUs perform average with my motherboard.
I don't want to waste my money, so I would like to ask if you know any website where I could compare the performance of a GPU with a motherboard, or if you have any suggestions for which GPUs would be best for my motherboard.
I think they should perform equally well on the motherboard. I am not sure why
it would be otherwise.
Marc says
2020-04-30 at 13:28
32GB vs. 64GB of RAM: given current RAM prices, is it worth just going for 64GB?
Yes, I agree. RAM prices are fluctuating but right now RAM is pretty affordable!
Wen says
2020-04-20 at 21:49
Hi Tim,
Thanks for your post. I have an i7-3770 4-core desktop with 16GB RAM. I was happy to see in the Q&A above that it is OK to buy an RTX 2080 Ti. But then I read on another website that the power supply won't be enough for a GTX card above a 1030 on the motherboard of an Optiplex 7010, which is what I have. I haven't confirmed that the PSU is 250W (for which I suppose I need to open the cover and check physically). Do you know if that is true?
Thank you.
That is true; if the PSU is only 250 watts, you will not be able to run an RTX 2080 Ti on it. If you start upgrading the PSU, though, it might be worth thinking about whether that is worth it or whether to build a new desktop entirely. Both options can make sense depending on budget and other constraints.
Dimiter says
2020-04-11 at 04:47
Hi Tim,
I have an old X99 mobo and Intel CPU (Intel i7-5930K) from 2015. One or possibly both of them died; I cannot really troubleshoot without replacing them. I am thinking of buying both, but the question is whether to move to AMD or stay with Intel: AMD's 1920X or 2920X with an X399 mobo, vs. an Intel CPU (not sure which one; the comparable Intel 7900X is crazy expensive) with an X299 mobo. It seems much more economical to move to AMD, even if it would complicate processor water cooling etc. What do you think?
Thank you.
sourav says
2020-03-19 at 09:58
Hi Tim,
I currently have a gaming PC with a 1050 Ti, 8GB DDR3 RAM, and an AMD FX-6300 processor. I would like to upgrade to an RTX 2060 Super / RTX 2060, 16GB DDR4 RAM, and an Intel CPU. I want to save money for the GPU; thus, I decided to use a budget CPU. I am thinking about the Intel Core i3-9100F (4 cores, 3.6 GHz, 65 W, locked, $80). This does not come with integrated graphics (I will add the GPU). Is that a good CPU for a single-GPU build? Or should I look for an old CPU & motherboard combo under $120 on eBay? If yes, which CPU and motherboard would be a good fit for my budget?
Thank You
I think the CPU should be more than fine for a single GPU. You should worry more about other applications (CPU-based ML for Kaggle competitions, for example) that might be bottlenecked by the CPU. You could also go with an AMD CPU, which are now pretty cost-efficient and powerful, but it would only make a small difference.
Hello Tim,
thank you for the insightful article. Maybe you can also give me some guidance on the choice of GPU. I work in a hospital and want to start deep learning projects on high-resolution image datasets from MRI and CT.
The choice of GPU I am now thinking about is: a Titan RTX versus 1 (or 2) RTX 2080 Ti (in combination with an AMD Threadripper CPU on an X399 motherboard).
I think we will not run multiple projects at once, but I want to be on the safe side GPU-memory-wise, and the Titan has 24GB.
Kind regards,
Jochen van Osch
I would definitely go with the Titan RTX! The memory will be a life-saver if you work with medical images! Also make sure to invest in NVMe SSDs, as loading large unprocessed images can be a large bottleneck. I recommend getting a motherboard that supports 3x NVMe SSDs, getting 3 of them, and setting up a virtual SSD RAID 0.
Reply
Adam TS says
2020-03-03 at 13:15
I was wondering what the lower limit on RAM speeds is? I am looking at
repurposing old server hardware and have 64GB of 1333MHz DDR3 memory and
was wondering if this would be a bottleneck? Also I have committed to offsetting
my carbon footprint, and wanted to thank you for encouraging others to do the
same!!
Reply
It can always be a bit tricky to re-purpose old hardware, but if the computer
boots with the RAM sticks then they should not be the biggest bottleneck. Since
you rarely use the RAM in deep learning training, and since the RAM is usually
of similar speed to the PCIe bus, it should not be a big bottleneck. If you run
DDR3 memory with 4 GPUs the PCIe bus and the RAM should be of about
equal speed and you should only lose about 5-10% performance.
Reply
Hi Tim, I am stuck with an Nvidia GeForce GTX 1660. Do I stand a chance with this
model of GPU or do I need to buy something else? The problem is that the RAM is
only 6GB, but I cannot afford anything more.
Reply
It will be difficult but you can look up techniques to conserve memory. You will
probably also need to accept running smaller datasets and models.
Reply
Aaric says
2020-01-24 at 05:44
Reply
Yes, usually even if you just want to get started a GPU is required. A CPU can
be quite slow even for small problems.
Reply
Satchel says
2020-01-18 at 19:06
The CPU fits your requirements (8 threads, 1 GPU) but is more than 10 years old and
only supports PCIe 2.0×16. How significant would this bottleneck be in your
opinion, and does it warrant an upgrade to a modern CPU?
Finally, I’m curious about your opinion on AMD Cards / ROCm stack, as I have
access to a R9 290 and Vega 54.
Reply
I believe PCIe 2.0 would be sufficient in your case, but there is not enough data
to say that definitively. I would give it a try and upgrade your computer if it does
not work out. Theoretically it should be fine.
I would still avoid the AMD cards due to software compatibility.
Reply
Satchel says
2020-11-22 at 19:54
Thanks for your advice! I ended up getting a 1070 and it worked pretty
decently. (A little faster than colab). Since then I actually entered into a
masters degree, and I’m using it quite a bit more often.
I've noticed when training that both my GPU and CPU usage are around
98-100% (occasionally the GPU is at 94%). Sometimes either one may be slightly
higher than the other, but it tends to be more CPU-bound (98% CPU, 94%
GPU, sometimes 100% on the GPU).
Reply
The high CPU percentages do not necessarily mean that your CPU is fully
utilized. Some libraries use active waiting, which will keep the CPU busy
with "empty" calculations. The GPU utilization is also not the true
utilization; it just means that all cores on the GPU are used (but not by
how much).
One test you can use to decide on a CPU upgrade is to limit the CPU
frequency manually. This can be done with some CPUs on
Linux (for Intel it is easy; for AMD I am not sure, but it should be
possible). Then you can compare the performance with an
underclocked CPU. If it is much lower, then the CPU is a bottleneck. If
the performance is similar, the CPU is not a bottleneck.
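For illustration, a minimal sketch of this test on Linux using the cpufreq sysfs interface (requires root; the frequency values below are placeholders, not recommendations):

```python
# Hedged sketch: cap the maximum CPU clock via the Linux cpufreq sysfs files to
# test whether the CPU is a training bottleneck. Requires root; restore the
# original limit afterwards. The 1.2 GHz cap is just an example value.
import glob

def set_max_freq_khz(khz: int) -> None:
    """Write a new maximum frequency (in kHz) for every CPU core."""
    for path in glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq"):
        with open(path, "w") as f:
            f.write(str(khz))

set_max_freq_khz(1_200_000)   # cap all cores at 1.2 GHz, then time a few training steps
# ... run the same training iterations as before and compare seconds per iteration ...
set_max_freq_khz(3_600_000)   # restore the original maximum (example value)
```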
Reply
Hi Tim,
I have doubts about choosing a CPU. I know the Ryzen 7 3700X or something like
that seems pretty good, but I'm worried about the Intel MKL library issues.
I mostly do deep learning stuff, but I also want to use my PC for some Kaggle
competitions (mostly tree-based models that run on the CPU in sklearn).
Regards,
Michel
Reply
MKL library issues only affect things like solvers, Fourier transforms, and
eigendecompositions. I am not sure if those are really that common for Kaggle
competitions, and you would only be hit by a small penalty. I think Ryzen
processors are fine even in your case.
Reply
Hey Tim,
Could you please tell me which processor you think will fit better in this PC I'm
planning to build:
Also, is that good enough for a deep learning student? The lab I'm working in uses
computer vision to recognize LIBRAS, which is a Brazilian Sign Language…
Both CPUs are more than fine for one GPU. I might just go with the cheaper
one.
Reply
For real world applications in a 2 GPU system running RTX 2080 Ti’s at most, is
there much difference between x8/x8 and x16/x4? Does it effectively make both
perform as a x4 to keep things in sync when running model/data parallelism?
Reply
Reply
Steven says
2019-12-15 at 18:08
Hi Tim,
I was wondering if you had any opinion on cube computer cases. I'm thinking of
exchanging my full tower case for a cube case to save space, but can't find
anything that lets me know what (if any) the cost for cooling my GPU (possibly
expanding to 2 GPUs) is. Do you have any knowledge or opinions?…
Thanks
Reply
Are 2x GPU for machine learning worth it? Should I buy a board now that allows for
2x x8/x8, or upgrade to Threadripper for multi-GPU later on?
Reply
More GPUs are always better :). If you plan to go for 4 GPUs in the future, it
makes sense to get the Threadripper and the right motherboard right away. But
then you should ask yourself, do you really need 4 GPUs / is spending that
money justified?
Reply
Reply
Atharva says
2019-12-13 at 01:24
Hey Tim,
Please tell me what you think of this PC:
i7-9700K processor (3.6 GHz, 12 MB)
Hard drive: 2TB 7200 RPM + 1TB M.2 SSD
NVIDIA GeForce RTX 2070 8GB
Will this be good for deep learning?
BTW loved your article!
Regards,
Reply
I think that is appropriate. I also like Ryzen CPUs if you want to save a bit of
money. They would definitely also be more than enough for an RTX 2070 GPU.
Reply
Xiaopeng Fu says
2019-12-11 at 22:11
Hi Tim,
Many thanks for the wonderful article and all the replies to our queries. Being new
to this field and preparing to build my own system, I'm wondering if it's worth
waiting for Intel's 10th gen desktop CPUs. Is it a good idea, for example, to buy a
9900K + Z390 board at this moment, knowing the board will not be compatible with
future CPU upgrades? Or maybe the 10th gen improvement will not make much
difference for DL…
Thanks!
Xiaopeng
Reply
The CPU does not matter that much for deep learning. If you have some
workloads which require a better CPU (factorization, sklearn models, some big
data stuff) then it might well be worth it to wait. However, if you just want to get
started and do deep learning it might be better to just go ahead now; you
will lose almost no deep learning performance with a 9900K CPU.
Reply
Hey Tim, If you have a moment I’d be curious to know what you think about my
build and my reasoning behind it: https://pcpartpicker.com/b/3jw6Mp
Much appreciated.
Eric
Reply
I think it looks good. Two things though: the PSU with that high a wattage is
only needed if you want to expand to two GPUs in the future; think again if
that is really what you want. Otherwise, I would use two identical NVMe SSDs
instead of a small and a larger one. I guess you want to store the OS on the
smaller one and keep the rest for data? The better approach is to use a small
partition for the OS and then use a virtual RAID 0 to create a single high-speed
device; this can make a huge difference if you work with very large datasets!
Otherwise, that is quite a few spinning disks, but if you need the space, then you
need the space.
Reply
Thanks Tim.
Yes, the original thought behind the 850W PSU was for two GPUs. That’s
also why I have that motherboard – for the x8/x8 configuration. Do you
think it’s reasonable to run two GPUs in this setup, or should I plan on
moving to a threadripper system when I want to go for the second GPU?
My intent for this machine is personal projects and a Masters program, but
I also want to be open to scaling for a small startup and/or consulting.
I'm not following the virtual RAID 0 configuration. Do you mean run
Windows and Linux on the same drive on two partitions, and RAID 0 with
the second physical drive? Do you have a link?
Is there any reason to not go with a single 1 TB drive for OS and datasets vs
a drive for OS and another drive for datasets?
Reply
I think startup stuff and consulting is fair with 2 GPUs. You want to get
GPUs with big memory though if you want to do startup stuff,
preferably a Titan RTX. On Linux a virtual RAID 0 is easy to set up:
https://www.digitalocean.com/community/tutorials/how-to-create-raid-arrays-with-mdadm-on-ubuntu-16-04.
Not sure about Windows though.
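For reference, a minimal sketch of what such a software RAID 0 could look like on Linux, wrapped in Python for illustration and roughly following the tutorial linked above (run as root; the device names and mount point are placeholders, and this destroys existing data on the drives):

```python
# Hedged sketch of building a software RAID 0 from two NVMe drives with mdadm.
# Double-check the device names before running; assumes /data already exists.
import subprocess

devices = ["/dev/nvme0n1", "/dev/nvme1n1"]   # hypothetical data drives

subprocess.run(["mdadm", "--create", "/dev/md0", "--level=0",
                f"--raid-devices={len(devices)}", *devices], check=True)
subprocess.run(["mkfs.ext4", "/dev/md0"], check=True)       # format the striped array
subprocess.run(["mount", "/dev/md0", "/data"], check=True)   # mount it for your datasets
```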
Reply
Juliana says
2019-12-02 at 08:13
Hi Tim, thank you so much for your work and for your helpful guides.
I was wondering if you would mind looking at my build project and helping me with
a doubt I have (regarding the CPU/motherboard combination).
There is a 110 euro gap in Spain between the 6-core Ryzen 5 3600 and the 8-core
Ryzen 7 3700X.
Isn't this a bit wasteful for a Ryzen 5 3600? Should I go for a Ryzen 7 instead?
Reply
Reply
Michael says
2019-11-24 at 15:45
Hi Tim, which GPUs would you get if you had $10k and wanted to use them to train
large transformer-based models at home? Note that at home you would have to
pay for electricity yourself.
I'm trying to decide between 8x 2080 Ti vs 4x Titan RTX vs 2x Quadro 8000. Also
note that four Titan RTX cards in the same chassis will overheat due to their fan
type, and I'm not very comfortable water cooling them.
Reply
Michael says
2019-12-06 at 10:16
Reply
Hugo says
2019-11-22 at 10:54
Hi Tim,
Congratulations on your great work.
What setup would you recommend for the latest release of GPT-2 (the pre-trained
language model, 1.5B parameters)?
I intend to train this model for my research, but I am very unsure about the
hardware needed. I have read that numerous users have issues even with quite
powerful setups.
Any ideas?
Sorry for my English, it is not my mother tongue.
Best regards.
Reply
A minimum would be 4x RTX 2080 Ti. You would have to use very small batch sizes
though, which is computationally inefficient, so I would not recommend RTX
2080 Tis. I would instead recommend 4x Titan RTX, which should have enough
memory so you can run GPT-2 and other transformers with a large enough
batch size.
Reply
Hugo says
2019-12-28 at 02:09
Reply
Filip says
2019-11-13 at 15:19
Tim, your hardware guide was really useful in identifying a deep learning machine
for me about 9 months ago. At that time the RTX2070s had started appearing in
gaming machines. Based on your info about the great value of the RTX2070s and
FP16 capability I saw that a gaming machine was a realistic cost-effective choice for
a small deep learning machine (1 gpu).
I ended up buying a Windows gaming machine with an RTX 2070 for just a bit over
$1000. I ended up modifying the cooling to get positive case pressure (took off the
front bezel blocking the airflow) and making it a dual-boot Windows 10 / Ubuntu 18
machine. As a Linux newbie, one gotcha I found was that using a Windows file system
results in a performance bottleneck in Linux. So I added an SSD with Ext4 for data
preprocessing and that made a big difference.
It has been working great for learning deep learning (with pytorch) and Kaggle
competitions. I have found this local setup to be faster than Google Colab, Kaggle
kernels, and Azure notebooks and long runs are more reliable. The colorful case
lights are an added bonus!
Reply
Thanks for your feedback! I think cases like this are pretty common. Some
setups will fall a bit short here and there, but with a few adjustments you can
quickly get a great system that fulfills most of your needs.
Reply
Hi Tim,
Thanks for the excellent material. I’ve been working with a 4x2080Ti workstation.
Some of the new GAN training work really requires 8x2080Ti. I’ve been looking at
server based reference designs – deeplearning11 and deeplearning12 from
servethehome.com. I don’t know a lot about servers but it seems (from youtube
videos) that they generate horrible fan noise when all GPUs are used. Have you
given any thought to a 8xGPU machine that can live comfortably in a home
environment? Any thoughts appreciated. Anu
Reply
I do not think you will find an 8 GPU machine which you can comfortably house
at home. If you have a small room far from other rooms (bedroom/living
room) you might be able to do it if you put some noise insulation into the
room and put the server there. It might just be a better idea to rent some
GPUs/TPUs in the cloud whenever you need to run 8 GPU jobs. You can get
a 4 GPU machine for most other things and only use the 8 GPUs if you need them. Or
do batch aggregation to simulate 8 GPU training. Batch aggregation will just
double the training time for you, which should be alright and is doable in a
home environment.
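For reference, a minimal PyTorch sketch of this kind of batch aggregation (gradient accumulation); the toy model, random data, and accumulation factor below are placeholders:

```python
# Hedged sketch of batch aggregation (gradient accumulation): gradients from
# several small batches are summed before each optimizer step, so a 4 GPU box
# can mimic the effective batch size of an 8 GPU run at the cost of extra time.
import torch
import torch.nn as nn

model = nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
accum_steps = 2                                    # 2x accumulation doubles the effective batch

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(32, 512, device="cuda")        # placeholder mini-batch
    y = torch.randint(0, 10, (32,), device="cuda")
    loss = criterion(model(x), y) / accum_steps    # average the loss over accumulated steps
    loss.backward()                                # gradients accumulate across iterations
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```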
Reply
Mike says
2019-11-05 at 19:43
Dear Tim,
Reply
I would not recommend laptops for medical imaging deep learning projects.
Usually, in medical imaging you will have images with very high resolution and
you will need a GPU which has the most memory that you can afford (Titan
RTX, 24GB). You can buy a desktop and a small laptop with which you log in to
the desktop when you are on the go. This would be the best solution.
Reply
vithin says
2019-10-17 at 07:50
Can I use multiple different GPU cards on a single CPU for deep learning?
Reply
Yes, but you will not be able to parallelize across those GPUs.
Reply
Alex says
2019-10-07 at 12:10
Hi Tim,
I am going to use it for training CNNs(Kaggle, not large projects). I’m also planning
to add the second GPU after some time. Is this build sufficient for my purposes?
Would you recommend any different processor? I’m a bit worried about the cooling
– is it enough for the current build and is something needed to be changed for 2
GPUs?
Reply
sakshi says
2019-10-01 at 00:08
I love your blog. The knowledge provided here is unlike that on other deep learning
blogs.
Well explained; keep updating!
Reply
Hi Tim,
I’m looking to build a machine with one 2080 TI, with the ability to expand it to a
second card. The difficulty I’m facing is that I want my machine to be quiet, but
watercooling is quite complex, expensive, and seems like high maintenance. Blower
fans aren’t particularly quiet.
Do you know if (on an x470 mobo) there are ways other than watercooling and two
blower fans that would keep the case cool enough? For example, how much would
hybrid cooling AIOs help? Say two of
these: https://www.evga.com/products/product.aspx?pn=11G-P4-2384-KR (bonus
points if you know whether any motherboard has headers for two aio pumps)? Or,
if I get an air cooled GPU now, would adding a blower fan to that allow for
sufficient cooling?
Hoping you can offer any advice on this issue. I’ve searched the internet but I think
my demands may be a bit too high / specific.
Reply
If you just want to run two cards you can get a motherboard with at least 3
PCIe slots and use non-blower fans. Because you then have an empty PCIe slot
between the cards, cooling is usually sufficient and you can run somewhat
quieter non-blower fans. Otherwise, AIOs can help. People have mixed reviews
about them, some reporting very low temperatures, others reporting temperatures
similar to regular fans. I think you cannot do much wrong with AIO GPUs
if you want silent performance, though the 3+ PCIe slot + non-blower fan option is
cheaper.
Reply
What’s not quite clear to me, though: AM4 compatible motherboards / ATX
towers don’t seem like they would support this in terms of physical
dimensions. E.g. on the Prime Asus x470 Pro, there are 2 GPU slots with 3-
slot-width space, and the third slot only has a 1-slot-width space. I’m not
sure how I can manage to put a graphics card in the bottom slot. In many
cases the PSU or bottom of the case would be in the way, and most don’t
have the right number of expansion slots on the back. Am I overlooking
something?
Do you think a hybrid cooled 2-width GPU on the first PCIEx16 and an air
cooled 3-width GPU on the second PCIEx16 could work? That would mean
there’s 1 slot of space in between the two, with part of the heat from the
top one going to the radiator.
Thanks again!
Reply
If you have the right case you can install a GPU in the bottom slot. It
only has a 1-slot width, but in some computer cases the GPU just
extends beyond the motherboard. If you look for cases that are optimized
for GPU airflow you can probably find a usable case.
Reply
Ahmad says
2019-09-30 at 02:35
My question may look a little broad, sorry for that. If you need any other
information please let me know.
If you require a large amount of memory (to hold different kinds of models?)
and only want to do inference, then working with a CPU might actually be an
excellent option. For inference, in general, the software will be far more
important than the hardware. I think in terms of memory per dollar one of the best
options will be the RTX 2080 Ti or the RTX 2060, but I am not sure if memory
is really your problem.
Reply
I bought an RTX 2060 Super, and I have a system with an i5-3470. I added RAM so it is
now 16 GB; it has an SSD and an HDD. With the 3470, only two cores are at 100%. I can
upgrade to an i5-8500, but would it make a lot of difference?
Reply
Reply
Carl says
2019-09-15 at 13:33
Hi, I am considering buying a GPU for deep learning. If I understand this article
right, there are different models of the RTX 2070 card. I am looking for 16-bit FP, but I
don't see any information about FP. Could you tell me which parameter in the
specification I should pay attention to?
Reply
Sorry for the confusion, but all RTX 2070 cards have 16-bit capability. You can pick
any card. If you have the money though, I would recommend picking the RTX
2070 Super over the regular one.
Reply
Hi Tim,
I'm trying to set up the following PC based on your guide (especially for beginner
Kaggle competitions). Does it look good? Thank you.
*Asus Turbo GeForce RTX 2070 8GB 256Bit GDDR6 (DX12) PCI-E 3.0 GPU (TURBO-
RTX2070-8G)
*AMD RYZEN 5 2600X 6-Core 3.6 GHz (4.2 GHz Max Boost) Socket AM4 95W CPU
*ASUS TUF X470-PLUS GAMING AMD X470 AM4 Ryzen DDR4 3200MHz(OC) M.2
USB3.1 Motherboard
*Crucial 32GB (2x16GB) Ballistix Sport LT Gray DDR4 3000MHz CL15 1.35V PC Ram
*Corsair TX-M Series TX850M 80+ Gold PSU CP-9020130-EU (850W)
*Intel 660P 1TB 1800MB-1800MB/s NVMe M.2 QLC SSD
BrN says
2019-09-04 at 15:59
Hey Tim, really appreciate your post here. It has been a huge help. I'm currently
doing some deep learning work on MRI images using mostly
TensorFlow/Keras. I'd like to build a workstation with a 1 GPU setup for now, with the
plan to move up to 2 GPUs in the future. I don't think I'll be going to a 4 GPU setup.
Thanks!
Reply
You might want to have a slightly bigger PSU if you want to run two GPUs. The
extra PCIe lanes are not worth it.
Reply
Thank you for this great article~! It helps a lot! I also found this
article (https://medium.com/the-mission/how-to-build-the-perfect-deep-learning-
computer-and-save-thousands-of-dollars-9ec3b2eb4ce2) which recommends
using the AMD Threadripper 2920X, and here is the comparison:
https://cpu.userbenchmark.com/Compare/AMD-Ryzen-TR-2920X-vs-AMD-Ryzen-
7-3700X/m625966vs4043
My question is, should I keep everything same but just replace Threadripper 2920X
with Ryzen 7 3700X? or should I stick to TR 2920X? and Why?
Thank you!
Reply
Make sure the Ryzen CPU supports the number of GPUs that you want to have.
If it does, it is a great and cheap option!
Reply
Winston says
2019-09-11 at 19:15
Reply
Hey Tim,
First of all, thank you very much for your wonderful article with great insight into
deep learning. It helped me to a certain extent to understand the hardware
requirements needed for any DL machine. But I have an existing system with the
following config:
By the way, can this CPU perform well for ML or do we need to rebuild the PC?
Reply
The CPU will be fine if you use 1-2 GPUs. If you have more you need something
better.
Reply
Reply
Claus says
2019-08-28 at 09:45
Thank you! This is extremely helpful. I need to buy a multi-GPU setup suited for
deep learning analysis of 3D radiological data, eventually several TB. What kind of
setup would be recommended if I have about 30k€ available? Several of your
comments relate to smaller systems, so what are the key caveats for larger systems
like the one I need to buy?
A more specific question relates to the GeForce 2080 Ti versus the Tesla V100 (10x the
price!). Any killer argument for the Tesla? More VRAM than 11GB is needed in our case.
Reply
I would recommend an 8 GPU machine with 8x RTX Titan. Reach out to some
hardware vendors that offer these systems. It might be that for such a machine
the budget you need is slightly higher (32k euro). If that is the case, a 4 GPU
machine with 4x RTX Titan is also great. The RTX 2080 Ti has too little memory for
your application. The V100 is too pricey and not good!
Reply
Claus says
2019-09-12 at 07:42
Thank you very much indeed for your advice. In the meantime my
American colleague suggested going for the V100, despite the price, with
the following argument:
GeForce cards are totally fine for 2D models, in particular if you want to
leverage ImageNet transfer learning by cropping or resizing (the former is better) at
224×224. However, NVIDIA's advanced tools, for example AMP (automatic
mixed precision), might not be available on GeForce and work only on the V100.
AMP allows you to train deeper models or larger training batches (faster
training) with a limited memory footprint. If you are planning 3D data-driven
models or multi-channel models (information from different sequences) I would
definitely choose V100 32GB cards.
If we follow this advice we could start with a few V100 GPUs and buy more
at a later point in time. Do you have a comment, and could you elaborate
on what you meant when stating the V100 is too pricey and not good? Thank you once
again!
Reply
Michael says
2019-09-13 at 17:27
Titan RTX is slightly slower than the V100 and has 24GB of RAM (vs 32GB
in the V100). It supports all the features of the V100 (including AMP). Your
colleague does not know what he's talking about. If you're absolutely
sure that your model is not going to fit in 24GB of RAM even with a
batch size of 1, then I recommend going with four Quadro RTX 8000
cards (48GB of RAM at $5,500).
Here’s a good system builder in US:
https://lambdalabs.com/products/blade
Reply
Reply
Lina says
2020-05-19 at 08:32
Hi Tim,
Thanks for your great article! I have some questions, please can you
answer them?
What about using multiple Titan RTX cards instead of multiple Quadro 5000 cards?
Which one will be faster? Also, I found that Lambda uses Quadro
and Tesla instead of Titan RTX for their DL servers; what is the point? Is that
just for double precision?
Thanks!
Both are about the same. Lambda uses Quadro because they make
more profit. It also might be that NVIDIA does not sell them RTX
cards anymore. There is a clause in the CUDA license that forbids
the use of RTX cards in data centers, so this could also be a
reason.
RB says
2019-08-27 at 15:38
Hi,
I am looking to do some entry-level DL stuff and then build my way up to Kaggle. I
would appreciate any feedback on the following machine.
https://pcpartpicker.com/list/GxyQCb
Thanks in advance!
Reply
Let's say I decide to go with an Intel i9-9900KF, which has only 16 PCIe lanes
available. In this scenario, I also use two GeForce RTX 2080 Ti GPUs. If I also use
an SSD, which requires 4 PCIe lanes, would I still be able to go for an x8/x8 setting with
the GPUs? Wouldn't the system configuration be limited by the maximum number of
CPU PCIe lanes, so that considering the SSD, the GPUs would be forced to an x8/x4
setting? In this scenario, would it be better to get just one GPU if I intend to use
parallelism?
Reply
Yes this is problematic. You can use a SATA SSD to solve this or another CPU.
Reply
Reply
Doesn’t this depend? The chipset has PCIe lanes in addition to the CPU
right? Therefore, if the m.2 is on the chipset then it wouldn’t take away
from the x8/x8 used by the GPUs?
Reply
Yes, you are right. I got it wrong the first time around. Most often your
motherboard will provide the PCIe lanes for the PCIe storage and thus
it does not take away from the GPU PCIe lanes.
Reply
lazy_propogator says
2019-08-17 at 21:51
Hello Tim, this is a great article! Thanks for all the info. NVIDIA recently released the
Super versions of the RTX cards; can you shed some insight on that? They are
supposed to be more powerful than their predecessors; it's said that the RTX 2060
Super is almost as good as the RTX 2070. But on the other hand there are reports
that the RTX 2080 Super is only slightly better than the RTX 2080. Can you shed
some light on this?
Thanks
Reply
I have not analyzed the data of the GPUs yet. What you say seems accurate
from my first impression though. So RTX 2070 and 2060 Super are good. RTX
2080 Super not so much.
Reply
Steven says
2019-08-17 at 12:01
Hi Tim,
Thank you for sharing! I'm looking to build a desktop for prototyping. What are
your opinions on the Intel i5 9600K vs i7 9700K? Or do you recommend something
else? Also, can you recommend a good compatible motherboard? I will be using
one RTX 2070 for now but would like to be future-proof for up to 4 one day.
Thanks!
Reply
If you want to have 4 GPUs consider a CPU with at least 32 lanes and about 6-
8 cores.
Reply
Will this build provide enough airflow for a 9900K + 2080 Ti? It's all air-cooled; the 2080
Ti model has an open-air design with two fans (I can buy a 3-fan model if needed).
CPU: Intel Core i9-9900K 3.6 GHz 8-Core Processor (Purchased For $485.00)
CPU Cooler: Noctua NH-D15 82.5 CFM CPU Cooler (Purchased For $89.95)
Motherboard: Gigabyte Z390 AORUS PRO ATX LGA1151 Motherboard (Purchased
For $144.99)
Memory: Corsair Vengeance LPX 16 GB (2 x 8 GB) DDR4-3200 Memory ($84.99 @
Amazon)
Storage: Samsung 970 Evo Plus 500 GB M.2-2280 NVME Solid State Drive
(Purchased For $109.99)
Storage: Seagate Barracuda Compute 2 TB 3.5″ 7200RPM Internal Hard Drive
($54.99 @ Amazon)
Video Card: Zotac GeForce RTX 2080 Ti 11 GB AMP MAXX Video Card ($1099.99 @
Amazon)
Case: Cooler Master MasterCase H500 ATX Mid Tower Case ($99.99 @ B&H)
Power Supply: Corsair RMx (2018) 850 W 80+ Gold Certified Fully Modular ATX
Power Supply (Purchased For $94.49)
Reply
Single-GPU builds usually have no cooling issues. Airflow is not that critical; it is
more about what kind of cooling system you have on the GPU.
Reply
Bill says
2019-07-31 at 11:21
Follow-up to my question, here is the config I'm looking at. PCPartPicker doesn't seem
fully current: the Asus Rampage IV is shown but not the V.
Reply
Bill says
2019-07-31 at 10:28
Does the 'WS' do 3x x16 GPUs, or only one? It seems to have a max of 3 GPUs, vs.
4 for the ROG. PCPartPicker lists the ROG IV (not the V) without sellers, and the price of
the IV is $700+ on Newegg.
ASUS WS C621E Sage EEB Server Motherboard Dual LGA 3647 Intel C621
3 x PCIe 3.0 x16 (x16 mode)
2 x PCIe 3.0 x16 (Single at x16, dual at x8/x8)
2 x PCIe 3.0 x16 (x8 mode)
ASUS ROG RAMPAGE V EDITION 10 LGA 2011-v3 Intel X99 SATA 6Gb/s USB 3.1
Extended ATX Motherboards
4 x PCIe 3.0/2.0 x16 (x16, x16/x16, x16/x8/x8, x16/x8/x8/x8 or x8/x8/x8/x8 mode with
40-LANE CPU; x16, x16/x8 or x8/x8/x8 mode with 28-LANE CPU) *
* The PCIEx8_4 slot shares bandwidth with M.2 and U.2.
Reply
Gowri says
2019-07-03 at 20:22
Hello Tim,
Thanks so much for the blog and replies to comments. I am sorry if this is a
reposting, but my comment seemed to have disappeared, so thought I would post
again… It would be so helpful to have your insights.
I am attempting to put together a desktop with what I have available online and
locally, that is both DL-now ready and future proof. These are the components with
some questions:
(Selected this inspired by TensorBook’s choice of processor for a laptop; not sure
this is the best for the configuration selected, please do let me know if there’s a
better option)
4. 1 TB SSD (Samsung 970 Evo?) NVMe
5. ASUS mother board (an appropriate one)
(Would the ROG-Strix-Gaming-Motherboard-802-11ac/dp/B07HCPLQ2H be good
for DL too?)
6. Power supply – Corsair smps cx750
(we have occasional power cuts, so thought this is a worthy investment)
7. Hard disk for data (Seagate 2TB Fire Cuda)
8. Cabinet – Corsair Crystal 570x RGB 3 RGB fans
(Not sure if Mid Tower is sufficient for the config selected – is there a better option?)
I think this looks reasonable. You could go with a cheaper AMD processor
(Ryzen) to save some money. 2x 16 GB are great. Looks good otherwise!
Reply
Dmitry says
2019-06-14 at 11:21
Hi Tim,
First of all thanks a lot for the post – it saved quite a bit of time for me.
I've got a somewhat oldish machine with an i7-3770K (4 cores + hyperthreading) and 32GB
DDR3 RAM which I'd like to start using for deep learning (for NLP tasks).
Looking at your other post I am thinking about getting one RTX 2080 Ti, though I am not
sure if my CPU would become a bottleneck and whether I had better go for a cheaper GTX
1080 Ti instead.
Unfortunately most posts on the internet about this are written from a gaming perspective
and do not look too relevant…
Many thanks,
Dmitry
Reply
You should be fine with an i7-3770K for most tasks. Some tasks that make
heavy use of background data loaders, such as computer vision, can take a hit in
performance, but it should not be too much, maybe 30-50%. If you compare
this to getting a full new system, sticking with your i7-3770K looks like quite a
cost-efficient solution. I would give it a go!
Reply
Dmitry says
2019-06-15 at 13:21
Reply
Daniel says
2019-06-11 at 06:15
Hello,
my name is Daniel, I am a student and I am using the PyTorch library with CUDA
for the first time. I was trying to train a network and came across some problems, and
I hope you could help me out.
I went through some PyTorch tutorials and had seemingly no problems with this
setup. Nevertheless, when I try to train a bigger network with a big image dataset,
the CPU runs constantly at 100% and the GPU only at 0-5%. I have been trying to
find out what the problem is. I checked several times that my code is actually using
CUDA, but the CPU is still running at 100%, making the training progress
extremely slow.
From what I have read, I suppose that it should be a CPU bottleneck problem, but I
wanted to confirm. I also looked at the RAM usage and it seems to stay between
85-90% during training. Maybe it also has something to do with the fact that I
am using an eGPU?
Thanks in advance!
Reply
That sounds like an eGPU issue where the bottleneck is transferring the data
iteratively to the GPU. One solution might be, if your dataset is not too large, to
transfer the entire dataset to your GPU. This will take some time, but once the
transfer is complete the CPU should no longer be a bottleneck since almost no
operations are executed on the CPU anymore. If this does not solve the problem
something else might be wrong. You can run PyTorch profilers to find out
where the bottleneck comes from exactly.
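For illustration, a minimal PyTorch sketch of keeping the whole dataset on the GPU, plus a short profiler snippet; the tensor file names are placeholders and it assumes the data fits in GPU memory:

```python
# Hedged sketch: move the whole (small) dataset onto the GPU once so the slow
# eGPU link is not crossed on every mini-batch. File names are placeholders.
import torch
from torch.utils.data import TensorDataset, DataLoader

device = torch.device("cuda")
images = torch.load("images.pt").to(device)    # hypothetical preprocessed image tensor
labels = torch.load("labels.pt").to(device)

dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=64, shuffle=True)   # batches are already on the GPU

# To locate the bottleneck, profile a few steps and inspect where time is spent.
with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU,
                                        torch.profiler.ProfilerActivity.CUDA]) as prof:
    for x, y in loader:
        _ = x.float().mean() + y.float().mean()   # stand-in for a real training step
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```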
Reply
Hi, a great blog tbh, and really helpful in deciding most of the system for DL, but I
still need one piece of advice in terms of GPU:
That is option one, with the future possibility of upgrading it to a better CPU with DDR4
RAM. Or I can go for a better CPU like a 6700K with 16 GB DDR4 and settle for a 1060 6GB,
and in the near future add another GPU or maybe upgrade this one to a 2070,
but that upgrade might take more than a year.
And this machine won't only be used for DL, as I would game on it too… so yeah.
Reply
I would go with the RTX 2060. If you learn to use it well you should be able to
use most deep learning models.
Reply
Al says
2019-06-15 at 04:37
I just got a second-hand Pascal card myself after trying a Turing RTX for a
few months. More RAM for a lower price, a much better deal than a RTX
2060. I’m mostly stuck with 32-bit in most cases anyway. Maybe in a year
or two 16-bit will actually be usable!
Reply
Reply
Al says
2019-06-29 at 08:50
From what I've seen in Keras, some layers (which you're prone to
end up needing, like batch norm) don't support 16-bit. Also,
the results are often out of whack, with big losses that don't
improve and so on, even when following the guidelines. When it works
it's great, but when it doesn't you just can't justify spending the
time to make it work.
Another annoying side effect of faster cards with the same amount
of RAM is that you may often find that you’re under-utilising the
compute capacity. So why use a “fast” RTX at 50-70% capacity
when you can use a cheaper GTX at 80-100% and get the same
results in the same time? This happened to me recently when
using cudnn LSTM and GRU Tensorflow layers from C++ for
inference but it can happen in many other cases.
I have not used Keras in years and I am not sure how to resolve
16-bit problems in Keras. In PyTorch, it is rather easy and works
well. I think this is primarily because NVIDIA is supporting 16-bit
compute with specialized libraries for PyTorch.
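For what it is worth, a minimal sketch of what 16-bit (mixed precision) training looks like in PyTorch with torch.cuda.amp; the toy linear model and random batches below are placeholders for a real network and data:

```python
# Hedged sketch of mixed-precision (16-bit) training in PyTorch via torch.cuda.amp.
import torch
import torch.nn as nn

model = nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(100):
    x = torch.randn(64, 512, device="cuda")
    y = torch.randint(0, 10, (64,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # forward pass runs largely in float16
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()         # scale the loss to avoid float16 underflow
    scaler.step(optimizer)
    scaler.update()
```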
Michael says
2019-07-02 at 17:44
Reply
2019-06-29 at 08:22
They are about the same. Getting either one should be fine.
Al says
2019-06-29 at 08:54
You can find a second-hand 1070 for about 200 euros on eBay
now. I think it’s impossible to find an RTX 2060 for that price
anywhere. For the prices I’ve seen, the 1070 is a bit better value
even though it’s a bit slower. And it also has 8GB…
Hello Tim,
A friend of mine has bought the following based on a build found on the internet:
CPU : Intel i7–8700K
RAM : 64GB : 16*4: G.Skill Ripjaws V DDR4 3200MHz
GPU : RTX 2080 Ti
Motherboard: MSI Z370 PC PRO
PSU : CoolerMaster Vanguard 1000W PSU
Cooler : ML240L Liquid Cooler with the Hyper 212 LED Turbo.
Storage : A 512GB 970 Pro Samsung M.2 SSD
The total cost was about 3500$
If I have only $1000 to $1500 and my aim is to have a decent build for Kaggle
competitions (I am not looking to be in the top 5, but let's say around the top 100-150),
how should I change his build? Do I keep the GPU? The RAM? etc.
Reply
You do not necessarily need that much RAM, but then you need to write careful code
that is memory efficient. This will already save you a lot. The PSU wattage is
suitable for 2-3 GPUs and can be reduced to 600 watts, which will bring the
price down further. Otherwise, one can go for a cheaper GPU. For one GPU a
cheap Ryzen CPU with a cheap motherboard is more than enough. All of this is
for deep learning though; running models on your CPU would be slow on this
setup, so you would need to make sure that you run boosting/tree models on
your GPU.
Reply
Ariel says
2019-05-30 at 08:17
Hi Tim, I’m reading your post as I’m about to build a deep learning machine.
I’m planning to get me a ASUS Turbo GeForce RTX 2080 8 GB Graphic Card GDDR6
with High-Performance Blower-Style Cooling for Small Chassis and SLI Setups
TURBO-RTX2080-8G ( https://tinyurl.com/y4y68dub ) and add : Patriot Viper 4
Series 16GB Kit (2 X 8GB) 3733 MHz (PC4 29800) DDR4 DRAM Kit (PV416G373C7K)
(https://tinyurl.com/y5k77pva ) and I'm waiting for the new AMD CPU that has just
been announced, the Ryzen 9 3900X with 12 cores and 24 threads. Would you
recommend it?
thanks -Ariel
Reply
The AMD CPUs are quite good now, but the 3900X has only 16 PCIe lanes for GPUs, so
it is only good for 2x GPU setups. If you only want two GPUs this is a great choice.
Otherwise, a Threadripper is a cost-effective option for 4 GPU setups.
Reply
Can you recommend any other AMD chipset that will go with this
configuration?
thanks
Reply
Reply
Ariel R. says
2019-07-09 at 02:54
I found a motherboard with 3 slots for video cards, as you can see,
and it will fit the new AMD CPU.
Thanks
Richard says
2019-05-23 at 07:36
Could you discuss the hardware implications of the type and application of deep
learning? Different hardware tradeoffs could be made for a box dedicated to
training an image classifier on a large dataset versus transfer learning with an
existing model and these hardware tradeoffs might be different if the application
was sentiment analysis or NLP. I suppose one way to determine those tradeoffs, as
alluded to in an earlier comment, would be to run the task in the cloud and get an
understanding of the bottlenecks and requirements that way before buying
dedicated hardware.
Reply
For image classifiers, it is useful to have a large SSD (1TB+) where you can put your full
dataset. Other than that, there are no task-specific requirements.
Reply
Hey,
Is an RTX 2070 good enough to start with if I want to train architectures like YOLO
(an object detector) using TensorFlow?
Reply
Reply
Arthur says
2019-05-07 at 00:40
Hi Tim,
Since you have great experience building this kind of DL machine, I have a
question regarding how to optimize the Ethernet bandwidth for different HW
configurations and applications. For example, how much Gigabit or 10 Gigabit
Ethernet would I need if I have 16 or 12 NVIDIA Tesla GPUs with 2 Intel Xeon Scalable
processors for a graphical analysis or game processing application? Thanks.
Reply
If you have 4 GPUs per node and you want to train traditional convolutional
networks with standard algorithms you should get at least 20-40 GBit/s
Infiniband, preferably 100 GBit/s or faster. 10 GBit/s links, and especially Ethernet, will be
too slow for standard algorithms. With special algorithms, 10 GBit/s Ethernet
can work, but no open source project for these algorithms exists and
implementing them on your own would take months. So it is better to invest in a
good networking solution and use standard libraries.
Reply
Hi. I'm a high school student graduating this year. I completed the deep learning
specialization on Coursera. Now that I'm confident enough to use PyTorch for NLP
and RNNs on speech, I need a GPU. I can ask my parents to buy me a computer or
just use Google Colab. Would it be okay to just use Colab even if I can afford a
computer?
Reply
Ade says
2019-04-24 at 11:27
Hello Tim
Please, I am a Ph.D. student and my research area is deep learning. My potential build
has the following specification.
CPU: Intel® Xeon® Silver 4114 10-Core (2.2 GHz, 3.0GHz Turbo, 13.75M L3 Cache)
Motherboard: ASUS® WS C621E SAGE (DDR4 RDIMM, 6Gb/s, CrossFireX/SLI).
RAM: 64GB Kingston DDR4 2666MHz ECC Registered (2 x 32GB)
GPU: 11GB NVIDIA GEFORCE RTX 2080 Ti – HDMI, 3x DP GeForce – RTX VR Ready!
1st Storage: 6TB SEAGATE BARRACUDA PRO 3.5″, 7200 RPM 256MB CACHE
1st SSD Drive(OS installed) : 1TB SAMSUNG 970 EVO PLUS M.2, PCIe NVMe (up to
3500MB/R, 3300MB/W).
Please, is it enough for image processing and training? I have 3 million images to train on
(a 2 TB image dataset). Any suggestions on areas to improve in my build? The
motherboard supports 2 CPUs and up to 756GB RAM as well.
Thanks
Ade
Reply
The SSD is really important here for image processing. Try to get multiple SSDs
and combine them in RAID 0, or buy at least one SSD which you use solely for your
dataset. Otherwise, it looks good.
Reply
Nick says
2019-03-29 at 04:11
I feel this guide is becoming obsolete because it ignores alternatives to GPUs, like
the Google TPU. There are conflicting claims, but it does seem clear that chips
designed for accelerating deep neural networks are going to be better than
chips designed for accelerating graphics. There are at least half a
dozen companies with TPU-like products in the pipeline; it's not just Google.
Reply
Please have a look at my updated GPU recommendation blog post which also
discusses TPUs.
Reply
Damian E says
2019-03-27 at 05:10
Hi,
I need a certain level of mobility so I want to go with a 17″ laptop with eGPU via
Thunderbolt 3.
I would like to know if it makes sense to purchase a laptop that already has an
integrated GPU (mobile RTX 2070-2080)? Can they work as a pair? Or do I have to
switch between them and thus make the integrated one useless?
Also, Thunderbolt 3 caps at 40 Gb/s to PCIe, and that is most likely the theoretical
maximum, not necessarily what you get. Does it make sense to go with the RTX
Titan, or am I burning money and should go with a 2070?
Reply
Integrated GPUs are great but also expensive. If you find a cheap laptop with
integrated RTX 2070 I would go for that. If you want to have multiple GPUs
(internal + external) it gets complicated. I am not sure how this setup is
supported. I would look online for other people who tried. In general, a single
eGPU should also be great. It is also cheaper to upgrade the GPU without
upgrading the laptop!
Reply
silverstone says
2019-03-23 at 05:51
Hi Tim,
Thanks for the guide. What do you think about an AMD vs. Intel CPU with an NVIDIA
GPU? Are there any bottlenecks for DL frameworks with an AMD CPU?
Reply
It seems AMD CPUs are fine. I never had any problems with my AMD CPUs
both at home and in the office. One issue might be if you want to use your
CPU for some linear algebra (solvers and decomposition etc.), but other than
that AMD CPUs are great.
Reply
Hamid says
2019-03-24 at 16:55
Reply
Reply
Thank you for pointing that out! That was caught quite quickly and the user is
now banned.
Reply
Reading the title, I seriously didn't think the article was going to go this deep into
the topic. Great start!!!
I am tired of trying different coolants for my processor and heatsinks. Now I have
decided to use the thermal paste instead. Is that a good option?
Reply
Kartik says
2019-03-16 at 21:53
Hi Tim,
Thanks a lot for all your effort and also keeping this blog up to date.
I was thinking of getting an AMD Threadripper 1900X, which would be used for heavy
preprocessing and running XGBoost or other libraries that run on the CPU. Is it
overkill?
I've been following Kaggle for a long time while doing other things, and now I
am fully on it. I've seen people doing prototyping and training separately.
Should I get an RTX 2080 Ti, or two RTX 2070s for training and prototyping? Or maybe
make a cluster of GTX 1080 Tis?
Reply
An RTX 2070 is great for prototyping. Since most of the time on Kaggle is spent
prototyping, it is not so efficient to dedicate resources to training only. I would say
use your RTX 2070 also for training, and if that is not sufficient (memory or
training time too high), use the cloud for access to fast GPUs. This will be cheaper
and more flexible.
Reply
Hamid says
2019-03-16 at 18:04
Hey Tim,
Thanks for this great post,
I'm looking for a GPU to do my own research and I'm thinking of a price range of
$2200-$2500. Given that a while has passed, may I ask what you would choose for 1. a
2080 GPU (Ti?), 2. motherboard, 3. RAM, 4. hard disk, 5. PSU, and 6. case?
I would mainly use this for deep learning.
Thanks
Reply
Do you have advice on how bad an idea it is to have two different GPUs in the
same box? I have an old GTX 980 that it seems a shame to waste, so I was thinking of
running that alongside an RTX 2070 in a 2 GPU setup. I would potentially use the
980 for prototyping things while the 2070 is off training. Thanks!
Reply
That is usually just fine. Make sure that you use software that is precompiled for
different compute architectures (different GPU series) and you should have no
problem.
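As a small illustration (the device indices and toy models are placeholders, and it assumes a machine with two CUDA GPUs), each card can simply be addressed as its own device so the old GPU prototypes while the new one trains:

```python
# Hedged sketch: two mismatched GPUs can still run independent jobs, one per
# device; deep learning frameworks just see them as cuda:0 and cuda:1.
import torch
import torch.nn as nn

train_gpu = torch.device("cuda:0")     # e.g. the RTX 2070 used for long training runs
proto_gpu = torch.device("cuda:1")     # e.g. the older GTX 980 used for prototyping

train_model = nn.Linear(128, 10).to(train_gpu)
proto_model = nn.Linear(128, 10).to(proto_gpu)   # a separate experiment on the other card

x = torch.randn(32, 128)
print(train_model(x.to(train_gpu)).shape, proto_model(x.to(proto_gpu)).shape)
```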
Reply
Farooq says
2019-02-24 at 01:30
Hi Tim,
Extremely helpful article ! Keep it updated please !
I wanted to get your opinion on buying a single GPU, e.g. a GTX 1080 Ti priced
around $808 today, vs buying two GPUs, e.g. RTX 2070s (a single GPU priced at $527,
total = $1045).
Will 2 GPUs (RTX 2070s) perform better compared to the single GTX 1080 Ti?
Generally, do two slightly slower GPUs perform better for machine learning
projects compared to a single high-speed GPU?
Reply
Usually, two GPUs that are slightly slower are better than one big GPU, because
you can run multiple hyperparameter configurations of the same network at once,
one per GPU. Parallelization is also an option and is usually slightly faster than one big
GPU. So go for the RTX 2070s!
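For reference, a minimal sketch of the parallelization option across two identical GPUs in PyTorch (nn.DataParallel is the simplest route; DistributedDataParallel is generally faster); the toy model and batch are placeholders:

```python
# Hedged sketch of data parallelism over two matching GPUs with nn.DataParallel:
# each batch is split across the cards and the gradients are combined automatically.
import torch
import torch.nn as nn

model = nn.Linear(512, 10)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)     # scatter each batch across the available GPUs
model = model.cuda()

x = torch.randn(128, 512).cuda()       # one batch; halves go to each card
out = model(x)
print(out.shape)
```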
Reply
rocco says
2019-02-20 at 22:39
Hello Tim,
I am going to use an Asus WS X299 SAGE motherboard with 2x RTX 2080 Ti. If I use dual-
fan GPUs (2x RTX 2080 Ti), is that better compared to blower-style fan GPUs? Actually, I am
afraid of using blower-fan GPUs because of heating problems. Using two GPUs, I think there
is enough space between them.
Reply
Yes, if you have space between your GPUs a dual fan will be fine, but probably
comparable to the blower fan.
Reply
Diego says
2019-02-20 at 12:14
Hi Tim
Thank you for your guidance, for those interested in the development of AI.
I have a big question and it is the following:
I want to buy an MSI Vortex G65RV with the following characteristics:
Processor: Intel Core i7-6700K 4.0GHz 8M Cache, up to 4.20 GHz
Hard Disk: 1 TB (SATA) 7200 + 256GB SSD PCIe 3.0
RAM memory: 32GB DDR4 2133MHz expandable
Graphics Card: 8 GB Nvidia GeForce GTX 1070 DDR5 VR Ready
Connectivity: Killer ac Wi-Fi + Bluetooth v4.1, 2 ports Killer Gb LAN
2 x Thunderbolt 3, 2 x Mini Display Port, 2 x USB 3.1 Type-C.
Unfortunately, the desktop version only goes up to an i7-7700 CPU and a GTX 1080 GPU.
I know an externally connected card will not perform the same as a card connected
internally on the board, but I would like to know how the GPU communicates in those
cases, since the connection is through the Thunderbolt port if I am not mistaken. I would
also like to know what the DL performance would be with an i7-6700K CPU and an RTX 2070
GPU connected via Thunderbolt.
Thank you.
Reply
Thunderbolt 3 is pretty good for communication and you should see only a
small loss in performance (10%) for most applications. This number might be
higher if you (1) have very large input data, or (2) a small neural network, but this is
not very common. So an eGPU should be fine.
Reply
Michel says
2019-02-15 at 08:37
Hello,
I am no expert in deep learning, but the gaming community tends to consider that
going from an RTX 2060 to an RTX 2070 brings little benefit in terms of FPS or detail
rendering, just a higher price.
I am wondering whether there is any reason why the 2060 is not mentioned in your
really great review of GPUs.
Thanks!
Reply
Reply
Alvaro says
2019-02-24 at 01:40
Also, consider the new GTX 1660 Ti! The “tensor cores” have been removed
but they’ve been replaced with FP16 units. I can only guess about the
actual performance compared to RTX 2060… it would be great to find
some actual tests. Could it be just as cost-effective after the retail price
stabilizes?
Reply
Michael says
2019-03-10 at 07:04
I would be interested in the comparison of the RTX 2060 and RTX 2070 for
deep learning applications.
Do you think it is worth going for the RTX 2070?
Reply
Reply
Just connect them to your GPUs. It will barely impact performance of your
GPUs.
Reply
Hi Tim,
Would it be possible to set up a system with a 1080 Ti and a 2080 Ti and use them to
perform parallel training?
Reply
2019-02-20 at 14:20
This does not work, unfortunately. You need the same chip architecture to do
GPU-to-GPU parallelization. You can do GPU-CPU-GPU parallelization, but that
often yields no speedups.
Reply
Thanks for the input Tim. I guess I will try to get the 2080 Ti, but I keep
reading many reviews of them dying! So I am a little afraid to put down $1800
(CAD).
Reply
Ari H says
2019-02-07 at 02:50
It's just a shame that AMD's latest GPUs would have the potential to demolish NVIDIA's
overpriced cards on deep learning if they just fully supported PyTorch. Currently
they're about as good as doorstops unless you write everything in OpenCL
yourself. Their priority should be to get PyTorch working ASAP.
Reply
Agreed. There are some efforts to do this, but it is a delicate issue because
PyTorch's code base is an older code base which was built upon. I hope they can
figure out the last issues soon, and then I would be happy to recommend
AMD cards as well.
Reply
I just wanted to hear your thoughts on the differences between the RTX 2080 Ti
Founders Edition vs the other RTX 2080 Tis with third-party hardware from ASUS,
Gigabyte, MSI, etc. (or just the advantages of FE vs non-FE cards in general). In my
country, the FE is about $300 USD cheaper, so if the others do not have any real
advantages for AI I would prefer to go with the FE. Also, as I am considering water
cooling, the advantages gained from superior cooling may not be a
concern.
Thanks Tim!
Reply
Reply
Ando says
2019-02-01 at 12:59
Hi Tim,
Thank you very much for the guide.
I am trying to build my first DL machine. Following your advice, I am looking to
start with an RTX 2070; I will add either another 2070 or a 2080 Ti later, and
maybe even a third one. This is my build, if you have time, please have a brief look:
https://pcpartpicker.com/user/ando_khachatryan/saved/yQkNQ7
My concern and question is about the cards: while looking for a blower-style card
on Amazon, I encountered lots of negative reviews for cards from different vendors,
and the vast majority were describing the same problem: the card worked out of the box,
then, after a week of gaming, artifacts started to appear and the games started to
freeze/crash.
Looks like a good build with some spare room for more GPUs.
I have heard about the problems. It is still unclear whether all RTX cards have this
problem or only the first batch of RTX cards at release. It is worth looking at the
dates of the reviews and seeing if it got better over time. I personally
have no problems with my RTX cards, but maybe I have been lucky so far.
Reply
Ryan S says
2019-02-01 at 08:34
Hello Tim, I have a few very fundamental questions. I plan on using an NVIDIA Tesla
P4 GPU in a server (let's say Intel Xeon 16-core, 128GB RAM, 2x 10GbE, etc.). A
popular manufacturer that offers such a config states that the system can handle
video analytics (like face detection) on 9 concurrent video streams @ 720p/15fps.
My question is:
– If I run video @ 720p/3fps, how many concurrent videos may I be able to handle?
– If I run video @ 1080p/3fps, how many concurrent videos may I be able to handle?
I know there are many related factors, but just as a ballpark, any suggestion on
whether lowering the frame rate would help increase the number of video streams?
Is this a linear equation of any kind?
Thanks!
Reply
I do not know what this is referring to exactly, but one reasonable assumption would
be that it scales linearly: that means 9*5 streams for 720p/3fps
and 9*5/2.25 streams for 1080p/3fps, but I do not know if that works out in practice.
The best is to ask the manufacturer yourself.
Reply
Bruce says
2019-01-27 at 14:14
Hello!
Maybe I am a bit confused, but can I have a config with more than one RTX 2070 at
the same time? The 2070 doesn't support SLI
(https://hothardware.com/news/nvidia-geforce-rtx-2070-gpu-will-not-support-
nvlink-sli-but-why). Does it matter?
Thanks in advance!
Reply
CUDA code cannot use SLI for communication between GPUs. Instead, GPUs
communicate via the PCIe network, so no SLI support is needed for
parallelism.
Reply
Nasi says
2019-01-24 at 09:04
Hi Tim,
I already have a GPU, 1080, in my PC. I am going to install GPU 2080ti along with
the previous one.
1) Is it possible to have two different types of GPUs in one PC and use them for
training a neural network, especially in tensorflow? I do not know how to prepare
the environment to use both GPUs. Is it possible for you to send me a good tutorial
link for that?
2) There are two PCI Express slots on the motherboard, but they are too close to
each other. If I install the new one in the empty slot, the fan of the old one will be
blocked by the new one. So, I think I should either buy a new case (with a new
motherboard) or buy a PCI express riser. I found multiple links to buy a PCI riser, but
I do not know whether they are good or not. If I use a PCI riser, I will put the new
GPU outside the case, and I will not close the case. Could you please give me your
opinion about PCI express risers?
https://www.amazon.fr/Cablecc-Gen3-0-16-PCI-Express-x16-Extender-Up-
Angled/dp/B07GBRQPQF/ref=sr_1_17?ie=UTF8&qid=1548337647&sr=8-
17&keywords=pci+express+riser
https://www.gearbest.com/other-pc-parts/pp_672357.html?
wid=1433363¤cy=EUR&vip=4450235&gclid=Cj0KCQiA4aXiBRCRARIsAMBZG
z_d_R54eWGNs1vpAKV0qBtUDNK9MGw7HzNrLH4d5MFlfCpBMGC9s2IaAm4tEALw
_wcB#anchorGoodsReviews
https://azerty.nl/product/delock/670177/riser-card-pci-32-bit-with-flexible-cable-
left-insertion-riser-kaart?
gclid=Cj0KCQiA4aXiBRCRARIsAMBZGz9LzTqekoEhsr6sRVCwqBNfrWdTBVhDgzYHfu
7dNwTBOLBLCfgUn5caAumqEALw_wcB
https://www.amazon.com/Ubit-Multi-interface-Function-Graphics-
Extension/dp/B076KN7K5Q
Best regards,
Nasi
Reply
1) Yes, but you will not be able to parallelize a deep neural network across
those two different GPUs.
2) If one of them has a blower fan you might be able to put the RTX 2080 Ti
left of the other GPU. Otherwise, you can always buy a riser/extender if you
have overheating issues.
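As an illustration of running separate jobs on two mismatched GPUs (a sketch of mine, not a tutorial link; the device indices are assumptions, check nvidia-smi for yours), one common approach is to pin each process to one GPU before the framework is imported:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # this process only sees the first GPU

import tensorflow as tf                     # TF 1.x style, matching the era of this thread

with tf.device("/gpu:0"):                   # now the only visible GPU
    a = tf.constant([1.0, 2.0])
    b = a * 2.0

with tf.Session() as sess:
    print(sess.run(b))

# Launch a second copy of the script with CUDA_VISIBLE_DEVICES="1" to keep the
# other GPU busy with a different experiment; no parallelization across the two.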
Reply
Ehtesham says
2021-09-11 at 05:01
I failed because the power supply goes down within a minute, Windows crashes,
and the system reboots. I tried the same exercise with HiveOS with the same result.
Today I will try a 1500W power supply alongside the 800W (already installed)
supply and shall see the results. Hope it works!
Reply
Thanks for sharing this! This shows how difficult it can be to get the
power requirements right.
Reply
Gurunath says
2019-01-21 at 21:23
While you have mentioned that PCIe lanes don't matter significantly for a <=4 GPU
setup, I plan to use a setup with 8 GPUs (RTX 2080) for NLP and speech recognition
tasks. Would the number of PCIe lanes significantly affect the performance in such
applications? What would be your advice on the number of PCIe lanes for each
GPU in this 8 GPU setup for NLP and speech recognition tasks?
Info: For example, our current NLP task on sequence-to-sequence model for a
batch of 100 sentences, each restricted to 128 tokens (each represented by a 64-bit
tensor) in Pytorch takes around 120-150 ms per iteration on a single GPU(1080Ti).
Thanks in advance.
Reply
If you want to parallelize across 8 GPUs the PCIe lanes will matter quite a bit
compared to 4 GPUs. The communication requirements scale linearly with the
number of GPUs (if you use the right communication algorithm). However, if
you run 8 GPUs on a regular 4 GPU motherboard you are also halving the PCIe
speeds and you will have 4 GPUs behind a PCIe root complex. Since only one
GPU behind a PCIe root complex can communicate with another root complex
it means you need 8x the time to send the same number of messages between
GPUs compared to 4 GPUs. So in total, the communication with 8 GPUs on a 4-
GPU motherboard will be 32 times more expensive than 4 GPUs on a 4-GPU
motherboard. If you want to parallelize 8 GPUs efficiently, you will need 4 PCIe
root complexes and this often means 2 CPUs and server-grade hardware (EPYC
systems might be an exception, but I am not sure if those motherboards
support 4x root complex setups).
If you do not want to parallelize a network across all GPUs, you will be fine —
just note that with this system you cannot really do parallel training.
Reply
Eric says
2019-01-18 at 03:42
Hi Tim,
Thanks for this post. After reading it through I am still a bit unsure about the PC specs
that I would like to get to run deep learning, mainly because I don't want to get
hardware that doesn't work with other software/hardware.
Thanks a lot
Reply
Reply
Abdelrahman says
2019-01-10 at 14:55
Reply
The CPU will be fine for deep learning with a GTX 1060. However, preprocessing
data might take more time with such a CPU.
Reply
krzh says
2019-01-08 at 12:24
Reply
Yes, that looks quite good. The system would also work well with more than 2
GPUs so if you have any plans to use more than 2 GPUs you could get a
motherboard with more PCIe slots. Otherwise, all good!
Reply
Hello Everybody,
I have been trying to get multiple GPUs to work on an Ubuntu system.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:02:00.0 Off |                  N/A |
|  6%   57C    P0    26W / 120W |    321MiB /  6072MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1165      G   /usr/lib/xorg/Xorg                           198MiB |
|    0      2066      G   compiz                                       119MiB |
|    0      2746      G   /usr/lib/firefox/firefox                       1MiB |
+-----------------------------------------------------------------------------+
Although the system shows that it has all the cards, they don't get used even
when I try Keras for multi-GPU learning.
If any other information is required to solve this, I can provide it. I am
using a PCIe bridge to raise the GPUs and use them.
Reply
This should usually work. I guess the problem might be the PCIe bridge. It is
difficult to tell with this information and it is not straightforward to debug. If you
can, use the two GPUs without the PCIe bridge and try again.
Reply
Nitin says
2019-01-18 at 20:36
I have used two GPUs without the PCIe bridge; these 2 GPUs are now mounted
on the motherboard, but I am still not able to use both of them.
TensorFlow starts to use memory from both but does not use the second
one for processing.
Reply
Did you write code that utilizes both GPUs? You can try to run some
code which tests parallelism. There are some multi-GPU samples from
NVIDIA (CUDA samples) which test if parallelism between your GPUs is
possible. If this sample works, it is a software issue.
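Not the NVIDIA CUDA samples mentioned above, but a quick PyTorch sanity check (an illustrative sketch of mine) that both GPUs are visible and can each run a kernel:

import torch

print("GPUs visible:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    x = torch.randn(1000, 1000, device=f"cuda:{i}")
    y = x @ x                                  # small matmul on each GPU
    torch.cuda.synchronize(i)
    print(f"cuda:{i} ({torch.cuda.get_device_name(i)}) ok")

# If both devices compute fine here but training still uses only one GPU,
# the problem is in how the model is distributed (software), not the hardware.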
Reply
geek12138 says
2019-01-07 at 08:45
Hi Tim,
A Titan RTX or two 2080 Tis: which is more suitable, considering the memory
difference between 24 GB and 2x 11 GB?
Reply
The computing power of two RTX 2080 Tis is almost double that of the Titan
RTX. Thus it's a question of compute vs memory. If you want faster compute go
with 2x RTX 2080 Ti; if you want more memory go with the Titan RTX.
Reply
Neil M says
2019-01-03 at 10:53
Hi Tim,
I’m building my first deep learning work station and based on your guide I’ve just
purchased an RTX 2070.
I’m re-purposing an existing older workstation as the basis for the build. The
specification of the machine is:
I want to use the existing GeForce 660 GPU to drive the monitors and keep the RTX
2070 solely for computation. Looking at the NVIDIA website both GPUs use a
common driver so I expect this will work. Do you foresee any issues or limitations with
this approach or my current spec? Thanks.
Reply
I think the system should work quite well with an RTX 2070. Some computer
parts are older, so some parts of common code, like preprocessing, will be
slower, but your deep learning performance should be close to what other
people report with modern desktops.
Reply
Peixiang says
2018-12-27 at 16:36
Can I use two different GPU at the same time? Say 1080Ti and 2070? What are the
issues I may encounter?
Reply
Shayan says
2018-12-23 at 16:00
Hi Tim,
Can you please comment on what type of setup is used in this video
[https://www.youtube.com/watch?v=RFaFmkCEGEs&t=54s], at 0:47 seconds you
can see he has 4 nvidia GPUs by using the nvidia-smi command, however he is
using a macOS.
Also would you recommend using macOS (w/ gpus) for competitions.
Kind Regards
Shayan
Reply
These are GTX 1080 Ti GPUs. There are some compatibility issues with macOS
that only certain NVIDIA GPUs are supported, but I do not know the details.
For this, I usually do a google search on reddit: “site:reddit.com which NVIDIA
GPUs work for macOS deep learning”
Reply
Fiz says
2018-12-21 at 16:27
do you mean
(1) on RTX cards, running 16-bit models indirectly doubles the available memory for
deep learning compared to 32-bit models, is that correct?
(2) the facts in (1) are not valid for GTX cards, i.e. 32-bit models and 16-bit models
make no difference?
(3) how do you explicitly run 32-bit vs 16-bit models in deep learning? Are there
examples?
Reply
1. It is not a straight doubling, but the memory requirements are much lower.
2. You can have 16-bit models with GTX cards, but what happens under the
hood is that all values will be cast to 32-bit before any computation. So the
weights are 16-bit and the computation 32-bit for GTX cards. However, you
should also see a good reduction in memory if you use 16-bit weights with GTX
cards.
3. In PyTorch it can be as simple as "model = model.half()" and you will run in
16-bit mode. In practice, it can be a bit more complicated depending on the
model that you are running. You can have a look at NVIDIA's 16-bit
library Apex, which is built on PyTorch, for more sophisticated examples.
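A minimal sketch of the model.half() approach (illustrative only; real 16-bit training usually needs extras such as loss scaling, which Apex handles):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
model = model.half()                       # weights are now float16

x = torch.randn(32, 128).cuda().half()     # inputs must be float16 as well
out = model(x)
print(out.dtype)                           # torch.float16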
Reply
Phil says
2018-12-20 at 05:48
Hi Tim,
Can you please tell me if I am doing something wrong. The idea is to run LSTMs
with many datasets that are rather small (<5GB). I will have several GPUs, not to
parallelise but to run different optimisations at the same time. I am not a hardware
expert and I want to make sure that I don't waste GPU power because of a poor
setup. If it goes well, I will replicate it to populate a rack.
– WD 4TB
– Corsair AX1600i (1600W)
Phil
Reply
The RTX 2070 cards that you chose might be prone to overheat in that
configuration. I would pick a blower-style RTX 2070 card instead. Otherwise a
good build. I am not sure though if you can easily find 3U or 4U racks that fit
well with this configuration.
Reply
Phil says
2019-01-01 at 07:02
Reply
Saurabh K says
2018-12-19 at 19:16
2. Intel Core i7-7800X X-Series Processor (28 PCIe lanes for possible future
expansion):
https://www.amazon.com/gp/product/B071H1B3Z1/ref=crt_ewc_title_dp_1?
ie=UTF8&psc=1&smid=ATVPDKIKX0DER
The motherboard seems pretty expensive; any other suggestions for a
motherboard compatible with the i7-7800X? The reason I am going for that one is
that it supports wireless LAN. Otherwise, the MSI X299 RAIDER LGA is a pretty good
option (https://www.newegg.com/Product/Product.aspx?item=N82E16813144059).
Any thoughts?
Reply
Other motherboards that do not support WLAN are fine as long as you get a
USB wifi adapter. This combination might save you a bit of money on the
motherboard. The i7 is a very versatile CPU — a bit expensive but it will show
strong performance in any case!
Reply
Hi Tim,
Can I add an RTX 2080 Ti to my existing 2 GTX 1080 Tis to improve the training time
for a voice recognition application?
Reply
Matt says
2018-12-17 at 12:01
Beginner here who had some perishable credit at an electronics vendor with limited
computer parts supply. Got my hands on an EVGA RTX 2070 XC 8GB and a full ATX case.
Started the fast.ai course but the workflow with cloud resources got too annoying.
So I have my hands on the RTX 2070 already. I don't want to waste too much of its
capability for most beginner/intermediate use cases but do not have that much to
spend.
The impression I get from the latest guide update is that even an i3 7100 or G4560
would not hamper the GPU, or only slightly (and those CPUs are really cheap). Have I
understood it correctly?
Reply
Yes a single RTX 2070 should be easy to utilize with an i3 7100. However, if you
preprocess a lot of data you might still run into some bottlenecks. If you make
sure that you have good quality preprocessing code you should be fine.
Reply
Nazim says
2018-12-17 at 06:12
Reply
Reply
Nazim says
2018-12-28 at 05:19
Reply
Hi,
Thanks for the detailed post and also the enlightening discussions. I have prepared
two builds which I am sharing below. Can you please provide your suggestions. I
have some specific questions regarding the builds which I am providing with the
build configs. But, before that let me provide my requirements/type of ML works I
intend to do.
1. I am a very new assistant professor at an Institute in India. I will eventually get the
money to procure 2 workstations with 4 GPUs (I am at least hoping 1080Ti) in each.
But, that will take time and I want a decent build to get the ball rolling. For that I
have already bought 2 1080Ti GPUs and a Samsung 860 EVO 500 gb around 3
months back when I was in the US. So, they are sitting idle now. To avoid this, and
to get started, I want to buy the other parts of a DL machine from my pocket. My
budget is around Rs. 100,000 [Rs is the Indian currency].
2. The machine will be in the server room of the institute. So, the cheapest cooler
[whatever noise level] and cabinet is what I would prefer.
3. My student [only one at this moment] will run RL codes [both training and
inference] on images. Later, I might do some classification work on videos [but this
is a distant possibility at this moment, and I might be able to procure the servers
with 4 GPUs by then].
4. I don’t plan to expand this machine beyond 2 GPUs. My long term plan is to
make this a student machine that will have even 1 GPU and the student can
develop/prototype codes here while the stable code would run in the 4 GPU
servers.
5. My builds list some prices which do not have web links. These are
from https://mdcomputers.in, a local but reputable vendor. I could not find how
to link their product pages to PCPartPicker; otherwise, I would have done that. So,
you will have to take my word about the prices.
the system, ultimately, becomes a student’s developing machine with one GPU.
Build with core i5 9600K — https://in.pcpartpicker.com/user/dasabir/saved/fgQ299
Build with core i7 8700K — https://in.pcpartpicker.com/user/dasabir/saved/bD7J8d
While looking for the option of core i7 8700K, I came across core i7+ 8700
[https://mdcomputers.in/intel-core-i7-8700-bo80684i78700.html]. I see that this will
cost me Rs. 11,000 more over my core i5 9600K build. I am not sure what the
difference is between an i7 8700K and an i7+ 8700 (other than the frequency/speed).
Here is the comparison link — https://ark.intel.com/compare/126684,140642 . Will
the i7+ 8700 require a different motherboard? It says the box includes NVMe 3.0 x 2,
does it help me? Also the i7+ processor includes a 16 GB Optane memory. Will it be
of any help (e.g., keeping the OS there)? Also does optane memory occupy PCIe
lanes? Any suggestion on this would be great to have.
My second build is with AMD processors. I tried with an AMD Ryzen 7 2700X. The
price comes out around the same as the core i5 9600K build. It does have 8 cores
compared to 6 cores for the Intel processors, but does AMD have hyperthreading?
I am not sure. Also it does not have MKL; is Intel MKL going to be crucial for deep
learning?
Build with AMD Ryzen 7 2700X
— https://in.pcpartpicker.com/user/dasabir/saved/3ddTBm
Though you say the number of PCIe lanes is not that important, especially with 2
GPUs, I just tried my luck with an AMD Threadripper processor. As expected, it is
over budget. But, if you say it is worth spending this much money, I might also go
for it.
Build with AMD Threadripper 1900X
— https://in.pcpartpicker.com/user/dasabir/saved/73mhyc
Abir
Reply
Either build is fine. You could buy slightly cheaper RAM with lower speeds; it
will not make a big difference. If the AMD build is too expensive and you run
only 2 GPUs, I would rather go with an i5 or i7 build.
Reply
Hi Tim,
Thanks a lot. The AMD Ryzen 7 2700X build is the cheapest, so I will go
with this. I tried to look at lower-speed RAM, but that is not saving me much.
Reply
Zhenlan says
2018-12-16 at 17:19
Hi Tim, thanks for the great guide. I found your blog super helpful when I built my
first box in 2016. And it is still worthwhile to come back for updates.
My computer would freeze every now and then so I have to force restart. And even
before it freezes, it gets slower by epochs. The first epoch would be 90s, the next
would be 120s, and it just gets worse. It also gets slower and more likely to freeze as
I experiment with more network structures by defining more models (clear_session() or
tf.reset_default_graph() does not help).
I use "top" to monitor CPU/RAM, neither of which seems to be the problem. I use
something like "watch -n0.1 nvidia-smi" to monitor the GPU. GPU utilization stays above
90%. But it does not really tell me much about memory, as TensorFlow automatically
allocates almost all of the GPU memory at start. I tried tf.ConfigProto() to limit the GPU
memory used by TensorFlow without much luck.
Do you have any suggestion as to how to diagnose this issue? Thanks in advance
and happy holidays!!
Best,
Zhenlan,
Reply
It sounds like you have a memory leak somewhere in your code. First, check if
you run out of CPU RAM and your computer is swapping RAM to disk. If that
does not help try to debug TensorFlow further. If that does not help it could
help to install the newest NVIDIA drivers. If this does not help try PyTorch and
see if that works for you (PyTorch is much easier to debug in these cases).
Good luck!
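A small TF 1.x sketch (matching the tf.ConfigProto mentioned in the question; assumptions mine) that stops TensorFlow from grabbing all GPU memory at startup, so nvidia-smi shows actual usage while hunting for the leak. It aids diagnosis but does not fix a leak by itself:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                      # allocate on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap at 50%

with tf.Session(config=config) as sess:
    pass   # build and run the graph as usual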
Reply
david says
2018-12-16 at 13:36
Hi, I am a Computer Scientist but I have not done any project on DL before. Maybe
later I will buy an RTX Titan, but not in the next three months. Could you please let me
know the following?
1. Given a model, if I want to see how it behaves under different initial
parameters, will there be a problem if my desktop has two GPUs of different kinds
(e.g. one GTX 1060 and one RTX 2080/2080 Ti or RTX Titan)?
2. Am I correct that only when I do parallel training of the same network with the
same set of initial parameters will I need to have GPUs of the same model?
3. Those 20x0 cards have Tensor Cores in addition to CUDA Cores. Are Tensor
Cores helpful in speeding up training in TensorFlow? Why else is it good to buy an
RTX card now rather than a GTX card?
Reply
1. You can use different GPUs for different networks. However, if you want to
parallelize a single network across both GPUs they need to have the same chip,
George M says
2018-12-16 at 11:43
Tim, thanks for updating this. Long term I am hoping to build a dual RTX 2070
system to allow for data parallelism. Would hooking up one monitor to each GPU
be a viable option? Also, in that case would the “coolbits” option be able to control
each GPU fan, or will fan control still be “hard and hacky” as you put it?
Reply
Brendan says
2018-12-14 at 07:23
Hey Tim,
First I want to thank you for this blog, it teaches a man to fish rather than giving
him a fish as the old aphorism goes.
I have a few questions related to hardware that I’m a little unclear on, and also that
are pertinent as PCIe 4.0 slots are rumored soon. First a little background on my
build, I’m going to be building a computer primarily for statistical computing before
I begin a doctorate program in stats/applied math. This means it will first need to
be good at serial processing which is why I’m entertaining CPUs that are overkill in
terms of CNN needs (the type of neural networks I will be using). I do want it to be
able to do CNN work as I am intrigued by and play around with that somewhat.
2) Should I wait for my build until PCIe 4.0 motherboards are released? All I see
now are rumors but it is rumored that they will be released next year in 2019. Given
that DMA is the main bottleneck here, wouldn’t it be beneficial to wait until PCIe 4.0
is available (31.508 GB/s more than doubles the performance of PCIe 3.0)?
3) If that is true, then would RAM memory speed actually be a factor? I don’t think
so, as you stated your current setup gets over 50 GB/s for the RAM which would
still be above the DMA bottleneck of PCIe 4.0
4) Even if PCIe 4.0 motherboards are released in 2019, they would still be
compatible with the processors I mentioned above, correct? If so, then building my
rig now wouldn’t hamper me as I could just upgrade the motherboard once PCIe
4.0 compatible motherboards are available. Is that right or are we unsure if there
will be LGA1151 compatible PCIe 4.0 motherboards?
5) I’m looking at a GTX 1080 ti and a RTX 2080 ti for my GPU. I think the RTX 2080 ti
is a little outside my price range, but I would be debating between a GTX 1080 ti
with a water cooling block setup or an RTX 2080 ti without water cooling. Which do
you think will likely perform better as the temperatures will likely hamper
performance of the GPU with the stock fans?
Thank you again for this post and for your continued answering of questions in the
comments. If you have time, I would greatly appreciate a response!
Cheers,
Brendan
Reply
Hi Brendan,
1) The Ryzen 2700X would be fine for up to two GPUs. If you want more GPUs I
would look for a different CPU.
2) PCIe 4.0 will not help much with deep learning and I would not wait for it.
3) Memory speed is not much of a factor. I would just buy cheap RAM.
4) This is determined by the motherboard. For PCIe 2.0 -> PCIe 3.0 we saw that
the new motherboards often supported only the most recent CPU sockets. I
believe this could be the case for the new PCIe 4.0 boards too.
5) A single RTX 2080 Ti on air should be fine. You should see a slight
performance decrease but it is still faster than the GTX 1080 Ti. If you do not
have the money for an RTX 2080 Ti, both a water- or air-cooled GTX 1080 Ti
should be a great option.
Reply
Your statement from 2015, is it still true with current frameworks? I used PyTorch with
fastai and all threads and cores are usually maxed out (image training with ResNet-34):
Reply
However, some frameworks also use quite a bit of CPU in the background, like
TensorFlow. I do not have the deepest insights into this, but TensorFlow's graph
pipeline is quite sophisticated and might need more CPU cores to process
efficiently. The benefits for PyTorch would mainly lie with background loader
threads.
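A minimal sketch of what those background loaders look like in PyTorch (dataset and sizes are placeholders of mine): worker processes prepare the next batches on the CPU while the GPU trains on the current one:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(5000, 3, 32, 32), torch.randint(0, 10, (5000,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    num_workers=4,     # CPU worker processes for loading/preprocessing
                    pin_memory=True)   # enables faster, asynchronous CPU->GPU copies

for images, labels in loader:
    images = images.cuda(non_blocking=True)
    labels = labels.cuda(non_blocking=True)
    # ... forward/backward pass ...
    break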
Reply
Hi Tim,
Sorry for missing this point.
For a motherboard with dual x16 PCIe 3.0 slots that is paired with a CPU that has only 16 PCIe lanes:
Is this motherboard equivalent to one with a single x16 PCIe 3.0 slot, or to dual x8/x8?
Is it equivalent to one with dual x16 PCIe 2.0?
Thanks
Reply
16x slots and lanes for the CPU are different things. If you have 2 PCIe 16x slots
and a CPU with 16 lanes, that would be perfect for a 2 GPU setup!
Reply
Hi Tim,
Thank you for your great guide.
Are the following components sufficient for a 2 GPU system running at full power?
Motherboard: ASRock Z270 PRO4 LGA1151/ Intel Z270/ DDR4/ Quad CrossFireX/
SATA3&USB3.0/ M.2/ A&GbE/ ATX Motherboard.
Power Connectors 24-pin main power connector, 8-pin ATX12V connector
HDD: WD Blue 1TB SATA 6 Gb/s 7200 RPM 64MB Cache 3.5 Inch Desktop Hard
Drive (WD10EZEX)
like:
Antec EarthWatts Gold Pro 550W Power Supply 550 Watt 80 Plus Gold PSU with
120mm Silent Cooling Fan, Semi Modular, 7 Years Warranty, 99% +12V and ATX12V
2.4 – EA550G PRO Black
or 20+4 like:
Antec EarthWatts Gold Pro 550W Power Supply 550 Watt 80 Plus Gold PSU with
120mm Silent Cooling Fan, Semi Modular, 7 Years Warranty, 99% +12V and ATX12V
2.4 – EA550G PRO Black
Thank you.
Reply
Sorry I am not able to look at a build in full detail. If you can narrow down your
problem to a single question I might have time to answer.
Reply
Thank you for sharing this article. You explain everything very well, in the article as well
as in the comments. It is very helpful for me as I am preparing for hardware
courses, so I search for these things, and I found that your blog is simply
awesome. Thank you once again. Waiting for your new article. All the
very best, keep writing!
Reply
Hello Tim,
I am adding these questions to the list of questions mentioned above ( just a
reminder):
+ I am going with a 2x 2080 Ti setup for now, and I am going to be expanding in the
future to 4x 2080 Ti. However, would I benefit from using NVLink? If so, how is it
going to impact things? For example, would I be able to double my memory?
Would it affect other bottlenecks?
Thanks
Reply
(1) NVLink on RTX cards is currently limited to two GPUs only and it will not help
you too much in that scenario for data parallelism. For model parallelism, it
could help, but currently, there are no models and code that profit from that.
So it might be useful in the future, but not right now.
(2) 1800 Watts is a bit much. You need about 275 Watts per GPU and another 300 for
the CPU, which is about 1400 Watts for four GPUs. If you get a 1300 to 1600 Watt PSU
you should be fine, I think, even if you overclock. There are some nice ones from
EVGA; you can find them easily if you search Newegg.
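A back-of-the-envelope version of that sizing rule (my sketch of the numbers above, not a guarantee):

num_gpus = 4
watts_per_gpu = 275
cpu_and_rest = 300
estimated_load = num_gpus * watts_per_gpu + cpu_and_rest   # 1400 W
recommended_psu = estimated_load * 1.1                     # ~10% headroom -> ~1540 W
print(estimated_load, recommended_psu)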
Reply
Thank you so much! One more thing, can you please help me with the
questions from the previous post? I will post the questions here:
”
Just a quick clarification on your reply earlier, I am planning on expanding
later to 4 x 2080 TI. So, in this case I would go with Rampage VI Extreme &
(9920x or 9940x depending on price) and 2x2080ti. However, the issue is
that I am concerned about the number of lanes going into GPUs. Both of
these CPUs has 44 lanes, which is not enough to run 16 lanes on each GPU.
Does it matter (16 vs 8 lanes)?
Also, you addressed this before, but I just want to confirm: CPU clock is
irrelevant. Basically, I am not losing anything by going down from a 9900K @
5.3 GHz to a 9920X @ 4.7 GHz to even a Threadripper 2950X @ 4.4 GHz,
right?
Also, is there a difference between using an Intel CPU vs AMD? (Sorry if this
seems like a very broad question, but I'm not sure which way to go; Intel
has higher clocks and worse value-for-money, while AMD has better value
and more lanes.)
Thank you again for being patient with me! Choosing the build
components has been a steep learning curve for me. I am really glad that I
found someone to point me in the right direction."
Thanks Tim!
Reply
Mohammed says
2018-10-13 at 13:38
Another option is to go with Threadripper, which has 60 lanes (again not enough for
4x16), but at least I might be able to run 3x16 + 1x8.
Also, you addressed this before, but I just want to confirm: CPU clock is irrelevant.
Basically, I am not losing anything by going down from a 9900K @ 5.3 GHz to a 9920X
@ 4.7 GHz to even a Threadripper 2950X @ 4.4 GHz, right?
Also, is there a difference between using an Intel CPU vs AMD? (Sorry if this seems
like a very broad question, but I'm not sure which way to go; Intel has higher clocks
and worse value-for-money, while AMD has better value and more lanes.)
Thank you again for being patient with me! Choosing the build components has
been a steep learning curve for me. I am really glad that I found someone to point
me in the right direction.
Reply
Mohammed says
2018-10-10 at 18:26
Thank you Tim for such a great guide! I have a question about the asynchronous
mini-batch allocation code you mentioned. I am using Python, mainly through Keras
and sometimes TensorFlow. How can I do the asynchronous allocation? Also, I am
not familiar at all with CUDA code, but how hard is it to learn? And is there a way to
integrate CUDA code into my normal use (Python and Keras)?
Reply
This blog post is a bit outdated. It seems that TensorFlow is using pinned host
memory by default, which means that you are already able to do asynchronous
GPU transfers. While I stressed it in the blog post, it's actually not that big of a
bottleneck for most cases. For large video data, it could have a good impact.
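In Keras/TensorFlow this pinned-memory transfer happens for you; the sketch below (mine, for illustration) just shows the same idea made explicit in PyTorch:

import torch

batch = torch.randn(64, 3, 224, 224).pin_memory()   # page-locked ("pinned") host memory

# With pinned memory the copy can run asynchronously: the CPU is free to prepare
# the next batch while the DMA engine moves this one to the GPU.
batch_gpu = batch.cuda(non_blocking=True)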
Reply
Mohammed says
2018-10-11 at 03:19
GPU: 1 x GTX 1080 Ti + 1 x RTX 2080 Ti (I might add a third card depending
on your recommendation)
Cooling: Liquid (open loop)
PSU: RM1000x (I might go a bit higher in terms of OC quality (higher tier)
and power delivery (targeting 60-80% at full load for best efficiency, where
the expected maximum load is ~800W). I am considering getting an AT1200x
instead)
Option B : (Quad Memory Channel but Lower CPU Clock and I think
overkill core count)
CPU: 7920x with 16.5 MB cache (not sure but estimated @~4.7GHz for all
12 cores)
Motherboard: Rampage VI Extreme
RAM: 64 or 128 GB (depending on your recommendation) again (OC to
4000MHz+)
Thanks
Reply
Mohammed says
2018-10-12 at 05:37
Reply
Reply
Mohammed says
2018-10-12 at 09:52
So, your point from the article about RAM clock is not outdated.
In other words, is RAM clock irrelevant because of asynchronous
mini-batch allocation?
What about data cleaning and pre-processing? Does the same
logic apply?
A good RAM clock will not help you pre-process much faster. This
video puts it quite well: https://www.youtube.com/watch?v=D_Yt4vSZKVk
Angel G says
2018-08-03 at 19:35
Hi Tim, I've re-read the comments and a few questions arose in my head.
1. How was it decided that 16-bit floating point numbers have enough
precision for neural networks? Doesn't it reduce their recognition abilities?
2. In 2010, I trained a C++ coded CNN. I noticed that if I ran it in more than 4
parallel threads its learning rate decreased (it required more epochs). The weights
were updated concurrently by the threads using non-blocking (mostly) atomic
cmpxchg64 instructions. I've skipped all development until now.
Now massively parallel architectures are used (in GPUs), so I wonder how they
update/combine the weights in parallel without destroying the learning rate.
3. Does CUDA vs AMD matter if I implement the neural networks in the old-school
manner, without any SDK, using the OpenGL shading language with floating-point
textures?
Reply
Reply
P says
2018-04-22 at 01:08
I guess lots of accesses are required at the beginning to load the data and only a
small amount of I/O is required to save the learned weights. Am I correct?
Reply
It is mostly loading data and an SSD is only required for performance if you
have very large input sizes. If you do not have that, you will gain no
performance over using a spinning hard disk. However, besides DL you use the
SSD for many other tasks, so if you have the money I would definitely go for
SSD drives to make I/O work more comfortable.
Reply
P says
2018-04-24 at 05:21
Is it worth paying more to get the Nvidia 1080 rather than the 1070 Ti?
Reply
Leorexij says
2017-12-06 at 16:24
Hi Tim,
one place to collect all excellent references. It really helps people better configure
their machines to perform efficient deep learning.
I’d like to ask a question and would be grateful if you can help me.
I am going to use 2000 images at a time and I want to use the TensorFlow and Theano
frameworks in Python. Can you advise me on the configuration to achieve this with
good performance?
And my budget is less than 50,000 INR.
Reply
Abdelrahman says
2017-11-25 at 22:13
Hi Tim,
Thanks for the great article. I am planning to use 6850k for my deep learning box,
with 1070 Ti, 1080, or 1080 Ti GPUs, planning to extend to 4 GPUs later.
I just wonder if the following motherboard is a good option for a deep learning box
(4 GPUs): MSI Extreme Gaming Intel X99 LGA 2011 DDR4 USB 3.1 Extended ATX
Motherboard (X99A GODLIKE Gaming )
https://www.amazon.com/MSI-Extended-Motherboard-X99A-
GODLIKE/dp/B014VITZPM/ref=cm_cr_arp_d_product_top?ie=UTF8
Reply
Hi Tim,
On a 4x GTX 1080ti system, with 1x SSD for Windows 10 Pro, and 3x HDDs in RAID 5
for mass storage (encrypted with Bitlocker), in a secure multiuser environment, I’m
looking for an effective approach to separate storage from compute: I currently
have to reconfigure access rights in the RAID volume for the users, every time the
OS is reinstalled (see clean installation after breaking things).
I think it would make sense having a type 1 (bare metal) hypervisor to allow for
Windows and Linux VMs to access the hardware as needed. I’m considering a VM
for NAS, and two more VMs for Windows and Linux.
Do you know if this is possible and, if so, which hypervisor allows CUDA
from VMs running Linux and Windows to access the GPUs? Is there a particular,
tested software configuration that you can recommend?
Thanks in advance.
Reply
Do you know if this is possible and, if so, which hypervisor allows for use of
CUDA from VMs running Linux and Windows, to access the GPUs? Is there a
particular, tested software configuration that you can recommend?
Reply
Levent says
2017-11-07 at 10:03
Hi Tim,
ASUS Z10PE-D16 WS as the motherboard. It’s obvious that we can’t fit more than 3
GPUs on this mobo, but what about using ribbon extender cables and hanging the
GPUs, just like “mining rig” people do?
Do you think this will be a good idea, as this mobo has 4 x PCI-e x16, and 2 x E5
26xx will have 80 PCI-e lanes?
Reply
Julien says
2017-10-31 at 09:29
Hey Tim,
Sorry if I missed this point, but suppose I only plan on having 2 GPUs max. Would a
16 PCIe lane CPU work given that each GPU utilizes 8 lanes?
Reply
Reply
Shahab says
2017-10-28 at 18:42
Hi Tim,
Thank you so much for your great article. It really helps people better configure
their machines to perform efficient deep learning.
I’d like to ask a question and would be grateful if you can help me.
At this moment, I have a GTX 1080 Ti with an Intel Core i5 6500 and 8 GB of RAM.
My question is: is it worth upgrading the CPU and RAM to a Core i7 7700 (or 6700)
and 16 GB, respectively? Would there be any boost at all if I do this upgrade?
Reply
Clément says
2017-10-09 at 12:59
Hello,
Reply
new_dl_learner says
2017-09-29 at 15:45
Hello Tim, what do you think of using the Threadripper compared with the i7
7700K, i9 7900X or 7940X? Two main concerns are: 1) there are user reviews saying
that under Linux, there are bugs with PCIe. Not sure if I will encounter such bugs if I
install 1-4 Nvidia GPUs. 2) Lack of motherboards that support PCIe 3.0 x16/x16/x16/x16.
Some mentioned that there is no noticeable difference between x8 and x16. I guess
they were talking about the frame rate for gaming. Not sure if this conclusion applies to
deep learning. Any idea?
Reply
new_dl_learner says
2017-09-30 at 15:47
Hello, I have spent too much time on hardware selection. It is driving me nuts.
I need some help. My current laptop computer is almost 10 years old. I am
building a desktop replacement. I also want to use it for DL/ML research. As far
as I know, the latest CPUs such as the i7 7700K, i9 7900X and Threadripper do not
support PCIe 3.0 x16/x16/x16/x16. Motherboards that support such quad PCIe 3.0
at x16 each only support older CPUs with the LGA2011-v3 socket or the Xeon E5.
These CPUs are in a similar or higher price range than the ones mentioned
above. Moreover, they run at lower speeds. I guess the dilemma is:
spending more on older technology for quad PCIe 3.0 x16/x16/x16/x16 vs.
spending less on the latest CPU that only supports two PCIe 3.0 slots running x16/x16.
Any suggestion appreciated. Thanks.
Reply
Dear new_dl_learner,
The X99 motherboards with PLX chips (for quad PCI-E 3.0 x16) are no
longer produced; they are out of stock in most places, hence no dilemma
here. As Tim has explained earlier, the CPU is not as important for deep
neural networks as the GPU. Also, PCI-E 3.0 x8 is not much worse than x16.
If you start with anything less than a quad GPU setup, by the time you
need faster data throughput new CPUs and GPUs will be available, sporting
PCI-E 4.0, so you won’t be limited by PCI-E 3.0 x8.
Reply
new_dl_learner says
2017-10-01 at 00:42
Dear Nikolaos,
Thank you for the useful information. My laptop computer is i7
2.66GHz (8GB 1067 MHz DDR3, NVIDIA GeForce GT 330M with 512 MB
GDDR 3). Will that be sufficient for me to use it to learn about Deep
Learning and do some work in this area before PCI-E 4.0 comes out? If
not, what hardware do you recommend during this transition period?
Reply
You can start experimenting with Deep Learning using the CPU of
your existing laptop and a compatible library. For example,
Tensorflow is available for CPU. Start with the basics.
When you reach the point where you need faster compute
capability, depending on your budget, you can put together a PC.
At that time, you will know what the requirements of the software
and of the models that you are using will be, so choosing the right
hardware will be much easier than it is today for you.
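A tiny "start with the basics" example along those lines (an illustrative sketch of mine; it runs fine on a laptop CPU, no GPU needed):

import tensorflow as tf
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))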
new_dl_learner says
2017-10-01 at 14:07
Yes, Tim mentioned that the CPU is not that important, but that was in
2015. Not sure about now. Some people mentioned that DL/ML
applications such as TensorFlow take advantage of multi-core,
multi-threaded CPUs and recommended getting at least 8 cores. My
laptop is showing signs of failing. Besides DL/ML, I also run
engineering applications that would benefit from higher-clocked
CPUs.
Hence, I think the bottom line is: use whatever system you can
afford and justify for your education right now. By the time you
need faster hardware, you’ll know what you need, what you need
it for and what options are available at that time.
new_dl_learner says
2017-10-01 at 21:42
Thanks. Given the timeline, I'd better save the money rather than build a
top-of-the-line 4-GPU system now.
Reagan says
2017-09-29 at 15:33
Tim,
Great blog, this is my second post. You mentioned previously that of all the
characteristics of GPUs, RAM size and bandwidth are the most important. I
haven't seen anywhere on your blog where you mention the relationship between
CUDA cores and DL speedups. The meat of my question is: what is the real
difference between the 1070 and the 1080, as they both have 8GB of GPU RAM? I'm
considering buying a 1070 for the cost savings over a 1080 for my toy DL rig.
Reply
Reagan says
2017-09-29 at 17:04
I pretty much answered my own question. The 1080 is only $70 or so more
expensive than the 1070 for 500 extra CUDA cores and more bandwidth. I'll go
for it.
BUT, my next question is really important: is a quad-channel mobo a must?
Reply
new_dl_learner says
2017-09-28 at 06:07
Reply
If you do not use GPUs this might be a sensible investment; otherwise, it will
not be that important and I would not select a CPU based on this feature
alone.
Reply
new_dl_learner says
2017-09-29 at 15:28
Reply
James says
2017-09-24 at 17:44
Hi Tim,
Great info, thanks. I was wondering if you knew about companies that could offer
this service? I know NVIDIA used to build dev boxes but stopped? This would help
me focus more on dev and not worry too much about building the machine.
Thanks,
Jame
Reply
There were some deep learning desktops from other companies, but I cannot
find them on Google. I think some of them might be buried in the comment
section somewhere, try to search that. Other than that, you could also just buy
a high-end gaming PC. Basically, there is no difference between a good deep
learning machine and a high-end gaming machine. So buying a high-end
gaming machine is a perfect choice if you want to avoid building your own
machine. I would still recommend giving building your own machine a shot —
it is much easier than it looks!
Reply
Jame says
2017-09-28 at 23:34
Hi Tim! Thanks for that. Will look around the comments. If I don't find any
other solution I will have to start learning… The problem is the size that we
are looking at, maybe too big/challenging for a single person.
Thanks!
Reply
Reagan says
2017-09-11 at 00:05
Tim,
What do you think about using the Xeon E-5 1620v4 instead of the i7-5930K for a
quadGPU machine? The Xeon is half the price of the i7, also has 40 PCIe lane
support, and has a higher memory bandwidth and is the same socket type.
Is there something about server chips I’m not seeing that would interfere with me
using this chip on an X99 mobo?
Also, is there a difference between DDR3 and DDR4 RAM that is meaningful to
deep learning?
Great blog!
Reply
The Xeon is definitely a better option here. It has less cache and fewer cores,
but this should only have a minor influence. The chip should work normally on
an X99 mobo. For deep learning there is a very minimal difference between DDR3
DDR3 and DDR4. Probably the performance difference would be a few percent
which should not be noticeable unless you run the GPUs 24/7. However, if you
want performance for a 4 GPU setup, then the first thing you should look into
is cooling, in particular, liquid cooling. Other factors are insignificant.
Reply
Hello Tim,
Thanks for this wonderful resource for Deep Learning DIYers. Based on this and
several other resources on the internet, I have built my first A.I. ‘rig’ on which I am
training an Image Captioning/Transcription/Translation Neural Network, Im2Latex:
to convert LaTeX-generated images back into the original LaTeX markup. I have a
convnet of about 14M parameters and my Conditioned-Attentive-LSTM has about
8M parameters. I had been running this on the Google Cloud Platform before I built my
own 'rig' and am happy to report that my rig with one GPU trains almost twice as
fast as the one with one virtual GPU on Google (i.e. half a K80). I think I can make
mine a little faster with some BIOS settings – but am happy with it so far. Am
ordering another 1080 Ti soon. Oh yes, I haven't yet overclocked my GPU but it
naturally runs at over 1900 MHz under load with the help of the Nvidia X Server Settings
app on Linux (the temperature is 50 degrees C with built-in liquid cooling, with the CPU
at 30 degrees C).
In the spirit of giving back to the community, here's my parts
list: https://pcpartpicker.com/user/Sumeet0/saved/#view=gFbvVn. Also, while I do
have a copy of Windows 10 I decided to use Ubuntu GNOME 16.04 LTS – mostly
because I'm very comfortable with Unix-like operating systems since I've worked
on/with those for over 20 years. One problem with Linux though is that most
software utilities for overclocking and system monitoring run on Windows. As you –
and other resources on the internet – say, the best way to overclock a GPU on Linux is
to flash the BIOS. At the least, that's not convenient, especially for a newbie.
Thanks for your feedback — this is very useful for everybody here!
Reply
Thanks. It's good to know that I'm not missing out on too much by not
overclocking (and that my choosing Linux over Windows 10 didn't cause me
a major performance disadvantage – since I could have very easily
overclocked on Windows). Now I can focus on training my 23M-parameter
model.
new_dl_learner says
2017-09-04 at 21:27
Thanks. Some sites suggested 8-16GB of RAM but I found recent posts suggesting
32GB or 64GB. It is also not uncommon to see posts from users using 128 or 256GB
of RAM. What is a reasonable amount of RAM for a home computer, above which it
would be better to use online computing services from companies?
Reply
new_dl_learner says
2017-09-03 at 16:46
I read that the CPU is not as important as the GPU for DL, and to just make sure the
number of CPU cores is 2x the number of GPUs. However, I also read that CPU cores
could be assigned to take part in ML/DL computation. So, does that mean it is good
to have as many cores as I can get?
Reply
More cores are always better, but it is also a question of how much you want to
pay. I think CPU cores = 2x GPUs might be a bit much at the high end. If you
get 3 GPUs, a 4-core is still sufficient. If you have 4 GPUs, a 6-core would also
be sufficient. I would however not recommend a 2-core for 3 GPUs. 4 cores for
a 4 GPU system is borderline, as it will be okay if you just run deep learning, but
it might become a bottleneck if you run any other application in addition. So
choose according to your budget and according to your needs.
Reply
new_dl_learner says
2017-09-01 at 15:34
Reply
Since I have my model all set up and running on the Google Cloud Platform as well
as on my own system, I have a very good comparison of speed and price. I will
recover the price of my rig in 5-10 months if I run with one GPU and 3-6
months if I run with two (and even sooner with 4 GPUs). This is based on
running my computations at least 12 hours a day every day (which is reasonable
for my case). I did *not* factor in the fact that mine runs 1.5x to 3x faster than
GCP. I did factor in the cost of electricity though, which is 40 cents/kWh for
tier-3 and 27 cents/kWh for tier-2 consumption in my area. The biggest
This is very good advice and a thorough analysis — thank you for giving
back! This is very valuable and I should incorporate this advice into my
blog post.
Reply
Hi Tim,
Glad you found my input useful. I found this entire forum more
informative than anything else out there on the internet from a deep-
learning DIY POV. Thanks again.
tested running with two PSUs yet, but I think it should work (despite
what many people will say on the internet) as long as: you choose the
second PSU such that it will be okay with not connecting the 24-pin
and EPS/ATX12V mobo pins (some may not put any voltage on the
line or suffer poor voltage regulation in this scenario, but I hope that
my Seasonic will do fine); you ensure that any given component is
entirely powered by the same PSU; and all the PSUs are grounded to
the same ground. So, for example, ensure that all the motherboard power
sockets (one 24-pin and two 8-pin EPS/ATX12V sockets on my mobo)
are powered by the same PSU. I'll update this forum when/if I do that.
An additional complication is that PSUs normally don’t start powering
the output lines when you turn on their power switch. They do that
only after they receive a control-signal from the motherboard (which it
does when you hit the start button on the computer-case) which they
receive on two pins on the 20-24 pin ATX power connector. You can
fake that signal to the second PSU by shorting the correct two pins or
you can buy a device that will relay the mobo’s signal to the second
PSU. I haven’t tried this out yet, so don’t know for sure if it will work but
people have done this successfully so I’m hopeful that I’ll be able to
make it work.
Reply
new_dl_learner says
2017-09-08 at 21:01
About getting a dual-CPU motherboard and having each CPU control
two GPUs at the same time… I have two questions:
1. If I only have one CPU installed, can the motherboard control 4 GPUs at
the same time at x16/x16/x16/x16?
2. In case two CPUs are required to control 4 GPUs at the same time at
x16/x16/x16/x16, will DL software such as TensorFlow take care of the parallelism
and distribution of the workload across the GPUs automatically?
Reply
1. I didn’t find any mobos that will do that – but I didn’t seriously
consider anything that was north of $670. I suspect you could find this
new_dl_learner says
2017-09-09 at 00:44
Thanks.
2017-09-09 at 01:19
The latest (Purley) high-end Xeons support 48 lanes per CPU, hence
96 lanes would only be supported in a dual CPU config. These
CPUs feature up to 3 Ultra Path Interconnect links for CPU-to-CPU
communication, at 9.6 and 10.4 GT/s.
Source:
https://software.intel.com/en-us/articles/intel-xeon-processor-scalable-family-technical-overview
If this is not feasible, then it's possible that dual CPU configurations
may be slower than a single CPU motherboard that utilises a PCIe
switch (PLX).
Michael says
2017-09-09 at 01:34
new_dl_learner says
2017-09-29 at 04:31
placement and Tensorflow will do the rest – i.e. it will deploy your
graph code to the multiple GPUs and CPU, transfer data back and
forth between GPU and RAM, and coordinate the execution of the
entire graph spread over CPUs, GPUs and RAM. It takes very
little effort compared to how much work it does. Coding an
asynchronous model, on the other hand, will take a bit more
coding. Oh, and one more thing – be sure to use queues and
queue-runners for reading data asynchronously from the disks so
that the data is ready in RAM when the graph needs it. You'll also
need to ensure that your BIOS is set up properly – e.g. I had to
turn on the 'Above 4G Decoding' option on my motherboard. I
also noticed that if I turn off ECC, then the speed actually slows
down, contrary to what I had expected. I also noticed a 'warm-up'
period of about 30-45 minutes after boot-up when the graph runs
3x slower – not sure why (maybe the time it takes for the OS to load
inodes into cache?) but now I just suspend the machine instead of
shutting it down.
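The comment above describes TF 1.x queues/queue-runners; tf.data with prefetch() is the later API for the same idea, reading and preprocessing the next batches in the background so data is ready when the graph asks for it (the file paths and parse function below are placeholders of mine):

import tensorflow as tf

def parse_example(serialized):
    # decode image + label from a TFRecord here
    return serialized

files = ["data/train-00.tfrecord"]   # placeholder paths
dataset = (tf.data.TFRecordDataset(files)
           .map(parse_example, num_parallel_calls=4)
           .shuffle(10000)
           .batch(64)
           .prefetch(2))             # keep a couple of batches ready ahead of the GPU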
James says
2017-11-09 at 21:04
Hi Tim,
After some digging I came across this company that could help
me – Elysian AI. Do you know them?
http://www.elysian.ai
new_dl_learner says
2017-08-27 at 02:51
Reply
Leo says
2017-08-28 at 15:07
You never will. Threadripper has 64 PCIe lanes, but you have to leave 4 of them
for the chipset, and most mobos now will feed 4, 8 or 12 lanes to NVMe/SSDs and
other disks.
Reply
I checked Newegg and indeed the X399 boards' specs show that they only
support standard PCIe setups. However, if you look at the manufacturer's
homepage you will see that they do support the full 64 PCIe lanes. I assume
Newegg's system is not yet updated to make a 16x/16x/16x/16x setup available
in the specs (it seems to be standardized). See for
example https://www.gigabyte.com/Motherboard/X399-AORUS-Gaming-7-rev-10#kf
Reply
new_dl_learner says
2017-08-31 at 17:51
You are right, I was just searching for "lanes" and confused the 64 that I saw
with the specs for the motherboard. This is strange indeed; why do they
not support the full 64 lanes? There was another blog post saying that
this particular board would support that, but the manufacturer's page
clearly says it does not. I would get in touch with any manufacturer and
just ask.
Reply
new_dl_learner says
2017-08-31 at 18:23
http://www.guru3d.com/articles-pages/amd-ryzen-threadripper-
1920x-review,4.html
I haven’t found a good explanation, but I think it’s likely 4 lanes are used to connect the CPU to the X399 chipset.
Reply
Hi Tim, I’m new to deep learning and computer vision and I need to build a workstation for that within a $1000 budget, and I’ll be considering used and low-cost components available in Pakistan. So far I have found the following options.
GPUs
GTX 1050Ti – $185 – 2GB
GTX 1060 – $400
GTX 1070 – $512
GTX 1080 – $711
Other options include a Quadro 5000 with 2.5 GB and a 320-bit bus
RAM
16GB DDR4 $142
16 GB DDR3 $33
HDD
500 GB $19
1 TB $33
SSD
128 GB $28
Please guide me about the most powerful and lowest-cost combination that will help me in the future. Also let me know if a better combination of motherboard and processor can be made from the available parts.
Reply
You can save money by using the DDR3 ram option with a suitable
motherboard. The cheap E5 options look quite good to me. I would go for a
GTX 1070 given these prices. If you are short on money a GTX 1060 with 8GB of
RAM would also be okay. Hope that helps!
Reply
Fernando says
2017-08-12 at 20:06
Hi Tim,
I followed your guide to better understand my needs for the computer I want to build for deep learning applications. However, I have a question regarding the PCIe lanes from the CPU.
Specifically, you mention that 40 PCIe lanes are good to go for 4 GPUs, and you also mention that every GPU communicates through 16 PCIe lanes. In my mind, if I would like to use the full potential of the GPUs, I calculate I would need 16×4 = 64 PCIe lanes in my CPU to make this communication efficient. I definitely misunderstood something about that, but I would love to know how you came to this conclusion. So the question basically is: how many PCIe lanes does a CPU need? Do I need more than the ones that my GPUs demand? Is there any other component demanding these buses and therefore making it necessary to have even more?
Generally the more the better, and while PCIe speed is not that important if you only do parallelism among 4 GPUs, it is still the easiest factor for improving (or degrading, if you do not have the lanes) your performance. Generally only devices that are attached to the PCIe bus draw lanes. For example, if you have a PCIe SSD, this will also affect the transfer speed to your GPUs. The setup in which your PCIe devices can run is specified by the motherboard. For example, you might have a 40-lane CPU, but your motherboard only supports an 8x/8x/8x/8x setup for your PCIe devices, in this case GPUs, so that no GPU can utilize the full 16x speed.
The details are more complicated, but I hope this helps you to get an overview of the issue.
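If you want to see which link width your GPUs actually negotiated on a given motherboard, nvidia-smi can report it. A small sketch (my own addition, not from the post); run it while the GPUs are under load, since some cards drop to a narrower link when idle:

```python
import subprocess

# Query the current and maximum PCIe link generation/width per GPU.
query = ("nvidia-smi --query-gpu=name,pcie.link.gen.current,"
         "pcie.link.width.current,pcie.link.width.max --format=csv")
print(subprocess.check_output(query.split()).decode())

# Illustrative output only (your values will differ):
# name, pcie.link.gen.current, pcie.link.width.current, pcie.link.width.max
# GeForce GTX 1080 Ti, 3, 8, 16   <- running at x8 although the card supports x16
```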
Reply
Fernando,
Also check if a PCIe switch (PLX) makes sense for the types of workloads you
will be creating.
With such a switch, GPUs attached to the same PLX chip can transfer data directly to each other (DMA) via the PLX chip, i.e. without consuming any PCIe lanes on the CPU and without being affected by any communication of the other pair of GPUs.
Also, note that PCIe uses separate lanes for downlink and uplink, i.e. a device that supports 16 lanes practically supports 16 lanes uplink and 16 lanes downlink, which can be used concurrently at full speed. This is beneficial when the software library uses the following approach:
If the workload can be split into four processing stages that take about the
same processing time and each stage can be handled by a separate GPU,
here’s how the data would be transferred at full speed: GPU1, GPU2 are
attached to PLX1 and GPU3, GPU4 are attached to PLX2. The CPU uses 16
(uplink) lanes to send data to GPU1 via PLX1. At the same time (in parallel),
GPU1 transfers the data it has just processed to GPU2 using 16 lanes via PLX1,
GPU2 transfers data it has processed to the CPU using 16 (downlink) lanes via
PLX1, the CPU transfers this data to GPU3 using 16 (uplink) lanes via PLX2 and,
similarly, GPU3 transfers data it has processed to GPU4 using 16 lanes via PLX2;
GPU4 transfers the data it has processed back to the CPU using 16 (downlink)
lanes via PLX2. You’ll notice that the GPUs make use of all 16 PCIe lanes available to each of them, and the CPU also makes full use of 32 lanes (in both directions, uplink and downlink).
In other words, your software can potentially make optimal use of 16-lane
GPUs, via a CPU with 32 available PCIe lanes, if it only needs to send data from
the CPU to the first GPU and receive data concurrently from the last GPU (in a
sequence of GPUs, where each GPU does some processing and forwards the
data to the next, for further processing) back to the CPU. The workload needs
to be balanced, so that GPUs don’t wait too long.
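As a rough illustration of the staged pipeline described above, here is a short PyTorch sketch (my own, assuming 4 GPUs are present): each stage lives on its own GPU and activations flow from stage to stage, so with several micro-batches in flight the transfers and the compute of different stages can overlap.

```python
import torch
import torch.nn as nn

# One stage per GPU; each stage is just a placeholder layer here.
stages = nn.ModuleList([
    nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to(f"cuda:{i}")
    for i in range(4)
])

def forward_pipeline(x):
    # Move activations to each stage's GPU in turn. With multiple micro-batches,
    # stage i can process batch k while stage i+1 processes batch k-1
    # (real pipeline frameworks automate this; this only shows the data flow).
    for i, stage in enumerate(stages):
        x = stage(x.to(f"cuda:{i}"))
    return x

out = forward_pipeline(torch.randn(32, 1024))
print(out.shape)
```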
I’m aware of two EATX motherboards with this feature and they are quite
expensive and I’m not sure if the additional cost can be justified in terms of
performance:
Regards
Reply
new_dl_learner says
2017-08-14 at 13:32
Reply
I suggest that you ensure the motherboard has enough PCI-E x16 slots for future expansion (as Tim has advised) and, if you are concerned about the number of lanes that will be available in a multi-GPU setup, you will need to download the manual (PDF) of the motherboard you're interested in buying and check the number of lanes according to the number of GPUs.
Without a PLX chip, depending on CPU lanes, the manual could say for
example 2 GPUs at 16/16, 3 GPUs at 16/8/8, 4 GPUs at 8/8/8/8. A
motherboard with PLX typically says 2 GPUs at 16/16, 3 GPUs at
16/16/16, 4 GPUs at 16/16/16/16.
Reply
new_dl_learner says
2017-08-14 at 23:25
About using more than one GPU for DL, it seems that I need to write software to take advantage of parallelism. Isn’t the use of multiple GPUs to solve problems automatic? I mean, when more than one GPU is installed, the hardware and software (e.g. TensorFlow) automatically detect the existence of multiple GPUs and divide the task across all the installed GPUs automatically.
It advertises 4 x PCIe 3.0 x16 but then explains that the third GPU
will be operating in x8 mode. Check the detailed specs in the
manual before you buy a motherboard.
new_dl_learner says
2017-08-18 at 17:29
“For the two motherboards, they can support 4 x PCIe 3.0 x16 (x16, x16/x16, x16/x0/x16/x8, or x16/x8/x8/x8) at the same time since they don’t share any bandwidth with any of the slots on the motherboard.”
Does that mean that for these two motherboards, I can use four 1080 Ti GPUs running at top speed at the same time?
new_dl_learner says
2017-08-19 at 02:02
Hmm… What does “For the full speed, it actually depends on how you will use the 4 ROG-STRIX-GT1080TI-11GB at the same time” mean?
“The ROG Zenith Extreme again can work with 4 graphics card
since it support multi-GPU and supports 4 way SLI Technology. For
the full speed, it actually depends on how you will use the 4 ROG-
STRIX-GT1080TI-11GB at the same time. I can still recommend the
Zenith if you don’t want to overclock your GPU. Your GPU speed
will not lower down even if you connect an SSD or another
expansion card since there are no bandwidth between the PCIE
slots. For Intel i9 processor I can suggest the ROG Rampage VI
Extreme.”
new_dl_learner says
2017-08-12 at 15:52
Thanks Tim. How does the number of GPU cards scale with performance? For example, if I have X GTX 1080 Tis installed on the same computer, will it take 1/X of the time to complete the same task?
Reply
Scaling within one computer is usually quite good. It still depends on the task, but you can expect a scaling factor of 2.5-3.9 for 4 GPUs depending on the software framework. The main drawback is that you have to add more special code which handles the parallelism. I recommend PyTorch for these kinds of tasks.
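For reference, the "special code" can be very short in PyTorch. A minimal sketch of data parallelism (my own illustration, not from the post): nn.DataParallel splits each mini-batch across the visible GPUs and gathers the results on the first one.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model = nn.DataParallel(model).cuda()   # replicate across all visible GPUs

x = torch.randn(256, 512).cuda()        # this batch is scattered to the GPUs
out = model(x)                          # outputs are gathered on GPU 0
print(out.shape)
```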
Reply
new_dl_learner says
2017-08-06 at 03:04
Hi Tim, I have a PhD in Computer Science but I have not worked on DL before. For
CPU, do you recommend the AMD Threadripper, Xeon or Core i7-7700/7700K? I
plan to buy a 1080 Ti first and if needed, add more later.
Reply
Any of the CPUs that you listed is fine for deep learning with multiple GTX 1080
Tis. Choose the CPU according to your additional needs (preprocessing, other
data science applications, other uses for your computer etc).
Reply
new_dl_learner says
2017-08-10 at 20:05
Thanks Tim. As I know, software for my other needs do not take advantage
of multi-core. So, faster CPU is better than having more cores. Do software
related to deep learning take advantage of multi-core, multi-thread? If so,
about how many cores and threads of CPU would be advantageous? AMD
and Intel have different system/memory bandwidth. Which would be
better?
Reply
Most deep learning libraries make use of a single core or do not use other cores in full. Thus a CPU with many cores does not have a great advantage over others.
Reply
new_dl_learner says
2017-08-11 at 14:37
Thanks Tim. Regarding the GTX 1080 Ti, there are several companies selling different variants of the GTX 1080 Ti; which brand and variant do you recommend? I plan to buy one card first and, if needed, add more later.
I would recommend the cheapest card. The cards are almost the same. Overclocked cards have almost no benefit for deep learning (for gaming they do, though). I am not sure about the Founders Edition — I have not heard anything bad about it other than other cards being cheaper.
Hi Tim,
thanks for this great guide!
It helped us to choose a deep learning server for TensorFlow. We now use this rack machine https://www.cadnetwork.de/de/produkte/deep-learning but with four Tesla P100s instead of GTX 1080 Tis. But I don’t know if there is a huge difference between GTX and Tesla.
I can confirm that 1-3 GPUs are used fully, while the fourth GPU delivers about 40% of its performance. It could be a limitation of the PCIe bus.
Thanks
Thorsten
Reply
Hi Thorsten,
that is interesting. I do not think that the 40% performance comes from PCIe
issues alone, there might be another thing amiss. It cannot be some cooling
issue since then you would see a performance degradation with other GPUs
too. It would be interesting to know the reason for this. Let me know if you
know more!
I am happy that my guide helped you to choose your server! Indeed, Tesla GPUs are only minimally better than GTX GPUs. The P100 is quite a bit better than the GTX 1080 Ti, but it also costs disproportionately more. I think GTX 1080 Tis would have been more cost effective, but often these are not available for servers (NVIDIA has a policy of selling GTX cards only to consumers and Tesla cards to companies), so overall not a bad choice!
Reply
Johydeep says
2017-07-14 at 20:31
Hi Michael /Tim
I am looking for one deep learning PC and I found this “Intel Core i7-7800X
Processor”
with
Socket LGA 2066
Compatible with Intel® X299 Chipset
6 Cores/12 Threads
Max Number of PCI Express Lanes 28
Intel® Optane™ memory ready and support for Intel® Optane™ SSDs
AND
MSI Performance Gaming Intel X299 LGA 2066 DDR4 USB 3.1 SLI ATX Motherboard
(X299 GAMING PRO CARBON AC)
Thanks
Johydeep
Reply
It looks reasonable. With 28 lanes you will have a bit slower parallelism, but for
2 GPUs this bottleneck is not too large so you should still be fine; I guess you
could expect a performance decrease of 10-15% for parallelism with 2 GPUs,
which is okay. Otherwise, the specs are quite good for general computation, so
if you want to use your CPU for other data science tasks this is a good choice. If
you want to only do deep learning I might go for a slower CPU which has more
lanes, but your current option is also not too bad.
Reply
Tom says
2017-06-30 at 09:06
Hi Tim,
Can I use a “GeForce GTX 1080 Max-Q” laptop for deep learning tasks?
Here is the full description. I need something really portable, but at the same time I need to be able to train RNN models.
https://www.amazon.com/HIDevolution-Zephyrus-GX501VI-XS74-HID3-Diamond-
Compound/dp/B0736C1PP5/ref=sr_1_7?ie=UTF8&qid=1498792741&sr=8-
7&keywords=gx501vi&th=1
Reply
The GPU in that laptop is quite powerful so you will be able to train RNNs
without any major problems. It also should be quite fast compared to, say, a
GTX 1060 which will be quite a bit slower.
Reply
Tom says
2017-07-07 at 21:16
The CUDA core count is fine, but apart from that everything else is about 30% lower compared to the main GTX 1080.
Reply
You can expect the card to be about 30% slower, but that is still pretty
fast compared to other cards. You might need to adapt your models
slightly or use 16-bit precision for very large models, but you should be
able to run everything that is out there.
Reply
Mirela says
2017-06-25 at 17:09
Hi Tim,
Reply
Mirela says
2017-06-28 at 14:57
And upon long pondering, I assume the Xeon E5 1620 v4 is a wiser choice
compared to an i5/i7 setup.
Xeon is mentioned here, as well as is graph processing for a similar setup:
https://www.youtube.com/watch?v=875NbdL39A0&feature=youtu.be&t=243
+
https://www.youtube.com/watch?v=875NbdL39A0&feature=youtu.be&t=445
I’ve already invested in a ‘good’ (what the budget could hold) GPU, namely the GTX 1060 6GB, and the RAM would be 16 or 32 GB as well (already useful for R). But now it seems the Xeon would be the best option.
That sounds reasonable. If I were you I would also pay good attention to
the motherboard. If it has extra RAM slots (8 slots) then you can always
increase the RAM size if you need more; in that way you can upgrade your
setup depending on the problem that you are working on.
Reply
You might want to go with the 64GB setup depending on what kind of graphs you will work with. The graph structure can differ greatly, and some graphs will require you to have more than 100GB of RAM while for others it is more manageable. The CPU is often less important (but this still depends on the graph and problem, so check this for the problems/graphs you work with). A GTX 1060 might be a bit slow at times, but often you do not work with the full graphs anyway because training would take too long. Thus you could also trim down your graph further, and then a GTX 1060 is a solid choice (no large memory required and a good speedup over the CPU).
Reply
Mirela says
2017-07-14 at 14:09
Hi Tim,
Many thanks!
I have bought the components for below listed setup, aiming at having as
much RAM as possible (‘affordable’ :).
– intel xeon e5 1620 v4
Hi Tim,
many many thanks for your great blog articles, they are a great help!
I have a perhaps slightly off-topic question. Can you recommend any resources to learn about computer hardware on a conceptual level? I am not really interested in the underlying electrical engineering just yet, but in the different components and how they interact. For example, I’m interested in how data is transferred from memory to GPU memory in more detail.
Reply
That is a good question, but unfortunately, I do not have a good answer for that! I also wanted to learn more about the conceptual side of hardware, but the resources that I found are often resources from universities and textbooks which also look at the details. What I found most promising was to just do
google searches for specific questions and try to get informed through multiple
sources of websites. For example googling “cpu to gpu memory transfer” will
yield blog posts, forum questions, presentations on the topic and so forth. With
that, you can get informed about that question. From here you might have new
questions which you can then google. If you do this for a few hours every
week, you will get quite knowledgeable about concepts quite quickly. Hope this
helps!
Tim
Reply
Reply
First of all, thank you very much for the comprehensive and deep knowledge you gave us through the two blogs: the full hardware guide and the GPU-focused one.
I hope that you can answer my question, with advance apologies if my question asks the obvious.
I was about to spend around £3800 on a PC (the new ALIENWARE AURORA) which has two GeForce GTX 1080 Tis, 64GB DDR4 at 2400MHz, and an Intel Core i7-7700K processor. I was very happy that I finally could decide which PC I should buy for my PhD research over the next two years. What made me even happier was that I was following your appreciated GPU-focused blog – YES! I have multiple high-performance GPUs!
However, something took me to the other blog – this blog – and I read the CPU advice ending with the fact that my CPU has only 16 PCIe lanes – not 40 as you warned. I went back to the first step in my search for a PC, started 4 months ago.
I did my best again, [focusing only on pre-built PCs by Dell or Lenovo], and I ended up with another ALIENWARE PC – the ALIENWARE AREA-51, which has the same* GPUs and memory as the first PC in my comment; however, it has a different CPU, which is the
i7-6850K with 40 PCIe lanes and PCIe 3.0. However, the cost went up by £700 more: £4500. It is expensive; I could and will afford it for my PhD, but it is expensive.
When I reached such a cost, I remembered two laptops which I had run away from because of their prices. I said to myself, if I have reached £4500 with the PC, why not go with the laptop for life for one or two thousand more. The laptops are:
If you could please help me with selecting one choice or ranking them with your reasons (and I am really sorry to take up your appreciated time reading this long comment), you will make my next two years technically truly safe. I have to say that my research is on two different data spaces: genetic data and textual data.
Finally, thank you again for your contribution through this blog, and thank you in advance for getting to this point of my comment.
*To be fair regarding the cost of the second PC, it has 2 more TB HDD [4TB] than
the first PC, however it provides the same size of SSD: 512GB.
Reply
These are all solid options albeit all quite expensive. Note that PCIe lanes are
not that important if you have 2 GPUs, but become more important if you have
4 GPUs. However, I do think the biggest issues here is just that these computers
are too expensive. If I were you I would go for a used computer solution which
I would upgrade to your needs.
For example, just last week I sold my used computer, which is similar to, or even better than, these options for 800 pounds on Gumtree. So a smart choice might be to buy a used computer and upgrade it with some parts. For genetics research I would try to find a cheap computer which supports 8 RAM slots and then buy 64 GB of RAM for the machine and upgrade to 128 GB of RAM if your research requires it. The speed of the RAM is overvalued; a plain DDR3 RAM setup is sufficient and cheap. For some deep learning algorithms or algorithms in computational biology a single GPU should be sufficient, but choose one that has a lot of RAM; 12GB is ideal and I would go for a used GTX Titan X for 400-500 pounds on eBay (make sure your computer has a PSU which supports at least 600 watts).
This option would yield a very high performance computer for roughly 2000
pounds. Of course it requires some manual assembly, but it really is not difficult
and you really should try to do this.
If you cannot get a used option with parts due to university bureaucracy I would go with an ordinary laptop + a hetzner.de GPU machine, which for a 3-year PhD will cost 4400 pounds but offers everything that you need and can be canceled or upgraded month by month. For most genetics research you should be fine without a GPU, which would cost 2150 pounds on hetzner.de. If your algorithms require double precision then you will need to make a careful choice about which GPU to get, but probably the most cost efficient solution would involve renting a Tesla GPU in the cloud (AWS for example) to work with double precision when you need it.
So the main options that I see are (1) buying used computer and upgrade its
parts, (2) buy ordinary laptop and a dedicated machine in the cloud. These
options will give you the best performance per quid.
Reply
I don’t know how to thank you. Your generosity, represented in your reading time and response, is profoundly appreciated.
Reply
Raja says
2017-06-02 at 10:17
Reply
It depends on the algorithm, but in general PCIe lanes with 2 GPUs are not that important. It will decrease performance, but not by a lot; maybe 0-10%.
Tom says
2017-05-25 at 02:59
What is the best way to put 4 GPUs (NON-founder edition) easily in a board?
Thanks
Reply
Michael says
2017-05-25 at 01:12
i7-7700k costs over $300. That’s not cheap. For that kind of money you can get a
CPU with 40 lanes (e.g. E5-1620v4), and put it into something like ASRock X99
Extreme4 board. Or you could pay more for Asus X99-WS board which has 2 PLX
switches and supports quad PCIe x16.
Reply
Tom says
2017-05-25 at 01:16
Thanks, Michael.
Reply
Sam says
2017-05-25 at 01:54
That’s true, although I’d like to have something newer/faster than ivy bridge as I
do use this machine for more than just deep learning. If I really wanted to save
money I could use the supercarrier board with a ~$40 kaby lake G3930.
Reply
Michael says
2017-05-25 at 02:15
Reply
Sam says
2017-05-25 at 02:32
Reply
With a 16-lane CPU you’re limited to either x16 lanes to one GPU at a time, or x8 lanes for concurrent exchange of data with both GPUs.
As Tim mentioned earlier, for 2 GPUs you’re fine with x8 per GPU. Having said that, 16 lanes on the CPU may not be sufficient, as some of these lanes may be reserved by the chipset or other PCIe devices, e.g. an integrated M.2 slot.
I would opt for a CPU with more PCIe lanes, as Tim and Michael have advised.
Sam says
2017-05-13 at 18:52
Hi Tim,
Thanks so much for writing all of this up, it’s very informative. I’m currently picking
out parts for a DL machine, and I’m trying to figure out where I may have
bottlenecks.
Your piece on DMI for RAM to VRAM transfer is quite interesting. Most of what I’m reading emphasizes high PCIe bandwidth. I’m building a dual GPU system, and I’m wondering if I really need both GPUs running at PCIe 3.0 x16, or if x8 is fine for each?
It sounds like the DMA bandwidth could be a problem. I couldn’t find much info on DMA related to specific chipsets; however, you mentioned 12GB/s. Is this bandwidth the same for different chipsets (I’m comparing Z270 to X99)? If I’m mostly running independent models on each GPU, would I see much if any benefit from 2x PCIe 3.0 x16, or would that only really show a big benefit when running the GPUs in parallel for a single model? Asynchronous mini-batch allocation is interesting; however, I’m not sure if it’s integrated into all of the newer high-level DL frameworks…
Re: the DMA issue, Intel’s new Optane drives are routed through PCIe, and they can be used as RAM in addition to long term storage. Do you think that these can be used as a way around the DMA bottleneck?
Reply
If you use the right algorithms there will be almost no decrease in performance if you use x8 for each GPU. Even if you use the “wrong” algorithms, the performance reduction should be minimal for most models since the aggregated transfer times for 2 GPUs are not that large. The costs increase dramatically as you add more GPUs though — for a 4 GPU system it is important that you are on PCIe 3.0 with at least 32 PCIe lanes from your CPU/motherboard.
I would not care too much about DMA. I suppose for most chipset/CPU combos it is the same. It might differ a bit here and there, but the performance difference should be negligible. I recommend using PyTorch for parallelism if you have two GPUs, and if you have 4+ GPUs I recommend Microsoft’s CNTK.
Reply
Sam says
2017-05-24 at 23:19
I thought I’d mention, there’s a motherboard that I feel is perfect for dual GPU rigs: the ASRock Z270 SuperCarrier:
http://www.asrock.com/MB/Intel/Z270%20SuperCarrier/index.asp
This board, like some of the X99 workstation boards, has a PLX switch, allowing dual PCIe x16 or quad PCIe x8 on a Z270 board. For dual GPU rigs, you get the added benefit of being able to run 2 GPUs 4 slots apart (instead of the usual 3 slots on most non-workstation boards). This helps a lot with cooling since there’s more space between the 2 GPUs, especially with non-reference coolers taking up 2.5-3 slots these days.
Reply
Tom says
2017-05-24 at 23:50
Hi Sam,
What processor are you using with the ASRock Z270 SuperCarrier?
An Intel i7-6850K processor?
Does this motherboard support 40 lanes?
Thanks
Tom
Reply
Sam says
2017-05-25 at 00:42
Hey Tom,
Michael says
2017-05-24 at 23:52
Dual PCIe x16 should not need any PLX switches. The switch is only
needed when you want to do Quad PCIe x16, which is more lanes than
a single CPU can support.
Reply
Sam says
2017-05-25 at 00:43
Dual PCIe x16 doesn’t need switches when using a 40 lane CPU,
however having a switch on a Z270 board allows me to use a
much cheaper, still very powerful 16 lane CPU with 2 GPUs at x16.
Tom says
2017-05-25 at 00:49
Tom says
2017-05-25 at 01:14
Hi sam,
Can we use Z270 board + Intel Boxed Core i7-6850K Processor
together ?
Reply
Tom says
2017-05-25 at 01:17
Thanks, Sam.
Reply
Martin says
2017-05-10 at 13:32
Hi!
First of all a big thanks! You’ve basically created the best resource for deep learning
enthusiasts looking to build their own machine.
For the CPU, I was first looking at Intel i7 6850k, which was the cheapest i7 I could
find that supports 40 lanes. However, Intel Xeon E5-1620 V4 is almost half the price
and also supports 40 lanes. Not sure if the faster i7 is worth the money here?
Lastly, I was thinking about getting a water cooler for the CPU. I’ve read mixed
opinions about water cooling, but I reckon moving air outside of the case should be
a good thing as it allows the GPUs to run at lower temperatures?
Thomas says
2017-05-06 at 15:50
Hello Tim.
Thank you very much for this blog. You gave me a solid background for my understanding of the dependency between hardware and deep learning.
I have a question about the bus speed of the CPU. Should that be a concern? As you wrote, the true bottleneck is between CPU and GPU, and as I understand it, the “Bus speed” listed on ark.intel.com spec sheets refers to that connection.
I have to choose between the E5-262x v4/3 and E5-16xx v4/3. The 262x family has its bus speed set at 8 GT/s QPI while the 16xx has 5 GT/s. (8 GT/s is what PCIe 3.0 offers.)
Besides scalability, clock frequency and memory bandwidth, that is the difference between them, and the only one that matters for deep learning, since all of them have clock speeds above 2GHz and memory bandwidth over 68 GB/s, and I will not make use of scalability.
Reply
It is correct that this is the main bottleneck between the CPU and GPU; however, a very tiny amount of time is spent on CPU-GPU interactions at the memory level compared to the actual GPU computation. It becomes more relevant if you have multiple GPUs, but for multiple GPUs the main bottlenecks are somewhere else. Currently a good CPU (in terms of bus speed) will improve your deep learning performance by about 0-1.5% compared to a “standard” one, and I would not worry about it too much. I think all the CPUs that you listed are more than fine.
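A back-of-the-envelope calculation (my own numbers, not from the post) shows why the bus matters so little in the single-GPU case:

```python
# Transfer time for one mini-batch over PCIe 3.0 x16 vs. typical compute time.
batch = 128
image_bytes = 3 * 224 * 224 * 4              # one float32 ImageNet-sized image
batch_mb = batch * image_bytes / 1e6          # ~77 MB per mini-batch

pcie_mb_per_s = 12_000                        # rough practical PCIe 3.0 x16 rate
transfer_ms = batch_mb / pcie_mb_per_s * 1e3  # ~6 ms

print(f"{batch_mb:.0f} MB per batch, ~{transfer_ms:.1f} ms to transfer")
# A large convnet typically needs hundreds of milliseconds of GPU compute per
# batch, so a few milliseconds of transfer (often hidden by pinned, asynchronous
# copies) is negligible, and a slightly faster bus changes almost nothing.
```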
Reply
Adarsh says
2017-05-05 at 13:57
Reply
The speed of the computational units of different GPUs of the same series is about the same (NVIDIA Titan Xp modules are not much faster than, say, GTX 1060 modules), but the reason why bigger cards are faster is that they just have more modules (called streaming multiprocessors, or SMs). If your model is computationally not intensive, then benchmarking some small GPUs and
extrapolating the number of SMs might be a valid option to find which is the
optimal GPU in this case.
For operations which saturate the GPU, such as big matrix multiplications or, in general, convolutions, this is very difficult to estimate. It sounds like you want to reduce costs. A good way to do this is also through power efficiency, and this is a very transparent option which can be easily optimized. It also sounds like you want to reduce latency — this is very difficult to test because computational graphs differ too widely; the only option that I see is to find people that have these GPUs and let them run benchmarks on your model. Or otherwise, try to generalize existing benchmarks for your model.
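If you do end up benchmarking on whatever GPUs you can get your hands on, a simple timing loop is usually enough. A sketch (my own, with the usual benchmarking caveats about warm-up and synchronization):

```python
import time
import torch
import torch.nn as nn

# Placeholder model; substitute the model you actually care about.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 1000)).cuda()
x = torch.randn(64, 4096, device="cuda")

for _ in range(10):                      # warm-up so clocks and caches settle
    model(x).sum().backward()

torch.cuda.synchronize()                 # GPU work is asynchronous; sync first
start = time.time()
for _ in range(100):
    model(x).sum().backward()
torch.cuda.synchronize()
print(f"{(time.time() - start) / 100 * 1000:.2f} ms per iteration")
```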
Reply
Adarsh says
2017-05-05 at 14:24
Thanks
Reply
Michael says
2017-05-05 at 18:46
Adarsh, it’s hard to give you any advice, because you didn’t tell us
anything about what you’re trying to do exactly: what is your accuracy
target (e.g. on ImageNet)? What is your power budget?
People have run VGG and Inception on an iPhone 6s, with 150-300ms
latency:
http://machinethink.net/blog/convolutional-neural-networks-on-the-
iphone-with-vggnet/
See this nice paper which tested the previous Jetson kit:
https://arxiv.org/abs/1605.07678
It also provides some insight into the relation of amount computation
and accuracy.
Reply
Charles U says
2017-05-01 at 06:52
Hi Tim,
Thanks for this great article; I also read your other one on GPU performance. I’m on a budget right now, so I am planning on buying a GTX 1060 6GB, with the intent of upgrading in the future.
In this post, you mention your computer should have at least as much RAM as your GPU. Does that mean it would make more sense for me to buy a computer with 6GB of RAM to match my card? I was originally planning to get a 4GB RAM computer. And in the future, if I get a 1080 Ti with 12GB, will I have to upgrade my computer to 12GB of RAM?
Thanks
Charles
Reply
This requirement is not so strict; I should update my blog post on this. If you have 4GB of RAM you will be able to work with most datasets if you stream your data, that is, load it in small batches bit by bit. If you do this, 4GB will even suffice for the GTX 1080 Ti. You might run into some problems if you run very large RNNs, but this can be prevented with some code which initializes the weights directly on the GPU rather than the CPU. You might also run into problems when you preprocess data, but this too can be managed with some extra code. You should be fine with 4GB.
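A small PyTorch/NumPy sketch of both ideas (my own illustration; the file name and sizes are made up): stream the dataset from disk in chunks instead of loading it all into RAM, and allocate big tensors directly on the GPU so they never occupy CPU memory.

```python
import numpy as np
import torch

# mmap_mode reads slices from disk on demand instead of loading the whole
# array into RAM; "big_dataset.npy" is a hypothetical file.
data = np.load("big_dataset.npy", mmap_mode="r")

batch_size = 128
for i in range(0, len(data), batch_size):
    # np.array() copies only the current slice into RAM before the GPU upload.
    batch = torch.from_numpy(np.array(data[i:i + batch_size])).cuda()
    # ... forward/backward pass on `batch` ...

# Initialize a large weight tensor directly on the GPU; no CPU-side copy exists.
weights = torch.randn(8192, 8192, device="cuda")
```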
Reply
Hi Tim,
I’m reading through each and every single article of yours, they are to-the-point
and very helpful for beginners in neural networks, like myself.
Regarding PCIe 3.0, I’ve noticed that most consumer-grade motherboards fall back to x8/x8/x16 for three GPUs and x8/x8/x8/x8 for four GPUs. Hence, in a multi-GPU setup, where a different CPU thread handles each GPU, it’s not possible to make use of the GPUs’ x16 lane capability. Notably, PCIe 3.0 x8 has the same theoretical throughput as PCIe 2.0 x16.
While searching for motherboards with more PCIe lanes, I noticed that some new consumer-targeted motherboards come with a PEX 8747 Broadcom PCIe bridge. That’s a 48-lane bridge, which is still insufficient for non-synchronised, concurrent data transfers. Broadcom’s top-of-the-line bridge supports 96 lanes (no idea how much this solution costs): 64 lanes could be used for 4 GPUs and 32 additional lanes for the CPU, which means GPUs can communicate with each other using the full PCIe 3.0 x16 bandwidth and up to two GPUs can concurrently transfer data from/to the CPU at full bandwidth.
Have you considered these solutions? Are you aware of motherboards that deliver sufficiently good value for money, e.g. achieving performance that would cost less than alternative solutions, to justify the cost?
Thanks in advance.
Reply
Hi Nikos,
thanks for your comment. I also stumbled upon these switches, but in the end
they are probably not so suitable for deep learning. The details are a bit difficult
to understand but let me try to explain: The problem with these solutions is
that they still use the underlying PCIe interface and thus are limited just like
normal PCIe transfers. In most graphics applications you do not have parallel GPU-to-GPU transfers, but GPU-to-GPU transfers which are slightly offset in time and also small in size. Under such circumstances you can have clever protocols and extra lanes which feed into the usually attached lanes (which have a hard limit of 16 per GPU) in a safe and secure manner without blocking the channels. In other words, with these switches you can send multiple packets asynchronously and securely, but each GPU still receives one packet at a time; in a normal switch each packet must be scheduled after all other packets on that path have completed, or otherwise one has insecure transfers (which can corrupt the data).
The reasoning behind these switches is that they trade the synchronization of
the full PCIe path with the synchronization of sub-paths on the PCIe circuit (the
sub-path to the GPU) which increases performance for many applications,
especially graphics applications, but probably not for deep learning.
Hi Tim,
Kind regards,
Nikos
Reply
Reply
Reply
Hi Tim,
The downstream ports are non blocking, i.e. when using a PCIe switch
of 80 lanes (64 lanes for the GPUs + 16 lanes for the upstream
connection to the CPU), pairs of GPUs can talk to each other directly,
using 16 lanes per pair.
If we use 4 GPUs, a single GPU can broadcast data to the others at full
x16 speed (versus x8 speed if they were attached directly to the CPU’s
PCIe lanes).
Also, the CPU can broadcast data to all four GPUs using the full x16
throughput.
Two pairs of GPUs can exchange data at full x16 speed (again, versus
x8 speed if they were attached directly to the CPU’s PCIe lanes).
All of this also depends on the type of algorithm that one uses
though, but it is good to know that these motherboards can
improve performance! Thanks again!
MacMinus says
2017-04-28 at 11:11
Since we are now more than 2 years down the line, and Moore’s law has been
doing its thing, I would be curious about an update to this great piece with the
current HW (e.g. multi-GTX 1080 Ti’s).
Reply
The general hardware recommendations did not change very much and I think I would make the same recommendations that are listed here. If you are interested in GPU recommendations you can read my other blog post about GPUs.
Reply
Petra-Kathi says
2017-04-19 at 10:01
Another great thank-you from my side as I (hope I) have gained a lot of insight into
the hardware setup of deep learning workstations!
Reply
The P6000 is based on the GP102 chip, which is very similar to the GTX 1080 Ti and Titan X Pascal. The features and performance will be similar to these cards; that is, you will have the usual support of all deep learning libraries and good computational power, but almost no half-precision performance. So with that card you will receive a powerful GPU which you can use in your certified environment. If the cost difference between the P6000 and P100 is slim, you might want to opt for the P100, with which you gain a bit of performance and half-precision computation. However, if the difference is larger then just go with the P6000.
Reply
Michael says
2017-04-21 at 17:59
32GB).
Talk to Exxact folks, I had a very positive experience with them.
Reply
Petra-Kathi says
2017-04-25 at 10:13
Reply
samihaq says
2017-04-11 at 08:12
Can anyone please look at my almost final rig and suggest any improvements, or point out any blunder which I am about to make, especially any useless money spent, since I am already very short on funds. The only aim is to have a solid, reliable rig that can serve 24/7 for a long time for around $2500-2600. Thank you very much. Regards.
https://pcpartpicker.com/list/BqRgHN
Reply
Petra-Kathi says
2017-04-19 at 10:11
Maybe you should consider one or two additional case fans? IIRC the power
supply fan pushes the air out. If you add another out-blowing fan somewhere
at the top and an aspirating one at the bottom this might improve heat
dissipation in 24/7 operation.
Reply
trulia says
2017-04-29 at 08:52
Reply
A CPU can have between 16 and 40 lanes. Read the specifications of a CPU
to see how many lanes that CPU has. Usually you will need at least 8 lanes
for a single GPU, but this is dependent on your motherboard. The CPU can
provide support for lanes, but they must be there on the motherboard. A
CPU can support a maximum of 4 GPUs.
Reply
Reply
Michael says
2017-04-09 at 22:25
Reply
@Michael. Oh really. I am tired of searching for reviews and looking things up. Can you please look at my build and suggest any improvements, especially some motherboard for a max of $300, or give me some direction? Thank you.
Reply
Michael says
2017-04-10 at 17:47
Sami, for my last 3 workstations, I didn’t bother building them myself. I sent
the desired specs to several system builders, and then negotiated the price
down. In the end, I only paid a few hundred bucks more than what it would
cost me to do it myself.
For your budget, I would buy a used computer, 2-3 generations old, and
get a couple of 1080 Ti cards.
Reply
samihaq says
2017-04-11 at 08:07
Reply
samihaq says
2017-04-11 at 08:19
Thank you for the info. EVGA informed me through email that the motherboard has been tested for the Intel Xeon E5-1620 V3 and not the V4. So thanks for informing me about it, and I have changed the
CPU from V4 to V3; both are almost the same. Here is the reply from
EVGA:-
“Hello,
Thank you for the email so, unfortunately, EVGA hasn’t tested the
newer Xeon CPU like the V4. The only tested is the V3 these are only
we have tested that has supported the X99 motherboard with the
latest bios update. I apologize for the inconvenience.
Regards,
EVGA”
Reply
Hi, I am into deep learning and currently have a Quadro K5100 GPU with 8GB of memory in a laptop with compute capability 3.0. I want to build a solid DL rig which can serve me well for at least 4-5 years with a heavy workload. After reading the
Can anyone please look into my build and suggest any improvements. Also, one thing I am confused about is whether I should go for
Asus X99-DELUXE II ATX LGA2011-3 Motherboard 394$
or
Asus X99-A/USB 3.1 ATX LGA2011-3 Motherboard $228.88
Is going for the Deluxe motherboard at almost $180 more justified?
Another thing I am confused about is whether the Founders Edition of the GPUs by EVGA or any other vendor is good enough, or should I go with a customized version with more fans, which of course will cost more.
My built is
Intel Xeon E5-1620 V4 3.5GHz Quad-Core Processor (40 lanes) 286.99
Cooler Master Hyper 212 EVO 82.9 CFM Sleeve Bearing CPU Cooler $24.88
Asus X99-A/USB 3.1 ATX LGA2011-3 Motherboard $228.88
Crucial Ballistix Sport LT 32GB (2 x 16GB) DDR4-2400 Memory $219.99
Western Digital BLACK SERIES 2TB 3.5″ 7200RPM Internal Hard $122.88
EVGA GeForce GTX 1070 8GB SC GAMING ACX 3.0 $374.00
EVGA GeForce GTX 1080 Ti 11GB Founders Edition $700.00
Corsair Air 540 ATX Mid Tower Case $119.98
EVGA SuperNOVA G2 1300W 80+ Gold Certified Fully-Modular ATX Power Supply
$182.03
Asus DRW-24B1ST/BLK/B/AS DVD/CD Writer
Total $2278
Thanks
Reply
Hi Sami,
I do not see why the $180 would be justified; the board adds 1 PCIe slot but gives pretty much the same deep learning performance.
Please note that if you have two GPUs with different chipsets, for example a GTX 1070 and a GTX 1080, you will not be able to parallelize them.
Often the coolers on the GPUs are quite similar in performance, so that it should not be a big deal. However, I am not so familiar with the current fan designs and there might be a fan which is superior to others. I probably would pay $20-30 if the fan performance is > 33% better, but not more. I do not think
it is worth it at a certain point – better to save that money to buy another GPU
in the future.
Thanks for the useful info. I didn’t know about the parallelization issue with GPUs of different chipsets. As these GPUs are not cheap, I was thinking that I would use a 1080 Ti for big networks, while using one or two 1070s for small prototypes, for parameter searches, or for checking different options in parallel on a relatively small scale. Do you agree with my logic, or should I prefer parallelization (by having the same GPUs, in which case I can afford at most two 1080 Tis) over my current view?
Reply
Reply
Thanks
It is truly a nice and helpful piece of information. I’m glad that you shared this useful information with us.
Please keep us informed like this. Thanks for sharing.
Reply
Thank you, I am happy that you found the blog post useful
Reply
Umair says
2017-03-30 at 07:50
Hey Tim
Some of the links here direct to an old WordPress blog. Is that content unavailable
now?
Reply
All of my content has been moved to this blog so you should find it here. I was not aware that there were some old dead links in this blog post. Thank you for making me aware of that. I will clean that up in the next few days.
Reply
Nader says
2017-03-19 at 18:39
Please help
Do you recommend getting an Alienware Graphics Amplifier with an Alienware laptop: a GTX 1060 in the laptop for portability and a GTX 1080 Ti in the amplifier for use as a station?
Please help
Reply
Chris says
2017-03-14 at 02:16
Hi,
someone else above had a similar but not exactly the same question, hence I would like to ask for your opinion as well.
I understand that it would be optimal to have a CPU with enough native PCIe lanes to connect to every GPU with 16 lanes. Given that I would like to build a system with not more than two GPUs, I would need 32 lanes for the GPUs to avoid PCIe bottlenecks. Currently that leads to socket 2011-3 CPUs (Broadwell) with 40 lanes.
If I would, for reasons of cost, use a socket 1151 (Kaby Lake) setup with a 16-lane CPU but with a mainboard offering a PLX switch that can offer 2 PCIe x16 slots, one question arises: do the GPUs need the whole PCIe bandwidth permanently, forcing the PLX switch to permanently share the x16 bandwidth into x8/x8, or is it more likely that the GPUs transmit in an interleaving manner with the full x16 bandwidth available to the currently transmitting one? My guess is that the truth is something in between, but I have no exact numbers or benchmarks. Do you have some experience regarding actual bandwidth loss or suggestions here? Is it beneficial to use a PLX switch in 16-lane CPU, dual GPU configurations, or should I definitely go for a 40-lane CPU?
Cheers,
Chris
Reply
Hi Chris,
there are some motherboards which support 16x speed when no transfers to the other GPU are executed, but this is rare. In general you will have 8x/8x speed. Check your motherboard specs for this.
I would not worry too much about PCIe lanes. If you want to parallelize GPUs it
will be a performance hit, but you would still get good speedups. If you use
good parallelization algorithms, like those provided by Microsoft’s CNTK, then
you will have no performance hit. If you use the GPUs separately you will see
almost no performance hit. So I would just go ahead with that setup. It will
probably give you the best bang for the buck.
Reply
Dhaval says
2017-03-12 at 18:44
Should I get the Zotac NVIDIA GT 730, since I don’t have much money and can spend a max of 5000 INR? Any suggestions, sir?
Reply
The GT 730 variant with GDDR5 memory is a good choice in that price range.
The DDR3 variant will be much slower so pay attention to the memory. The
memory is just 1 GB in this variant, but if you use 16-bit networks you can do
some experiments with this. If you need to train larger networks then the DDR3
variant with larger memory (up to 4GB) will be a good choice too. You will have
to wait for experiments a bit longer, but you will be able to run most models if
you use 16-bit and you will get a speedup over using the CPU.
Reply
James says
2017-03-09 at 12:28
Yes, it looks like the Titan X and the new GTX 1080 Ti have basically the same specs, but the 1080 Ti is almost half the price:
https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_
series
I’d nearly ordered a Titan X only to find them now out of stock at most retailers.
Is there something fundamentally different about the 1080 Ti vs the Titan where deep learning is concerned? Otherwise it looks like you could build a DevBox clone for a decent price.
Reply
Definitely go for the GTX 1080 Ti. The 1 GB memory difference is not significant
for most use-cases.
Reply
ElXDi says
2017-03-06 at 22:40
2. GTX 1060 3GB non reference design (the PCB is based on GTX 1080 with better
power feed and 8pin connector) 250e.
+ performance boost +5%
+ ability to overclock with a volt mod
– price
– just 3GB of VRAM
3. GTX 1060 6GB ref design 260e
+ more CUDA cores
+ more VRAM
4. GTX 1060 6GB non reference design (the PCB is based on GTX 1080 with better
power feed and 8pin connector. 280e – 300e
+ more CUDA cores
+ more VRAM
+ really good overclocking ability (+15%)
– quite expensive
– price / performance index is not so good any more
So what do you think about the above options? What is more important: more VRAM, the CUDA core count, GPU clock speed, or VRAM bandwidth?
If you have missed it you might want to check out my other blog post about GPU selection: GPU advice. To reiterate the points:
– Bandwidth is the thing that you want to have the most of
– The best GPU in terms of cost/performance is the GTX 1070 (and soon also the GTX 1080 Ti)
– GPU memory size is important, but for many tasks 8GB is fine. If you want to do computer vision research, get a 12GB GPU
To answer your other questions: CUDA core count and clock speed are not that important. Overclocking will give you almost no performance increase for deep learning.
Reply
ElXDi says
2017-03-07 at 00:47
Thank you very much for your answer; it really helps me.
As far as I understand, the 3GB model is really useless. So the 1060 6GB is fine for starting out, the 1070 8GB is the minimum for any real project, and a Titan X 12GB is required for something serious.
Cheers!
Reply
om says
2017-03-07 at 01:27
The Titan X and GTX 1080 Ti have only a 1GB difference in memory, but a big difference in price.
http://www.eurogamer.net/articles/digitalfoundry-2017-gtx-1080-ti-
finally-revealed
Reply
Ashley says
2017-03-06 at 22:28
Reply
Ashley says
2017-03-07 at 00:04
Reply
Michael says
2017-03-07 at 01:47
@Ashley: I’d probably just get this one (after getting the price down to
$300, or $350, tops):
https://annarbor.craigslist.org/sys/6031436427.html
The advantage is it’s already got a 1050 card in it, so you can start doing DL right away. Later, if you realize you need more power, you can buy a 1080 Ti, and you will still be within your $1k budget.
Reply
Ashley says
2017-03-07 at 18:28
Reply
Ashley says
2017-03-05 at 02:54
Hi,
Complete noobie build here, so all aspects of all things computer are needed. I have been using a laptop till now and I would like to build a reasonably priced PC that can run CNNs. If needed, I can tunnel in from anywhere to work with it, etc.
This is what I have; I would really appreciate any comments – have I missed anything?
Thank you!
Reply
2017-03-06 at 11:21
Looks like a solid build which offers some opportunities for upgrades in the future. If I were doing more data science I would probably go with a cheap or used DDR3 CPU/RAM combo and buy more RAM (32-64GB); possibly I would swap the GTX 1060 for a GTX 1070 if I had spare money left from switching from DDR4 to DDR3. If I were doing more deep learning I would also go for a DDR3 CPU/RAM combo, possibly buy used hardware, and then buy a GTX 1080 Ti.
This does not mean that your build is bad. Your build is more future proof. My build would be more “I-want-to-do-things-now”. I guess this depends on taste, but be aware of what you want when you buy hardware. Do you want to buy for data science, deep learning, machine learning, Kaggle competitions, or for being future proof? Your build buys all of that a little and a lot of being future proof, which can be a very sensible choice.
Reply
Ashley says
2017-03-06 at 15:51
Hi,
You are awesome – thank you for the quick reply (because I have to get
the laptop I am working with back asap)
– I want it for deep learning & machine learning primarily, either at the
workstation or through a laptop that I can tunnel in with when needing a
change of environment.
– I need it to last because I may not have another chance to buy anytime
soon.
– In case this matters? I will be using Linux, probably Ubuntu flavour. It was
challenging installing on ROG – had to use rpm for some reason.
Reply
Ashley says
2017-03-06 at 16:07
Reply
Ashley says
2017-03-06 at 21:27
Michael says
2017-03-06 at 22:10
Ashley, no, this is not how I’d spend a thousand bucks if I needed a
cheap machine for DL. Instead of getting all these parts
individually, I’d shop for a decent used desktop, then buy a good
video card separately. For example, something like
this: https://santabarbara.craigslist.org/sys/5992606383.html
Then you will have enough money left for GTX 1080 and more.
The truth is, CPU performance hasn’t improved that much in the
last 5 years, so for deep learning an old CPU + 1080 will be faster
than a new CPU + 1070.
Also, you should get a SSD. Again, old CPU + SSD will be faster
than new CPU + hard drive.
p.s. and you definitely don’t need a liquid cooler (nor any
overclocking).
Nader says
2017-03-03 at 18:46
Reply
Andrew says
2017-03-05 at 09:04
Ryzen will work perfectly fine with a 1080 TI. However depending on your work
load Ryzen may not be the best option.
Pro: Ryzen has ECC RAM support, which is great for mission-critical situations where data CANNOT risk being corrupted at any cost. However, if you are mainly doing deep learning then ECC RAM is not really necessary at all, as most deep learning algorithms and AI training etc. can be done with 16-bit or even 8-bit precision (which is something that the Titan X Pascal actually excels at, and thus why something like a Quadro or Tesla isn’t necessary either in most cases).
Pro: Ryzen has 8 cores, which are beneficial if you plan to work with highly multi-threaded programs for video editing, 3D rendering etc., although in many of these cases you are better off using GPU acceleration instead of relying on a CPU, since CUDA acceleration on an NVIDIA GPU (especially a Titan X) will be
FAR faster than ANY CPU. And again, if you are mostly just doing deep learning, or maybe some PC gaming on the side, and aren’t running programs that need all those extra cores (deep learning only needs 4 cores even for four-way SLI in most cases, as shown in this article), then the extra cores of Ryzen are frankly redundant.
Con: Ryzen is limited to dual-channel RAM. This pretty much cuts your memory bandwidth in half, which CAN affect intensive deep learning work somewhat. It also only supports up to 2666-2900MHz RAM speeds in many cases, which isn’t really a big deal for deep learning but will affect any memory-intensive workstation/professional tasks. It also has a RAM capacity limit of 64GB, compared to the Intel X99 chipset used with CPUs like the i7 6800K etc., which allows for 128GB of quad-channel RAM clocked at up to 3600MHz. It’s up to your situation whether you consider that a problem or not.
Con: Ryzen has no overclocking capability to speak of. Nobody has really been
able to get ANY Ryzen chip to get over 4.1ghz; with many even being stuck at
3.9gh zor 4.0ghz (which in the case of the 1800X is literally NO overclock at all
since the 1800X runs at 4ghz out of the box). So if you are using programs that
need clock speed then a faster chip would be beneficial.
So overall, unless you really have a specific need for an 8-core chip, I would say for
a Deep Learning PC, even if you do things like normal web browsing, heavy PC
gaming, video streaming/encoding etc., you might be better off getting
something like an i7 6800K (which has 6 cores / 12 threads but can hit 4.4GHz in
some cases, so overall is a bit better), which is $100 cheaper than the R7 1800X;
or perhaps the i7 7700K, which is only $329 ($170 cheaper than the R7 1800X) and
can easily overclock to 5GHz with proper cooling (many people have hit 5.2GHz
even with just high-quality Noctua air coolers or AIO water coolers).
The only reason I would specifically get Ryzen is if you really need an 8-core
chip for specific programs, as Deep Learning and most general use doesn't
require any more than 4 cores.
Reply
Michael says
2017-03-05 at 20:10
Keep in mind that 6800K has only 28 PCIe lanes (Ryzen and 7700k are even
worse), so if you’re planning to use multiple GPUs (now or in the future), go
with E5-1650 v4 (or E5-1620 v4 if you’re on a budget). Also, Skylake Xeons
are about to be released (this month), so if you can, wait for them (mainly
for AVX512 support).
Reply
tom says
2017-03-05 at 23:08
Hi Michael,
I would like to use 4 GTX 1080 Ti
Reply
Andrew says
2017-03-06 at 04:22
Have you really noticed a difference between running a GPU with PCIe 3.0
x8 and x16 for Deep Learning though? In most other situations
I've seen, x8 PCIe 3.0 isn't hindering much at all, if any; you
sometimes see a 0.5% or maybe 1% performance delta between the
two, but that's typically it.
Reply
Michael says
2017-03-06 at 06:02
I haven’t seen this tested anywhere, but I’m guessing it’s important
for large networks running on fast GPUs, when it takes longer to
move gradients from GPU to GPU than to calculate them.
Yes, the AMD Ryzen CPU series will be compatible with your NVIDIA cards. In
general, all modern CPUs should support NVIDIA cards. This is so because the
CPU and the NVIDIA GPU communicate via a protocol (PCIe) that is also used for
printers, network interfaces and so forth, and no CPU manufacturer
can afford not to support it. Thus all CPUs should have
support for NVIDIA GPUs (at least those which come as PCIe cards, which are
all GPUs except the ones with NVLink, that is, currently the NVIDIA P100).
Reply
s12 says
2017-03-01 at 10:46
Hi Tim,
I have been looking into using NVLink to couple two TXPs. I was hoping to do this
in a SLI-like fashion (like shown here: http://www.kitguru.net/components/graphic-
cards/anton-shilov/nvidia-pascal-architectures-nvlink-to-enable-8-way-multi-gpu-
capability/ ), rather than buying a purpose-built motherboard. Unless I’m mistaken,
this isn’t currently possible — do you know if NVIDIA has any plans to implement
this in the future?
Thank you very much for this article and all of your helpful comments.
Reply
could save a lot of money by going without NVLink. I am currently not aware of
any affordable NVLink hardware which is used outside of supercomputing. You
might get your hands on one of those machines, but it will be expensive. So in
the end CNTK might be the only way to go which is practical. This may be
disappointing, but I hope it helps!
Reply
Ervin says
2017-02-04 at 14:42
Hello Tim and thank you for your post. I currently have a desktop with a Core 2 Quad
Q9300. I was wondering whether it would bottleneck a GTX 1060 6GB for some
beginner to mid-level DL problems?
Reply
It is an old CPU, but you should be relatively fine. You can expect to run about
10-20% slower than with a high-end CPU. Processing non-deep-learning code,
that is, preprocessing data, will probably take quite a bit longer, but running
the deep learning model itself should not be much slower.
Reply
Ervin says
2017-02-11 at 22:22
Thank you for your reply. I also have an old motherboard GA-P43-ES3G
(http://www.gigabyte.com/Motherboard/GA-P43-ES3G-rev-10#sp) which
only supports PCI Express 2.0. I believe that will be a major bottleneck
right?
Reply
om says
2017-02-02 at 08:18
Hi Tim,
I have the latest mac and I want to use GPU – GTX Titan X
with https://www.akitio.com/expansion/node
AKiTiO Node – eGPU box Thunderbolt 3
My question is, can I use TensorFlow with this external GPU device, without killing
performance and efficiency. What could be the side effect?
I know using TitanX with the desktop will be a lot better but I need mobility.
Reply
JP says
2017-02-02 at 15:55
Reply
om says
2017-02-10 at 03:20
Hello trulia.
Reply
Usher says
2017-01-20 at 11:41
Reply
Usher says
2017-01-17 at 02:39
Hi Tim,
Could you comment on below build?
I am not sure if the board is a good choice if I might be adding a second GPU in
the future. Or maybe ASUS X99-Deluxe II is worth the extra cost?
Reply
Hi Usher,
I do not have time to check the details, but it seems that the motherboard is
okay. The reviews on Newegg are not that good, but the
cost/performance might still be fine. Adding a second GPU will definitely be no
problem with the motherboard that you chose.
Otherwise the build looks okay. I recommend checking the build with
pcpartpicker, which often finds compatibility issues if there are any.
Reply
Pavel says
2017-01-07 at 00:31
Hi Tim,
Very good article! Thank you!
P.S. You have cool working place.
Reply
Nader says
2017-01-06 at 16:16
Reply
Michael says
2017-01-05 at 22:56
2. I’m deciding on which SSD to buy for my machine with four Pascal Titan X cards,
mostly to do training on Imagenet. Assuming your bandwidth estimate of 290MBps
is for a single card, should I multiply it by four when running a model on all four
cards? Do you know how fast Pascal Titan X processes a single 128 mini-batch?
Also, if I use mini-batch of 256 , I would need double the bandwidth, right?
Given the above considerations, would you recommend going with a PCIe based
SSD, such as Samsung 960 Pro, rather than SATA based one, such as Samsung 850
Evo?
Reply
1. The data used in deep learning is usually 32-bit, or 4 bytes; this is the 4 in the
calculation above (conversion into bytes).
2. This is a bit complicated. Parallelism does not scale linearly, so you
should multiply the estimate by 3.5 or so (for TensorFlow this will be closer to
2.5-3). One thing to keep in mind is that in practice small data transfers are
often slower (the overhead is large when the data size is small) and that GPUs
operate more efficiently on larger batch sizes.
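To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. Only the 290 MB/s per-card figure and the ~3.5x multi-GPU factor come from the exchange above; the image size is an illustrative assumption.

bytes_per_value = 4                    # 32-bit values, the "4" from point 1 above
values_per_image = 3 * 224 * 224       # assumed ImageNet-style input
mb_per_image = values_per_image * bytes_per_value / 1024**2

single_card_bandwidth = 290            # MB/s, estimate from the question above
four_card_bandwidth = single_card_bandwidth * 3.5   # parallelism does not scale linearly

print("One 224x224x3 image as 32-bit values: %.2f MB" % mb_per_image)
print("Estimated read bandwidth for four cards: %.0f MB/s" % four_card_bandwidth)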
Reply
Michael says
2017-03-25 at 03:22
Thanks Tim. A different question: which software framework would you use
for experimenting with Imagenet?
So far I’ve been using Theano, but only on small datasets (MNIST and
CIFAR). My main interest is to test different quantization methods for
weights and activations, and see how it works for different network
architectures. I’ve read your paper, by the way, very interesting, but I prefer
not to code everything from scratch in C/CUDA if possible. Right now I’m
looking into implementation of the asynchronous batch allocation, like you
suggested, in Theano, and it’s not very straightforward.
Would you recommend switching to TensorFlow, or sticking with Theano?
I'm less concerned with the ability to parallelize code across multiple GPUs,
because I can just run different experiments in parallel.
Reply
Reply
Nader says
2016-12-30 at 05:14
Hi,
What do you think of the following build ?
https://pcpartpicker.com/list/8sv2jc
Thank you
Reply
Pawel says
2016-12-29 at 12:57
Hi Tim,
I was surprised to see that my GTX 1070 peaks at ~1700 images/sec, a very small
improvement. It looks like the CPU is now the bottleneck (I see it constantly at 300%
usage – 3 full cores). I have an i5-3570K, which should be decent.
I haven't analysed it further (yet), but could somebody share their experience on that? I
wasn't expecting the CPU to be the bottleneck here.
Reply
Reply
Paweł,
TensorFlow does not use GPU-to-GPU transfers when updating weights. It
downloads the whole model to RAM and makes the updates on the CPU. At least this is
the understanding I've got from reading: https://arxiv.org/abs/1608.07249
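For readers who want to see what this looks like in code, here is a minimal TF1-style graph-construction sketch that explicitly places the shared weights on the CPU, which mirrors the behaviour described in this comment; it is an illustration of explicit device placement, not a statement about TensorFlow's internal defaults.

import tensorflow as tf  # TensorFlow 1.x graph-mode API assumed

# Keep the shared weights in CPU/host RAM; each GPU reads them over PCIe.
with tf.device('/cpu:0'):
    w = tf.Variable(tf.random_normal([4096, 4096]), name='shared_weights')

grads = []
for i in range(2):                           # two GPUs assumed
    with tf.device('/gpu:%d' % i):
        x = tf.random_normal([128, 4096])    # dummy mini-batch
        loss = tf.reduce_sum(tf.matmul(x, w))
        grads.append(tf.gradients(loss, [w])[0])

# The update is applied where `w` lives, i.e. on the CPU in this sketch.
update = tf.assign_sub(w, 0.01 * tf.add_n(grads) / float(len(grads)))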
Reply
Nader says
2016-12-11 at 16:10
Should I buy a GTX 1080 now or wait for the Ti, which is supposedly coming out next
month?
Reply
2016-12-13 at 12:36
The GTX 1080 Ti will be better in every way. Make sure to
preorder it or something, however, otherwise all cards might be bought up quickly and
you will have to go back to the GTX 1080. Another strategy might be to wait a bit
longer for the GTX 1080 Ti to arrive and then buy a cheap GTX 1080 from eBay.
I think these two choices make sense if you can wait for a month or two.
Reply
Nader says
2016-12-30 at 05:16
Hi,
What do you think of the following build ?
https://pcpartpicker.com/list/8sv2jc
Thank you
Reply
Nader says
2016-12-30 at 05:16
Reply
Andrew says
2017-02-10 at 08:44
First off, if you are going to spend $85 on a 256GB regular SATA based
SSD for storage then you might as well get the top of the line M.2 960
Evo for $120. It’s over 3 times faster than the one you picked in transfer
speeds, and is overall much better. (alternatively if you don’t care about
the extra speed you can get a 500GB SATA drive for about that same
price, getting double the storage)
Lastly, you should also get the i5 7600K instead of the i5 6600K, since
Kaby Lake 7000 processors are about 5-10% faster than Skylake 6000
processors, and the 7600K can be overclocked to over 5GHz no
problem, compared to the 6600K that has trouble getting over ~4.7GHz
on air cooling in some cases. And since the 7600K is also about the
same price, you might as well get it. Personally though, I would still
recommend an i7 over an i5 in this situation, simply because
simultaneous multi-threading is becoming more important as of
late, and the extra 2MB of L3 cache is also nice to have. I figure if you
are spending $1200 on a TITAN X Pascal you should be able to fit in
$100 more for an i7 7700K that can also be overclocked to 5GHz pretty
easily in most cases (even on air!)
Reply
Hi Tim, thanks for the excellent posts, and keep up the good work.
I am just beginning to experiment with deep learning and I’m interested in
generative models like RNNs (probably models like LSTMs, I think). I can’t spend
more than $2k (maybe up to $2.3k), so I think I will have to go with a 16-lane CPU.
Then I have a choice of either a single Titan X Pascal or two 1080s. (Alternatively, I
could buy a 40-lane CPU, preserving upgradability, but then I could only buy a
single 1080). Do you have any advice specific to RNNs in this situation? Is model
parallelism a viable option for RNNs in general and LSTMs in particular?
Thank you!
Reply
I think you can apply 75% of state-of-the-art LSTM models on different tasks
with a GTX 1080; for the other 25% you can often create a “smarter”
architecture which uses less memory and achieves comparable results. So I
think you should go for 16 lanes and two GTX 1080s. Make sure your CPU
supports two GPUs in an 8x/8x setting.
Reply
Om says
2016-12-05 at 04:13
Hi Tim,
http://www.costco.com/CyberpowerPC-SLC2400C-Desktop—Intel-Core-i7—8GB-
NVIDIA-GeForce-GTX-1080-Graphics—Windows-10-
Professional.product.100296640.html
CyberpowerPC SLC2400C Desktop – Intel Core i7 – 8GB NVIDIA GeForce GTX 1080
Graphics – Windows 10 Professional
My question is: can I use a "Titan X Pascal" from Nvidia along with a GeForce GTX 1080
for more computation power?
I learned SLI is not a solution, and anyway both are different GPUs.
So in order to achieve faster results, can I combine both GPUs for TensorFlow?
I am using TensorFlow –
I just found this – (Basic Multi GPU Computation in TensorFlow)
https://tensorhub.com/donnemartin/4_multi_gpu
Thanks
Reply
Hi Om,
I am really glad that you found the resources of my website useful — thank you
for your kind words!
The thing with the NVIDIA Titan X (Pascal) and the GTX 1080 is that they use
different chips which cannot communicate in parallel. So you would be unable
to parallelize a model on these two GPUs. However, you would be able to run
different models on each GPU, or you could get another GTX 1080 and
parallelize on those GPUs.
Note that using a Ubuntu VM can cause some problems with GPU support. The
last time I checked it was hardly possible to get GPU acceleration running
through a VM, but things might have changed since then. So I urge you to
check if this is possible first before you go along this route.
Best,
Tim
Reply
Gordon says
2016-12-01 at 13:21
Thank you very much for writing this! – knowing something about how to evaluate
the hardware is something I have been struggling to get my head around.
I have been playing with TensorFlow on the CPU on a pretty nice laptop (fast i7 with
lots of RAM and an SSD but ultimately dual core so slow as hell).
I want to try something on the GPU to see if it is really just hundreds of times faster, but I
am worried about investing too much too soon as I have not had a desktop in
ages. Having read this post and the comments, I have the following plan:
Use an existing FreeNAS server I have as a test bed and buy a relatively low-end GPU
– GTX 960 4096MB:
https://www.overclockers.co.uk/msi-geforce-gtx-960-4096mb-gddr5-pci-express-
graphics-card-gtx-960-4gd5t-oc-gx-319-ms.html
The FreeNAS box has a crappy Celeron 3.2 dual core and only 8GB of RAM:
http://ark.intel.com/products/53418/Intel-Celeron-Processor-G550-2M-Cache-
2_60-GHz
I will buy the graphics card and an SSD to install an alternative OS on. I *may*
upgrade the RAM and processor too, as all of these items will benefit the FreeNAS
box anyway (I also run Plex on it).
If this goes well and I develop further I will look at a whole new setup later with an
appropriate motherboard, CPU, etc., but in the meantime I can learn how to
identify where my specific bottlenecks are likely to be.
From what you have said here I think there will be several slow parts to my system,
but I am probably going to get 80-90% of the speed of the graphics card, the main
restriction being that the CPU only supports PCIe 2.0 – as everything else, while not
ideal or scalable for that GPU, can probably feed it fast enough.
I have 2 questions (if you have time – sorry for the long comment but I wanted to make
my situation clear):
2. I chose the GPU based on RAM, number of CUDA cores and the Nvidia compute
capability rating (which reminds me of the Windows performance rating – a bit
vague but better than nothing). The other one I was considering was this, £13 more
so also a fine price imho:
https://www.overclockers.co.uk/palit-geforce-gtx-1050ti-stormx-4096mb-pci-
express-gddr5-graphics-card-gx-03t-pl.html
Which has fewer cores, 768 vs 1024, but a smaller process node and higher speed,
1290MHz vs 1178MHz, and I *think* a higher rating, assuming that the Ti is just better
(seems to mean unlocked), 6.1 vs 5.2:
https://developer.nvidia.com/cuda-gpus#collapse4
Basically, is the drop in cores really made up for to such a drastic extent that this
significantly higher rating from Nvidia is accurate? Noting that I am probably going
to be happy enough either way – feel free to just say "either is probably fine".
Alternatively, is there something else in the sub-£150-ish range that you would
suggest, given that the whole thing may be replaced by a Titan X or similar
(hopefully cheaper after Christmas) if this goes well. I did consider just getting
something like this: much less RAM, but still more cores than 2, and it allows me to
figure out how to get code running on the GPU:
https://www.overclockers.co.uk/asus-geforce-gt-710-silent-1024mb-gddr3-pci-
express-graphics-card-gx-396-as.html
Reply
Gordon says
2016-12-02 at 10:49
Got the 1050 Ti (well, another variation of it); I figured they would be similar
regardless, so I might as well trust Nvidia's rating.
https://www.amazon.co.uk/gp/product/B01M66IJ55/
Also got 32GB of RAM and a quad-core i5 that supports PCIe 3.0, as they were all
cheap on eBay (SSD too, of course).
Looks like I can mount my ZFS pool in Ubuntu, so I will probably just take FreeNAS
offline for a while and use this as a file and Plex server too (very few users
anyway), and this way my RAID array will be local should I want to use it.
Reply
That sounds solid. With that you should easily get started with deep
learning. The setup sounds good if you want to try out some deep learning
on Kaggle.com for example.
Reply
Upgrading the system bit by bit may make sense. Note that CPU and RAM will
make no difference to deep learning performance, but might be interesting for
other applications. If you only use one GPU a PCIe 2.0 will be fine and will not
hurt performance. The GTX 960 and GTX 1050Ti are on a par in terms of
performance. So pick what is most convenient / cheaper for you.
Reply
Mor says
2016-11-28 at 19:16
Hi Tim,
I am willing to buy full hardware for deep learning;
my budget is about $15,000.
I don't have any experience in this, and when I tried to check things out it was too
complicated for me to understand.
Can you help me? Maybe recommend companies or anything else that suits
my budget and is still good enough to work with?
Thanks a lot
Reply
Reply
JP Colomer says
2016-11-24 at 22:47
Hi Tim,
Thank you for this excellent guide.
I was wondering, now that the new 1000 series and Titan X came out, what are your
updated suggestions for GPUs (no money, best performance, etc)?
Reply
Reply
JP Colomer says
2016-12-05 at 07:19
Reply
Reply
Shahid says
2016-11-10 at 10:51
I am confused between two options:
1) A 2nd Generation core i5, 8GB DDR3 RAM and a GTX 960 for $350.
2) A 6th Generation core i3, 16GB DDR3 RAM and a GTX 750Ti for $480.
Can you please comment? I expect to upgrade my GPU after a few months.
Reply
A difficult choice. If you upgrade your GPU in a few months then it depends on whether
you use your desktop only for deep learning or also for other tasks. If you use
your machine regularly, I would spend the extra money and go for option (2). If
you want to do almost exclusively deep learning with the machine, (1) is a good,
cheap choice. Here the choice also depends on whether you buy the 2GB or 4GB variant
of each GPU. In terms of speed, (1) will be about 33-50% faster, but the speed
would not be too important when you start out with deep learning, especially if
you upgrade the GPU eventually.
Reply
Shahid says
2016-11-11 at 06:56
Thank you Tim, you really inspire me! Actually I took the Udacity SDCND
course, and here is the list of a few projects I want to accomplish on a local
machine:
So, my work is solely related to Computer Vision and Deep Learning. I also
have the option of a GTX 1060 6GB with that Core i3 (2). Of course, I expect
to code the GPU versions of the OpenCV tasks. Do you think this 3rd option
would be sufficient to accomplish these projects in an average amount of
time? Thank you again.
Reply
Hi Shahid. I'm in the same boat as you. I have also signed up for the
SDCND. I have an old PC with a Core i3 and 2GB RAM. I am adding
an additional 8GB of RAM and buying a GTX 1060 6GB. This is a really
powerful GPU which'll perform great in our work associated with the
SDCND.
Reply
Reply
Reply
Hi Tim,
Wonderful article. However, I am about to buy a new laptop. So what do you feel
about the idea of a gaming laptop for deep learning with an Nvidia GTX 980M or GTX
1060/1070?
Reply
Definitely go for the GTX 10 series GPUs for your laptop, since these are very
similar to full desktop GPUs. They are probably more expensive though.
Another option would be to buy a cheap, light laptop with long battery
life and a separate desktop to which you connect remotely to run your
deep learning work. The latter option is what I use and I am quite fond of it.
Reply
Alisher says
2016-11-08 at 06:42
I am very happy that I thought as you did. I bought a MacBook Air, which is
very portable, and I am going to buy a desktop with better specifications to do
my experiments on.
Reply
Reply
panovr says
2016-11-01 at 02:53
Reply
Reply
Hi Tim,
Thank you for sharing your knowledge; it was very beneficial for understanding
the concepts in DL.
I have a question: how do I feed custom images into a CNN for object recognition
using Python? Please give some pointers on this.
Reply
You will need to rescale custom images to a specific size so that you can feed
your data into a CNN. I recommend looking at ImageNet examples of common
libraries (Torch7, Tensorflow) to understand the data loading process. You will
then need to write an extension which resizes your images to the proper
dimension, for example 1080×1920 -> 224×224.
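As a minimal sketch of the resizing step (the filenames are placeholders; in practice this would live inside your data-loading pipeline):

from PIL import Image  # Pillow assumed to be installed

img = Image.open("my_image.jpg")               # e.g. a 1920x1080 photo
img_small = img.resize((224, 224), Image.BILINEAR)
img_small.save("my_image_224.jpg")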
Reply
Alisher says
2016-11-08 at 06:36
Firstly, I am very thankful for your post. It is very nice and very helpful.
One thing I wanted to point out is that you can feed the images into the network (in
Caffe) as they are. I mean, if you have a 1080×1920 image, there is no need to
reshape it to 224×224. But this does not mean that feeding the image as-is
performs better; I think this could be a standalone research topic.
Regards,
Reply
Reply
Shahid says
2016-11-12 at 06:45
Thank you Tim!
This is a good overview of the HW that matters for DL. I would like your view on
the OpenPOWER-NVIDIA combo, and the economics of setting up a ML/DL lab.
Reply
Reply
Arthur says
2016-10-24 at 22:24
Reply
Hey, I wanted to ask if the NVIDIA Quadro K4000 would be a good choice for running
convolutional nets?
Reply
A K4000 will work, but it will be slow and you cannot run big models on large
datasets such as ImageNet.
Reply
Reply
Ashiq says
2016-10-19 at 19:06
Hi Tim
Thanks for the great article and your patience to answer all the questions. I just built
a dev box with 4 Titan X Pascal and need some advice on air flow. For reference,
here is the Part list: https://pcpartpicker.com/list/W2PzvV and the
Picture: http://imgur.com/bGoGVXu
Loaded Windows first for stress testing the components and noticed the GPU
temps reached 84C while the fans were still at 50%. Then the GPUs started slowing
down to lower/maintain the temp. Then with MSI Afterburner, I could specify a
custom temp-vs-fan-speed profile and keep the GPU temps at 77C or below –
pretty much what you wrote in the cooling section above.
There is no "Afterburner" for Linux, and apparently the BIOS of the Titan X Pascal is
locked, so we can't flash them with custom temp settings. The only option left for me
is to play with the Coolbits setting, and I prefer not to attach 4 monitors to it (I already have
two 30-inch monitors that are attached to a Windows computer that I use for
everything; 6 monitors on the table would be too much).
I wonder if you have found any new way of emulating monitors for Xorg, as my preferred
option would be to keep 3 of the GPUs headless?
Cheers
Ashiq
Reply
I did not succeed in emulating monitors myself. Some others claim that they
got it working. I think the easiest way to increase the fan speed would be to
flash the GPU with a custom BIOS. That way it will work in both Windows and
Linux.
Reply
spuddler says
2016-10-26 at 15:34
Not sure, but maybe there exist specific dummy plugs to help
"emulate" monitors, if it's not possible purely in software. At least DVI and
HDMI dummy plugs worked for cryptocurrency miners back in the day.
Reply
Ashiq says
2016-10-27 at 03:41
file (/etc/X11/xorg.conf ) and I can change all 4 fan speeds with nvidia-
settings
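For reference, here is a hedged sketch of how the fan speeds can then be scripted once Coolbits is enabled in /etc/X11/xorg.conf as described above; the attribute names (GPUFanControlState, GPUTargetFanSpeed) can differ between driver versions, so treat them as an assumption to verify against your driver's documentation.

import subprocess

# Set all four GPU fans to 70% via nvidia-settings (Coolbits must be enabled).
for gpu in range(4):
    subprocess.run([
        "nvidia-settings",
        "-a", "[gpu:%d]/GPUFanControlState=1" % gpu,
        "-a", "[fan:%d]/GPUTargetFanSpeed=70" % gpu,
    ], check=True)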
Reply
Thanks Ashiq — that sounds great! Thank you for sharing the link!
Reply
Hi Ashiq,
Would you mind sharing how loud your setup is? It looks very similar to the one
I'm planning to build and I'm torn between going for liquid cooling or air
cooling. Will I be able to hear it from 10 meters away?
Regards,
Piotr
Reply
anon says
2016-10-19 at 18:07
Hi Tim,
Reply
anon says
2016-10-02 at 01:09
Hi Tim,
Reply
The Quadro 5000 has only a compute capability of 2.0 and thus will not work
with most deep learning libraries that use cuDNN. Thus it might be better to
upgrade.
Reply
anon says
2016-10-04 at 19:59
Thanks.
Reply
2016-11-21 at 16:40
That shouldn't matter much. Don't go with the Nvidia Founders Edition; it
doesn't have a good cooling system. Just go with the cheapest one,
which is EVGA. It is one of the most promising brands. I just ordered the
EVGA one.
Reply
Please note that the EVGA GTX 1080 currently has cooling
problems which are only fixed by flashing the BIOS of the GPU.
This card may begin to burn without this BIOS update.
My current CPU is an Intel Core i3 2100 @ 3.1GHz and my RAM is 4GB. My motherboard is
a Gigabyte GA-H61M-S2P-B3 (rev. 1.0). It supports PCIe 2.0. Can I use a GTX 1060 in
my current configuration or do I need to change the board and the CPU? I want to
keep the cost as low as possible.
Reply
You should be able to run a GTX 1060 just fine. The performance should be
only 5-10% less than on an optimal system.
Reply
Awesome! Thanks for sharing. Can you tell me how much it would cost to
build such a cluster? Cheers!
Reply
Basically it is two regular deep learning systems together with InfiniBand cards.
You can get InfiniBand cards and cables quite cheaply on eBay, and the total cost
for a 6 GPU, 2 node system would be about $3k for the systems and InfiniBand
cards, plus an additional $6k for the GPUs (if you use the Pascal GTX Titan X), for a
total of $9k.
Reply
Shravankumar says
2016-09-21 at 10:45
I am using an Asus K55VJ, 3rd-gen i5, Nvidia GeForce GT 635M 2GB, with a 750GB HDD
and 8GB RAM. Does my computer support deep learning?
Reply
Your GPU has compute capability of 2.1 and you need at least 3.0 for most
libraries — so no, your computer does not support deep learning on GPUs.
You could still run deep learning code on the CPU, but it would be quite slow.
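A quick, hedged way to check this on your own machine (using PyTorch as an example; any CUDA-enabled framework exposes something similar):

import torch  # assumed to be installed with CUDA support

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    print("%s: compute capability %d.%d, usable for cuDNN-based libraries: %s"
          % (name, major, minor, (major, minor) >= (3, 0)))
else:
    print("No CUDA-capable GPU detected; deep learning would run on the CPU only.")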
Reply
Jacqueline says
2016-09-16 at 07:04
Hi Tim
https://www.bhphotovideo.com/c/product/1269213-
REG/asus_g20cb_db71_gtx1070_republic_of_gamers_g20cb.html
thank you!
Reply
It is a bit pricey and there are not many details about the motherboard. Also,
the GPU might be a bit weak for researchers.
I would also encourage you to buy the components and put them together on
your own. This may seem like a daunting task but it is much easier than it
seems. This way you get a high-quality machine that is cheap at the same time.
Reply
Gilberto says
2016-09-08 at 09:39
Hi Tim,
first of all thank you for sharing all this precious information.
Current configuration:
– Motherboard: Gigabyte GA-P55A-UD3 (specification
at: http://www.gigabyte.com/products/product-page.aspx?pid=3439#sp)
– Intel i5 2.93 GHz
– 8 GB RAM
– GTX 980
– PSU power: 550watts
I may add:
– SSD (I will install Ubuntu and use it only via the command line – no
graphical interface)
The motherboard should work, but it will be a bit slower. The PSU is borderline;
it might be a bit underpowered or just right, it's hard to tell.
Reply
Arman says
2016-08-26 at 17:49
Hi Tim,
I had a question about the new Pascal GPUs. I am debating between the GTX 1080 and
the Titan X. The price of the Titan X is almost double the 1080's. Excluding the fact that
the Titan X has 4 more GB of memory, does it provide a significant speed improvement
over the 1080 to justify the price difference?
Thanks,
Reply
Juan says
2016-09-05 at 00:24
Hi,
I am not Tim (obviously), but as far as I understood from his other post on GPUs
(http://timdettmers.com/2014/08/14/which-gpu-for-deep-learning/) he states
that for research-level work it actually makes a difference, especially when you
are using video datasets. But for example … "While 12GB of memory are essential for
state-of-the-art results on ImageNet, on a similar dataset with 112x112x3
dimensions we might get state-of-the-art results with just 4-6GB of memory."
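A rough back-of-the-envelope sketch of why the input resolution matters so much for memory (the batch size and channel count below are illustrative assumptions, not measurements):

def activation_mb(batch, height, width, channels, bytes_per_value=4):
    # Memory for a single layer's activations, in MB, assuming 32-bit values.
    return batch * height * width * channels * bytes_per_value / 1024**2

print(activation_mb(128, 224, 224, 64))   # ~1568 MB for an early conv layer at 224x224
print(activation_mb(128, 112, 112, 64))   # ~392 MB at 112x112, roughly 4x less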
Reply
DarkIdeals says
2016-09-10 at 05:55
If you can afford it, the TITAN X is DEFINITELY worth it over the 1080 in most
cases. Not only does it have that 12GB of VRAM to work with, but it also has
features like INT8 (the way I understand it, you can store floats as 8-bit
integers, which helps efficiency; potentially quite useful) and 44 TOP
units (kind of like ROPs but not for graphics rendering; they are beneficial to Deep
Learning though).
Basically the TITAN X is literally identical to the $7000 Tesla P100, just without the
double-precision FP64 capability and without HBM2 memory (the TITAN X
uses GDDR5X instead; however, it's not much of a difference, as the P100's
memory bandwidth even with HBM is only 540 GB/second, whereas the
TITAN X is very close at 480 GB/second and hits 530 GB/second when you
overclock the memory from 10,000MHz to 11,000MHz, so it's literally no
difference really). Other than those things and the certified Tesla drivers, there's
literally no real difference between the P100 and the TITAN X Pascal; which is
very important, as the Tesla P100 is literally THE most powerful graphics card on
the planet right now!
The important thing to mention is that Double Precision isn’t really important
for Neural nets etc.. that you deal with in Deep Learning; so for $1,200 you are
getting the power of the $7,000 monster supercomputer chip of the Tesla P100
just without all the unnecessary server features that Deep Learning doesn’t use.
Also, in comparison to the GTX 1080, the TITAN X has a significant advantage in
both memory capacity (12GB vs 8GB on the 1080) and memory bandwidth (530 GB/s
when overclocked on the TITAN X vs 350 GB/s on the 1080 when overclocked…
that's a FIFTY PERCENT increase in memory bandwidth!), and has a massive
increase in CUDA cores which is very beneficial (40% more, which when
combined with the larger memory capacity and 50% higher bandwidth easily
nets you ~60% more performance in some scenarios over the 1080).
Hope this helps, the TITAN X is a GREAT chip for Deep Learning, the best in the
world currently available in my opinion. Which is why I bought two of them.
Reply
DarkIdeals says
2016-11-09 at 01:32
(sorry for the long post but it is important to your decision so try to read it all if
you have time)
Hey, correcting an error in my earlier post. Like I said, I wasn't quite sure if I
understood the INT8 functionality properly, and I was wrong about it.
Apparently there was a typo in the spec pages of the Pascal TITAN X; it said "44
TOPs" and made me think it was an operation pipeline of sorts similar to a
"ROP", which is responsible for displaying graphical images etc.
It actually was referring to INT8, which is basically just 8-bit integer
support. The average GPU runs with 32-bit "full precision" accuracy, which is a
measurement of how much time and effort is put into each “calculation” made
by the GPU. For example, with 32 bit it may only go out to 4 decimal points
when calculating for the physics of water in a 3d render etc.. which is plenty
good for things like Video Games and your average video editing and
rendering project; but for things like advanced physics calculations by big
universities that are trying to determine the 100% accurate behavior of each
individual molecule of H2O within the body of water to see EXACTLY how it
moves when wind blows etc.. you would need “double precision” which is a 64
bit calculation that would have much more accuracy, going to more decimal
points before deciding that the calculation is “close enough” compared to what
32 bit would.
Only special cards like Quadros and Teslas have high FP64 performance; they
usually have half the teraflops of performance in 64-bit mode compared to 32-bit,
so a Quadro P6000 (same GPU as the TITAN XP but with full 64-bit support)
has 12 teraflops of power in 32-bit mode and ~6 teraflops of power in 64-bit
mode. But there is also 16-bit mode, "half precision", for things requiring even
less accuracy. INT8, to my understanding, is basically an "8-bit quarter precision"
mode, with even less focus on total mathematical accuracy; and this is useful
for Deep Learning, as some of the work done doesn't require that much
accuracy.
So, in other words, in 8 bit mode, the TITAN X has “44 Teraflops” of
performance.
Reply
Your analysis is very much correct. However, for some games there are
already some elements which make heavy use of 8-bit integers. Before, though,
it was not possible to do 8-bit integer computation directly: you had to
first convert both numbers to 32-bit, then do the computation, and then
convert the result back. This was done implicitly by the GPU so that no
extra programming was necessary. Now the GPU is able to do the 8-bit
computation on its own. However, the support is still quite limited, so you will
not see 8-bit deep learning just yet. Probably in a year at the earliest would be my
guess, but I am sure it will arrive at some point.
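To illustrate the "store in 8-bit, compute in 32-bit" idea from this exchange, here is a hedged NumPy sketch; it shows the general quantize/dequantize pattern, not how any particular framework or GPU implements INT8 support.

import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

scale = np.abs(w).max() / 127.0                 # one scale factor for the tensor
w_int8 = np.round(w / scale).astype(np.int8)    # stored weights are 4x smaller

x = rng.standard_normal((32, 256)).astype(np.float32)
y_full = x @ w                                       # full-precision reference
y_quant = x @ (w_int8.astype(np.float32) * scale)    # dequantize, then compute in 32-bit

print("max abs error:", np.abs(y_full - y_quant).max())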
Reply
Reply
I am not sure how easy it is to upgrade the GPU in the laptop. If it is difficult,
this might be one reason to go with the better GPU since you will probably also
have it for many years. If it is easy to change, then there is not really a
right/wrong choice. It all comes down to preference, what you want to do and
how much money you have for your hardware and for your future hardware.
Reply
sk06 says
2016-08-17 at 12:54
Hi,
I just bought two Supermicro 7048GR-TR server machines with 4 Titan X cards in
each machine. I'm confused about how to configure the servers: how many partitions
to make, and how to utilize the 256GB SSD drive and the two other 4TB hard drives in
each machine. The servers will only be used for deep learning applications. What
deep learning framework should I use (TensorFlow, Caffe or Torch) considering
the two servers? I work in the medical imaging domain and recently started getting
into deep learning. Please help me with your valuable suggestions.
Reply
The servers have a slow interconnect, that is the servers only have a gigabit
Ethernet which is a bit too slow for parallelism. So you can focus on setting up
each server separately. It depends on your dataset size, but you might want to
have the SSD drive dedicated for your datasets, that is, install the OS on the
hard drive. If your datasets are < 200GB, you could also install the OS on the
SSD to have a smoother user experience. The frameworks all have their pros
and cons. In general I would recommend TensorFlow, since it has the fastest
growing community.
Reply
sk06 says
2016-08-24 at 05:10
Reply
I just started learning about neural networks and I'm looking forward to studying them.
I have a GT 620 with a dual-core Pentium G2020 clocked at 3.3GHz and 8GB of RAM.
Would it be better to buy a GTX 1060 and two 8GB RAM sticks for the future?
Reply
Yes, the GT620 will not support cuDNN which is important deep learning
software and makes deep learning just more convenient, because it allows you
more freedom in choosing your deep learning framework. You will have less
troubles if you buy a GTX 1060. 16GB of RAM will be more than enough, I think
even 8GB could be okay. Your CPU will be sufficient, no update required.
Reply
Vasanth says
2016-08-11 at 17:22
Hi Tim,
Many thanks for this post, and your patient responses. I had a question to ask –
NVIDIA gave away Tesla K40C (which is the workstation version of K40, as I
understand) as part of its Hardware Grant Program (I think they are giving TitanX
now, but they were giving Tesla K40Cs until recently). It’s not clear to me what
workstations from standard OEMs like Dell/HP are compatible with a K40C. I have
spoken to a few vendors about compatibility issues, but I don't seem to get
convincing, knowledgeable responses. I am concerned about buying a workstation,
which would later not be compatible with my GPU. Would it be possible for you to
share any pointers you may have?
The K40C should be compatible with any standard motherboard just fine. The
compatibility that hardware vendors stress is often assumed for data centers, where
the cards run hot and need to do so permanently for many months or years.
The K40 has a standard PCIe connector and that is all that you need for your
motherboard.
Reply
Wajahat says
2016-08-11 at 15:27
Hi Tim
I am using Matlab R2016a with MatConvNet 1.0 beta20 (Nvidia Quadro 410 GPU in
Win7 and GTX 1080 in Ubuntu 16.04), Core i7 4770 and Core i7 4790.
Exactly the same data and the same network architecture were used.
Best Regards
Wajahat
Reply
This can well be true and normal. The same seed can produce different random
numbers on the CPU and GPU if different algorithms are used. Convolution on
GPUs may also include some non-deterministic operations (cuDNN 4). When
using unit tests to compare CPU and GPU computation, I also often see some
difference in output given the same input, thus I assume that there are also
small differences in floating point computation (although very small). All this
might add up to your result.
Reply
Arman says
2016-08-07 at 22:02
Reply
For a single Titan X Pascal, and if you do not want to add another card later,
almost any build will do. The CPU does not matter; you can buy the cheapest
RAM and should have at least 16 GB of it (24 GB will be more than enough). For
the PSU, 600 watts will do; 500 watts might be sufficient. I would buy an SSD if
you want to train on large data sets or raw images that are read from disk.
Reply
anon says
2016-08-05 at 20:52
Reply
Reply
Andrew says
2016-08-15 at 06:59
You are an amazingly good person Tim. The world needs more people like
you. Your actions encourage others to behave in a similar way which in
turn helps build better online and offline communities. Thank you!
Reply
Reply
How do the new NVIDIA 10xx compare? I followed through with this guide and
ended up getting a GTX Titan. The bandwidth looks slightly higher for the Titan
series. Does the architecture affect learning speeds?
Reply
The bandwidth is high for all Titans, but their performance differs from
architecture to architecture; for example, Kepler (GTX Titan) is much slower than
Maxwell (GTX Titan X) even though they have comparable bandwidth. So yes,
the architecture does affect learning speed — quite significantly so!
Reply
drh1 says
2016-07-31 at 04:50
Hi Tim,
Thanks for some really useful comments. I have a hardware question. I've configured
a Windows 10 machine for some GPU computing (not DL) at the moment. I think
the hardware issues overlap with your blog, so here goes:
the system has a GTX 980 Ti card and a K40 card on an ASUS X-99 Deluxe
motherboard. When the system boots up, the 980 (which runs the display as well) is
fine, but the K40 gives me “This device cannot start. (Code 10). Insufficient system
resources exist to complete the API”. I have the most up-to-date drivers (354.92 for
K40, 368.81 for 980).
Has anyone configured a system like this, and did they have similar problems? Any
ideas will be greatly appreciated.
Reply
It might well be that your GPU driver is meddling here. There are separate
drivers for Tesla and GTX GPUs, and you have the GTX variant installed, so
the Tesla card might not work properly. I am not entirely sure how to get around this
problem. You might want to configure the system as a headless (no monitor)
server with Tesla drivers and connect to it using a laptop (you can use remote
desktop under Windows, but I would recommend installing Ubuntu).
Reply
bmahak says
2016-07-26 at 01:18
I want to build my own deep learning machine using a Skylake motherboard and CPU.
I am planning not to use more than 2 GPUs (GTX 1080), starting with one GPU first
and upgrading to a second one if needed.
Here is my setup on PCPartPicker: http://pcpartpicker.com/user/bmahak2005/saved/Yn9qqs
HB.
Reply
The motherboard and CPU combo that you chose only supports 8x/8x speed
for the PCIe slots. This means you might see some slowdown in parallel
performance if you use both of your GPUs at the same time. The decrease
might vary between networks, with roughly 0-10% performance loss. Otherwise
the build seems to be okay. Personally, I would go with a few more watts on the
PSU just to have a safe buffer of extra watts.
Reply
Reply
There should be no problems with cooling for the GDDR5X memory with the
normal card layout and fans. I know for HBM2 NVIDIA actually designed the
memory to be actively cooled, but HBM2 is stacked while GDDR5X is not.
Generally GDDR5X is very similar to GDDR5 memory. It will consume less
power but also offer higher density, so that on the bottom line GDDR5X should
run on the same temperature level or only slightly hotter than GDDR5 memory
— no extra cooling required. Extra cooling makes sense if you want to
overclock the memory clockrate, but often you cannot get much more
performance out of it for how much you need to invest in cooling solutions.
Overall the architecture of Pascal seems quite solid. However, most features of
the series are a bit crippled due to manufacturing bottlenecks (16nm, GDDR5X,
HBM2 all these need their own factories). You can expect that the next line of
Pascal GPUs will step up the game by quite a bit. The GTX 11 series probably
will feature GDDR5X/HBM2 for all cards and allow full half-float precision
performance. So Pascal is good, but it will become much better next year.
Reply
Obviously will have to confirm the physical fit once those specs become
more available, but insofar as the approach, I was a little bit concerned
about the VRAM.
The use case is convolutional networks for image and video recognition.
Thanks,
Selly
Reply
sk06 says
2016-07-09 at 10:28
Hi Tim,
Thanks for the excellent post. The user comments are also pretty informative. Kudos
to all.
I recently started shifting my focus from conventional machine learning to Deep
Learning. I work in the medical imaging domain and my application has a dataset of
50000 color images (5000 per class, 10 classes, size 512×512). I have a system with
a Quadro K620 GPU. I want to train state-of-the-art CNN architectures like
GoogLeNet Inception V3, VGGNet-16 and AlexNet from scratch. Will the Quadro K620
be sufficient for training these models? If I have to go for a higher-end GPU, can you
please suggest which card I should go for (GTX 1080, Titan X, etc.)? I want to
generate the prototypes as fast as possible. Budget is not the primary concern.
Reply
A Quadro K620 will not be sufficient for these tasks. Even with very small batch
sizes you will hit the limits pretty quickly. I recommend getting a Titan X on
eBay. Medical imaging is a field with high resolution images where any
additional amount of memory can make a good difference. Your dataset is
fairly small though and probably represents a quite difficult task; it might be
good to split up the images to get more samples and thus better results
(quarter them for example if the label information is still valid for these images)
which then in turn would consume more memory. A GTX Titan X should be
best for you.
Reply
John says
2016-07-07 at 23:19
Great article. What would you recommend for a laptop GPU setup rather than a
desktop? I see a lot of laptop builds with a 980M or 970M GPU, but is it worth
waiting for some variant of the 1080M/1070M/1060M?
Reply
A laptop with such a high end graphics card is a huge investment and you will
probably use that laptop much longer than people use their desktops (it is
much easier to sell your GPU and upgrade with a desktop). I would thus
recommend waiting for the 1000M series. It seems it will arrive in a few months,
and the first performance figures show that they are slightly faster than
the GTX Titan X — that would be well worth the wait in my opinion!
Reply
Dante says
2016-07-07 at 20:21
Tim,
Based on your guide I gather that choosing a less expensive hexa-core Xeon CPU
with either 28 or 40 lanes will not see a great drop in performance. Is that correct
(for 1-2 GPUs)? Can you share your thoughts?
Great guides, very helpful for folks getting into Deep Learning and trying to figure
out what works best for their budget.
Dante
Reply
Yes that is very true. There is basically no advantage from newer CPUs in terms
of performance. The only reason really to buy a newer CPU is to have DDR4
support, which comes in handy sometimes for non-deep learning work.
Reply
Simon says
2016-07-07 at 12:34
Reply
Simon says
2016-07-04 at 15:49
Hi
The Asus X99-E WS spec shows that it has a PLX chip that provides an additional 48 PCIe
lanes. Getting an i7-6850K with an X99-E WS theoretically gives you 88 PCIe lanes in
total, and that is still plenty to run 4 GPUs all at x16.
Is that true for deep learning?
Thanks for the reply.
Reply
I am not exactly sure how this feature maps to the CPU and to software
compatibility. From what I have heard so far, you can quite reliably access GPUs from
very non-standard hardware setups, but I am not so sure if the software
would support such a feature. If the GPUs are not aware of each other at the
CUDA level due to the PLX chip, then this feature will do nothing good for
deep learning (it would probably be even slower than a normal board, because
you would probably need to go through the CPU to communicate between
GPUs).
But the idea of a PLX chip is quite interesting, so if you are able to find out
more information about software compatibility, then please leave a comment
here — that would not only help you and me, but also all these other people
that read this blog post!
Reply
Thank you for an excellent post, I keep coming back here for reference.
With regards to memory types, what role does GDDR5 vs GDDR5X play? Is this an
important differentiator between offerings like 1080 and 1070, or is it not relevant
for deep learning?
Reply
gameeducationjournal.info says
2016-06-24 at 22:16
Reply
That sounds awful. I will check what is going wrong there. However, I am unable
to remove a single user from the subscription. See if you can unsubscribe
yourself; otherwise please contact the Jetpack team. Apparently the data is
stored by them and the plugin that I use for this blog accesses that data, as you
can read here. I hope that will help you. Thanks for letting me know.
Reply
Reply
Hey,
first of all thanks for the guide, helped me immensely to get some clarity in this
puzzle!
Couple of questions as I’m a bit too impatient to wait for 1080/70 reviews on this
topic:
As you stated, bandwidth, memory clock and memory size seem to be among the
most important factors, so would it even make sense to put some more money into a
solidly overclocked custom GPU? So far I'll just pick the cheapest solidly cooled one
(EVGA ACX 3.0 probably).
Also, my initial analysis of the GTX 1070 vs the GTX 1080 was heavily in favor of the
1080 based on the benchmarks from http://www.phoronix.com/scan.php?page=article&item=nvidia-gtx-1070&num=4 .
Though the theoretical TFLOPS SP MIXBENCH results were slightly in favor of the 1070
(76.6 €/TFLOP GTX 1080 vs 73.9 €/TFLOP GTX 1070), the SHOC on CUDA results in terms
of price efficiency were slightly in favor of the 1080, but more or less the same.
However, the GDDR5X on the GTX 1080 seems to seal the deal I guess for deep learning
applications? Also, I found the 1080 around 6 Watt/TFLOP more cost efficient. Am I on
the right track here? Maybe the numbers help some others here searching for opinions
on that :).
Anyway, after reading through your articles and some others I came up with this build:
http://pcpartpicker.com/list/LxJ6hq . Some comments would be very much appreciated.
I feel like the CPU is a bit overkill, but it was the cheapest with DDR4 RAM and
40 lanes. Maybe not needed, though I'm a bit unsure of that.
Best regards
Best regards
Reply
peter says
2016-06-19 at 23:15
Hello Tim:
Thanks for the great post. I built the following PC based on it.
CPU: i5 6600
Motherboard: Z170-P
DDR4: 16GB
However, after I installed 14.04, I can't get CUDA 8.0 and the new driver to install
(which GTX 1080 users are told they have to update to).
Could the problem be caused by the other components of the PC, like the
motherboard?
Thanks!
Reply
I have heard that people have problems with Skylake under Ubuntu 14.04, but I
am not sure if that is really the problem. You can try upgrading to Ubuntu 16.04
because the Skylake support is better under that version, but I am not sure if
that will help.
Reply
Hi Tim Dettmers,
Your blog is awesome. I currently have a GeForce GTX 970 in my system; is that
sufficient for getting started with convolutional neural networks?
Reply
A GTX 970 is an excellent option for exploring deep learning. You will not be able
to train the very largest models, but that is also not something you want to do
when you are exploring. It is mostly about learning how to train small networks on common
and easy problems, such as AlexNet and similar convolutional nets on MNIST,
CIFAR10 and other small data sets, until you get a "feel" for training
convolutional nets, so that you can then go on to larger models and larger
data sets (ResNet on ImageNet, for example). So everything is good.
Reply
I haven't been able to boot up this MSI laptop with any of the flavors of 14.04
(Lubuntu, Xubuntu, Kubuntu, Ubuntu); could it be that the Skylake processor is
not compatible with 14.04?
https://bugzilla.kernel.org/show_bug.cgi?id=109081
Looks like I will have to wait until a fix is created for the upstream Ubuntu versions
or until Nvidia updates CUDA to support 16.04. Is there anything else I can try?
Thanks!
Reply
Laptops with an NVIDIA GPU in combination with Linux are always a pain to get
running properly, as it often also depends very much on the other hardware in
your laptop. I do not have any experience in this case, but you might be able to
install 14.04 and then try to patch the kernel with what you need. Not easy to do
though.
Reply
Reply
http://www.rle.mit.edu/eems/wp-
content/uploads/2016/02/eyeriss_isscc_2016_slides.pdf
Reply
Glenn says
2016-06-16 at 00:30
Thanks for all the info. If I plan to use only one GPU for computation, then would I
expect to need two GPUs in my system: one for computation and another for
driving a couple of displays? Or can a single GPU be used for both jobs?
Reply
A single GPU is fine for both. A monitor will use about 100-300MB of your GPU
memory and usually draw an insignificant amount (<2%) of performance. It is
also the easier option, so I would just recommend to use a single GPU.
Reply
Yasumi says
2016-06-15 at 13:26
For deep learning on speech recognition, what do you think of the following specs?
It’s going to cost 2928USD. What are your thoughts on this?
– INTEL CORE I7-6800K UNLOCKED FOR OC(28lanes)(6 CORE/ 12
THREADS/3.8GHZ) NEW!
– XSPC RayStorm D5 Photon AX240 (240L)
– ASUS X99-E WS (ATX/4way SLI/8x Sata3/2xGigabit LAN/10xUSB3.0)
This is a good build for a general computation machine. A bit expensive for
deep learning, as the performance is mostly determined by the GPU. Using
more GPUs and cheaper CPU/Motherboard/RAM would be better for deep
learning, but I guess you want to use the PC also for something different than
deep learning :). This would be a good PC for kaggle competitions. If you plan
on running very big models (like doing research) then I would recommend a
GTX Titan X for memory reasons.
Reply
thanks so much for your advice! I managed to install Xubuntu 16.04; now the next
step is installing CUDA and TensorFlow, and I will need all the advice that I can get with
that one.
The problem I have with Ubuntu Desktop is known; it looks like they are going to
address it in 16.04.1 (sorry for the comment being slightly off topic).
http://askubuntu.com/questions/760051/ubuntu-16-04-0-final-unity-desktop-kubuntu-gnome-can-not-boot-from-live-us/760124
Reply
Spuddler says
2016-06-14 at 15:50
You should try to use 14.04; 16.04 can still give you lots of headaches right now.
PS: I tried to install Ubuntu (all its versions) and it fails to show the GNOME menu; it
just shows the background desktop image.
Reply
Spuddler says
2016-06-12 at 21:48
As far as I know, Quadro cards are usually optimized for CAD applications; you
can use them for deep learning but they will not be as cost-efficient as regular
GeForce cards.
Your problem with Ubuntu not booting is a strange one; it does not really look
like a graphics driver issue since you get a screen. Before googling for more
difficult troubleshooting procedures I would try other Ubuntu 14.04 LTS flavours
if I were you, like Xubuntu (Windows-like, lightweight), Kubuntu (Windows-like,
fancy) or even Lubuntu (very lightweight). It may just be some arcane issue with
Ubuntu's GNOME desktop and your hardware.
Reply
Nizam says
2016-06-10 at 11:42
This is the most informative blog about building a deep learning machine!
Thanks for that.
Now that NVIDIA's 1080 and 1070 are launched, which is the better deal for us:
two 1070s or one 1080?
Question: For budgetary reasons I'm looking at an AMD CPU/board combination
(4 cores), but that combination has no onboard video.
Can the GPU (4GB NVIDIA 960) which will be used for machine learning also be used
at the same time as the video card (no 3D, of course)?
Does that work or do I need an extra video card? Thanks!
Reply
Yes, that will work just fine! This setup would be a great setup to get started
with deep learning and get a feel for it.
Reply
Tim,
I'm looking for information on which GPU cards have support for convolutional
layers; in particular I was considering a laptop with the GTX 970, but according to
your blog above it does not support convolutional nets. Would you mind explaining
what that means in terms of features and also time performance? Is there a
way to know from the specs whether a card is good for conv nets?
thanks in advance
Reply
Maybe I have been a bit unclear in my post. The GTX 970 supports
convolutional nets just fine, but if you use more than 3.5GB of memory you will
be slowed down. If you use 16-bit networks you can still train relatively
well-sized networks. So a GTX 970 is okay for most non-research,
non-I-want-to-get-into-the-Kaggle-top-5 use cases.
Reply
Greg says
2016-05-28 at 08:35
Hey Tim, quick question: do you have any opinion about the new GeForce GTX
1080s for deep learning?
Maybe you already gave your opinion and I missed it.
Thanks,
Greg
Reply
Thomas R says
2016-05-19 at 15:24
Hi Tim, did you connect your 3 monitors to the mainboard/CPU or to your GPU?
Does this have an influence on the deep learning computation?
Reply
2016-05-26 at 11:08
I connected them to two GPUs. It does not really affect performance (maybe 1-
3% at most), but it does take up some memory (200-500MB). Overall this
effect is negligible.
Reply
DD Sharma says
2016-05-13 at 15:15
Hello Tim,
Comparing two cards for GPGPU (deep learning being an instance of GPGPU),
what is more important: the number of cores or memory? For learning purposes and maybe
some model development I am considering a low-end card (512 cores, 2GB). Will this
seriously cripple me? Other than giving up performance gains, will it seriously be
constraining? I checked the research work of folks from 5+ years ago and many in
academia used processors with even weaker specs and still got something done.
Once I discover that I am doing something really serious I can go to the Amazon cloud,
get an external GPU (connected via Thunderbolt 3), or build a machine.
Reply
Neither cores nor memory is important per se. Cores do not really matter.
Bandwidth is the most important factor and FLOPS the second most important. You also need
a certain amount of memory to train certain networks; for state-of-the-art models you should
have more than 6GB of memory.
Reply
Hi Tim,
I suppose this is echoing Jeremy’s question, but is there any reason to prefer a Titan
X to a GTX 1080 or 1070? The only spec where the Titan X still seems to perform
better is in memory (12 GB vs. 8 GB).
I got a Titan X on Amazon about 2.5 weeks ago, so have about 10 days to return it
for a full refund and try for a GTX 1080 or 1070. Is there any reason not to do this?
Reply
Reply
Spuddler says
2016-06-11 at 17:49
Just wanted to add that NVIDIA artificially crippled 16-bit operations on
the GTX 1070/1080 to abysmal speeds, so we can only hope they don't do
the same with the Pascal Titan card.
Reply
Jerry says
2016-05-08 at 05:20
Hi Tim. Thanks for an excellent guide! I was wondering what your opinion is on
NVIDIA's new graphics card, the GeForce GTX 1080. The performance is said to
beat the Titan X, and it is supposed to be half the price!
Reply
Gilbert says
2016-05-07 at 15:50
Hi, does the number of CUDA cores matter? The GTX 1080 is about to be released and it
has about 2500 CUDA cores, whereas a GTX 980 Ti has about 2800 CUDA cores. Will this
affect the speed of training? Or will the GTX 1080 in general be faster with its 8 teraflops
of performance?
Reply
The number of cores does not really matter. It all depends on how these cores are
integrated into the GPU. The GTX 1080 will be much faster than the GTX Titan
X, but it is hard to say by how much.
Reply
Gilbert says
2016-05-09 at 19:07
Reply
So reading in this post that bandwidth is the key limiter makes me think the GTX 1080,
with a bandwidth of 320 GB/s, will be slightly worse for deep learning than a 980 Ti. Does
that sound right?
Reply
You cannot compare the bandwidth of a GTX 980 with the bandwidth of a GTX
1080 because the two cards use different chipsets. The GTX 1080 will definitely
be faster.
Reply
DD Sharma says
2016-05-05 at 01:39
Tim,
Reply
Skylake is not needed and Quadro cards are too expensive — so no changes to
any of my recommendations.
Reply
Lucian says
2016-04-30 at 01:19
Reply
Dorje says
2016-04-24 at 15:29
Cheers,
Dorje
Reply
Eduardo says
2016-04-24 at 10:02
Hi, I am a Brazilian student, so everything is way too expensive for me. I will buy a
GTX 960, start off with a single GPU, and expand later on. The problem is that
Intel CPUs with 30+ lanes are WAY too expensive, so I HAVE to go with AMD, but
the motherboards for AMD only have PCIe 2.0.
My question is: can I get good performance out of 2 x 960 GPUs on a PCIe 2.0
x16 mobo? By good I mean equal to a single 960 at x16 on PCIe 3.0, maybe
even a single GTX 980.
Reply
Hi, both an Intel CPU with 16 lanes or fewer (as long as your motherboard
supports 2 GPUs) and AMD with PCIe 2.0 will be fine. You will not see
large decreases in performance; it should be about 0-10% depending on the task
and deep learning software.
If you are short on money it might also be an option to use AWS GPU
instances. If you do not train every day this might be cheaper in the end.
However, for tinkering around with deep learning a GTX 960 will be a pretty
solid option.
Reply
Raj says
2016-04-18 at 15:12
Cheers,
RK
Reply
Hi RK,
16 lanes should still work well with 2 GPUs (but make sure the CPU supports
x8/x8 lanes — I think every CPU does, but I never used them myself). The
transfer to the GPU will be slower, but the computation on the GPU should still be as
fast. You will probably see a performance drop of 0-5% depending on the data that
you have.
Reply
RK says
2016-04-18 at 20:57
Reply
2016-04-19 at 19:24
You are welcome
Reply
Yi says
2016-04-14 at 07:52
Hi Tim,
Thanks for the great post. Sorry to bother you again. I just want to ask something about
the coolbits option for GPU cards. Right now I set it to 12 and I can manually control
the fan speed. It works nicely. But I won't check the temperature all the time and
change the fan speed accordingly. So during training, what percentage of fan
speed should I use: 50%, 80%, or an aggressive 90% maybe? Thanks a lot.
And if I keep the fan always running at 80% speed, will it reduce the lifespan of the
card? Thanks.
Reply
The life expectancy of the card increases the cooler you keep it. So if you
can, keep the fan at 100% at all times. However, this can of course cause
problems with noise if the machine is near you or other people. For my
desktop I keep the fan as low as possible while keeping the GPU below 80 degrees C,
and if I leave the room I just set the fan speed to 100%.
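For reference, here is a small sketch of how you can set the fan from a script once coolbits is enabled; the nvidia-settings attribute names are what recent drivers use, but treat them as an assumption and check `nvidia-settings -q all` on your own system:

import subprocess

def set_fan_speed(percent, gpu=0):
    # Enable manual fan control and set a fixed target speed for one GPU.
    # Requires coolbits (e.g. set via `nvidia-xconfig --cool-bits=12`) and a running X server.
    subprocess.check_call([
        "nvidia-settings",
        "-a", "[gpu:{}]/GPUFanControlState=1".format(gpu),
        "-a", "[fan:{}]/GPUTargetFanSpeed={}".format(gpu, percent),
    ])

set_fan_speed(100)  # for example, before leaving the room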
Reply
Yi says
2016-04-25 at 06:25
Reply
Spuddler says
2016-06-11 at 17:41
Keep in mind that running your fans at 100% constantly will wear out
the fans much faster – although that is better than a dead GPU chip. It
can be difficult to find cheap replacement fans for some GPUs, so you
should look for cheap ones on Alibaba etc. and have a few spares lying
around in advance, since shipping from China takes weeks.
Also, when a fan stops running smoothly, you can usually just buy
cheap "ball bearing oil" ($4 on eBay or so) and remove the sticker on
the front side of the fan. There will be some tiny holes beneath into
which you can simply squirt some of the oil, and most likely the fan will
run as good as new. It has worked out for me so far.
Reply
Dorje says
2016-04-09 at 16:41
Hi Tim, THANKS for such a great post and all these responses!
I have a question:
What if I buy a TX1 instead of buying a computer?
I will do video or CNN image classification sorts of things.
Cheers,
Dorje
Reply
Hi Dorje,
I also thought about buying a TX1 instead of a new laptop, but then I opted
against it. The overall performance of the TX1 is great for a small, mobile,
embedded device, but not so great compared to desktop GPUs or even laptop
GPUs. There might also be issues if you want to install new hardware because it
might not be supported by the Ubuntu for Tegra OS. I think in the end the
money is better spent on getting a small, cheap laptop and buying some credit for
GPU instances on AWS. Soon there will also be high-performance instances
(featuring the new Pascal P100), so this would also be a good choice for the
future.
Reply
My guess is that (if done right) the monitor functionality gets relegated to the
integrated graphics capability of the motherboard. Just don't try to stream high-res
video while training an algorithm.
Reply
Steven says
2016-04-09 at 03:06
Ooops – I should have mentioned that the motherboard I’m using is an ASRock
Fatal1ty X99 Professional/3.1 EATX LGA2011-3. It doesn’t have an integrated
graphics chip.
Reply
Steven says
2016-04-08 at 05:23
Hi Tim,
This post was amazingly useful for me. I’ve never built a machine before and this
feels very much like jumping in the deep end. There are two things I’m still
wondering about:
1. If I’m using my GPU(s) for deep learning, can I still run my monitor off of them? If
not should I get some (relatively) cheap graphics card to run the monitor, or do
something else?
2. Do you have any opinion about Intel’s i7-4820K CPU vs. the i7-5820K CPU?
There seems to be a speed vs. cache size & cores trade-off here. My impression is
that whatever difference there is will be small, but the larger cache size should lead
to fewer cache misses, which should be better. Is this accurate?
Thanks
Reply
Steven says
2016-04-09 at 15:46
Was just reading through the Q/A’s here and saw your response to Rohit
Mundra (2015-12-22) answered my first question.
Reply
No problem, I am glad you made the effort to find the answer in the
comment section. Thanks!
Reply
Matt says
2016-04-05 at 04:49
Everyone seems to be using an Intel CPU, but they seem prohibitively expensive if
actual clock speed or cache isn't that important… Would an AMD CPU with 38-lane
support work just as well paired with two GPUs?
Also, have you experimented with builds using two different GPUs?
Reply
Yes, an AMD CPU should work just as well with 2 GPUs as an Intel one. However,
using two different GPUs will not work if they have different chipsets (GTX 980 +
GTX 970 will not work); what will work is having different vendors (EVGA GTX
980 + ASUS GTX 980 will work with no problems).
Reply
Matt says
2016-04-05 at 20:48
I see – thanks! I'm considering just getting a cheaper GPU to at least get my
build started and running and then picking up a Pascal GPU later. My plan
was to use the cheaper GPU to drive a few monitors and use the Pascal
card for deep learning. That kind of setup should be fine, right? In other
words, there is only an issue with two different cards if I try to use them
both in training, but I'm essentially using just a single GPU for it.
Reply
Hi,
Thanks for this post. Are there any Cloud solutions yet?
I used Amazon g2.2xlarge as well as g2.8xlarge as Spot Instances,
however, the GPUs are old, don’t support the latest CUDA features and spot prices
have increased.
Reply
There are also some smaller providers for GPUs, but their prices are usually a bit
higher. Newer GPUs will also be available via the Microsoft Azure N-series sometime
soon, and these instances will provide access to high-end GPUs (M60 and K80).
I will look into this issue next week when I update my GPU blog post.
Reply
Thanks!
Reply
Xiao says
2016-04-04 at 03:48
Hi Tim,
Thanks for the post! Very helpful. I was just wondering what editor (on the monitor in the
center) you used in the picture showing the three monitors?
Reply
That is an AOC E2795VH. Unfortunately they are not sold anymore. But I think
any monitor with a good rating will do.
Reply
Razvan says
2016-03-31 at 18:21
Hey Tim,
Awesome article. Was curious whether you have an opinion on the Tesla M40 as
well.
Cheers,
–Razvan
Reply
This post is slowly getting outdated and I did not review the M40 yet — I will
update this post next week when Pascal is released.
To answer your question, the Titan X is still a bit faster with 336 GB/s while the
M40 sports 288 GB/s. But the M40 has much more memory, which is nice. Still,
both cards will be quite slow compared to the upcoming Pascal.
Reply
2016-04-04 at 03:26
Wow, I am super glad I read this response. Based on your comment about
Pascal vs. the Titan X, I was able to place the development of my
system on hold just in time! I was going to get a Titan X. But now I will
want to know if it will be much better to get the Pascal with 32 GB of
dedicated RAM (VRAM?) vs. the 12 GB of the Titan X.
http://www.pcworld.com/article/2898175/nvidias-next-gen-pascal-gpu-will-offer-10x-the-performance-of-titan-x-8-way-sli.html
Do you have specific information that suggests it will be only one week
before the Pascals are available? How much do you think the 1080 will
be (in USD, Euros, etc.)?
Reply
The Pascal P100 won’t even be available to most of us until later this year at
the soonest (http://wccftech.com/nvidia-pascal-gpu-gtc-2016/) and it isn’t
even in the same league as the Titan X. They haven’t said anything about
the 10xx’s, so I’m assuming they will be quite a while yet also?
Reply
Thanks for the great answers. Do you think that one Titan with 12 GB of memory is
better than, say, two GTX 980s, or two of the upcoming Pascals (xx80s)? I currently
have a system designed with a motherboard that has the additional PCIe
lanes, but where (as I've been told by the Puget Systems people) adding a second
GPU would slow things down by 2x. So I thought "just get the Titan w/ 12 GB of
memory and be done with it." Do you think that sounds ok? Or should I upgrade the
motherboard? I'm thinking that the Titan may be more than I ever need, but
unfortunately I do not know. Thank you for your great help and thorough work.
Reply
Yi Zhu says
2016-03-26 at 08:34
Hi Tim,
Thanks for the great post. I am a graduate student, and would like to put together
a machine recently. But if I put up a system with i7-5930K CPU, Asus X-99 deluxe
MOBO and two titan x GPUs for now, will the pascal GPUs compatible with this
configuration? Can I just simply plug in a Pascal GPU when it is released? Thanks a
lot.
Reply
Reply
Hehe says
2016-03-20 at 12:44
Reply
The 60GB refers to the CPU memory that the AWS g2.8x has. The GPU
memory is 4GB per card.
Reply
Chip says
2016-03-20 at 06:41
Reply
Certain Haswells do not support the full 40 PCIe lanes. So if you buy a Haswell,
make sure it supports them if you want to run multiple GPUs.
Reply
Phong says
2016-03-17 at 23:55
You say the GTX 680 is appropriate for convnets; however, I see the GTX 680 has just 2GB
of RAM, which is inadequate for most convnets such as AlexNet and of course the VGG
variants.
Reply
There is also a 4GB GTX 680 variant which is quite okay. Of course a GTX 980
with 6GB would be better, but it is also way more expensive. However, I would
recommend one GTX 980 over multiple GTX 680. It is just not worth the trouble
to parallelize on these rather slow cards.
Reply
Chip says
2016-03-16 at 11:05
Hi Tim,
Thanks for this excellent primer. I am trying to put a parts list together and have this so far
(http://pcpartpicker.com/p/JnC8WZ), but it has 2 incompatibility issues.
Basically, I want to work through the 2nd Data Science Bowl
(https://www.kaggle.com/c/second-annual-data-science-bowl) as an exercise. I will
likely work with a lot of medical image data. Also, I will use this system as an all-
purpose computer too (for medical writing), so I'm wondering if I also need to add
the USB, HDMI, and DVI connections (I currently also use an Eizo ColorEdge CG222W
monitor). Also, I like the idea of 2 hard drives, one for Windows and one for
Linux/Ubuntu (or I could partition?). Finally, I use a wireless connection, hence that
choice. I would be most grateful if you could help with the 2 incompatibilities and any
omissions, and see if this system would generally be ok. Thank you in advance
for your time.
Reply
Reply
Chip says
2016-03-20 at 05:15
Thank you for this response. I had the GTX 980 selected (in the
pcpartpicker permalink), but I may well just wait for the Pascal that you
suggested. I read this article
(http://techfrag.com/2016/03/18/nvidia-pascal-geforce-x80-x80ti-gp104-gpu-supports-only-gddr5-memory/),
however, and I suppose I must admit I'm quite confused by the names, the
relationship of "Pascal" to the GeForce X80, X80Ti & Titan specs, and also the
concern with respect to GDDR5 vs. GDDR5X memory. Is it worth it to wait
for one of the GeForce cards (which I assume are the same as Pascal?) rather than
just moving forward with the GTX 980? Will one save money by
sacrificing something with respect to memory? Please forgive my neophyte
nature with respect to systems.
Reply
Pascal is the new chip from NVIDIA which will be released in a few
months. It should be designated as GTX 10xx. The xx80 refers to the
most powerful consumer GPU model of a given series; e.g., the GTX
980 is the most powerful of the 900 series. The GTX Titan is usually the
model for professionals (deep learning, computer graphics for industry
and so forth).
And yes, I would wait for Pascal rather than buy a GTX 980. You could
buy a cheap small card and sell it once Pascal hits the market.
Reply
Wajahat says
2016-03-07 at 13:57
Hi Tim
Thanks a lot for your article. It answered some of my questions. I am actually new
to deep learning and know almost nothing about GPUs, but I have realized that I need
one. Can you comment on the expected speedup if I run ConvNets on a Titan X
rather than an Intel Core i7-4770 at 3.4 GHz?
Even a vague figure would do the job.
Best Regards
Wajahat
Reply
It depends highly on the kind of convnet you want to train, but a speedup
of 5-15x is reasonable. However, if you can wait a bit, I recommend waiting
for the Pascal cards, which should hit the market in two months or so.
Reply
viper65 says
2016-02-23 at 18:15
Thank you. But considering the size of the memory and the brand, I am afraid the price
of Pascal will be far beyond my budget.
Reply
viper65 says
2016-02-22 at 22:20
Nice article!
What do you think about HBM? Apart from the size of the RAM, do you think that the Fury X
has any advantage compared to the 980 Ti?
Reply
The Fury X definitely has the edge over the GTX 980 Ti in terms of hardware,
though in terms of software AMD still lags behind. This will change quite
dramatically once NVIDIA Pascal hits the market in a few months. HBM is
definitely the way to go to get better performance. However, NVIDIA's HBM
offers double the memory bandwidth of the Fury X, and Pascal will
also allow for 16-bit computations, which effectively doubles the performance
further. So I would not recommend getting a Fury X, but instead waiting for
Pascal.
Reply
Bobby says
2016-02-23 at 21:37
How soon do you think the flagship Pascal card, like the Titan X, will be on the
market? I am not sure if I should wait. Thank you.
Reply
hroent says
2016-08-12 at 02:34
Hi Tim — Thanks for this article, I’ve found it extremely useful (as have
others, clearly).
You’re probably aware of this, but the new Titan X Pascal cards have very
weak FP16 performance.
Reply
Yes the FP16 performance is disappointing. I was hoping for more, but I
guess we have to wait until Volta is released next year.
Reply
Freddy says
2016-02-08 at 14:15
Hey Tim,
first of all, thank you very much for your great article. It helped me a lot to gain
some insight into the hardware requirements for a DL machine. Over the
past several years I only worked with laptops (in my free time) as I had some good
machines at work. Now I am planning to set up a system at home to start
experimenting on some stuff in my free time. After I read your post and many of
the comments I started to create a build (http://de.pcpartpicker.com/p/gdNRQ7),
and as you have looked over so many systems and given advice, I hoped that you
could maybe do it once again.
I chose the 970 as a starter and will then wait for the Pascal cards coming out later
this year. I am also not planning to work with more than 2 GPUs at home in the
future. And for the monitor: I already have one 24″ at home, so this will just be the
2nd.
I dunno, maybe you can look it over and give me some advice or your opinion.
Reply
Looks like a solid build for a GTX 970 and also after an upgrade to one or two
Pascals this is looking very good.
Reply
Freddy says
2016-02-09 at 15:48
Thanks for the time you are spending giving so many people advice. It
was quite hard for me, after so many years of laptop use, to dive back
into hardware specifics. You made it a lot easier with your post. Big thanks
again!
Reply
Lawrence says
2016-02-06 at 22:36
Hi Tim,
Great website! I am building a devbox, https://developer.nvidia.com/devbox.
My machine has 4 Titan X cards, a Kingston Digital HyperX Predator 480 GB PCIe
Gen2 x4, an Intel Core i7-5930K Haswell-E, and 64GB of G.SKILL RAM. I am using an ASUS
RAMPAGE V Extreme motherboard. When I place the last Titan X card in the last
slot, my SSD disappears from the BIOS. I am not sure if I have a PCIe conflict. Can the
M.2 interfere with PCIE_X8_4? What should I do to fix this issue? Should I
change the motherboard? Any advice?
Reply
Reply
Bobby says
2016-02-19 at 07:07
Hi Tim,
Reply
2016-02-19 at 16:00
Hi Bobby,
I have no experience with RAID 5, since usual datasets will not benefit
from increased read speeds as long as you have an SSD. I think you will
need to change some things in the BIOS and then set up a few things for
your operating system with a RAID manager. I think you will be able to
find a tutorial for your OS online so you can get it running.
Reply
Bobby says
2016-02-21 at 01:12
Hi Tim,
It seems it's not related to the RAID. I wonder how to set up an SSD
as the cache for a normal HDD; setting it as the cache for a RAID
should be similar. With this, I may not need to manually copy my
dataset from HDD to SSD before an experiment. Thank you.
Hi Tim:
Thanks so much for sharing your knowledge!
I've seen you mention that Ubuntu is a good OS.
What is the best OS for deep learning?
What is a good alternative to Ubuntu?
I'd really appreciate your thoughts on this.
Reply
2016-01-25 at 14:08
Linux-based systems are currently best for deep learning since all major deep
learning software frameworks support Linux. Another advantage is that you will
be able to compile almost anything without problems, while on other
systems (Mac OS, Windows) there will always be some problems, or it may be
nearly impossible to configure the system well.
Ubuntu is good because it is widely used, easy to install and configure, and its
LTS versions have long-term support, which makes it attractive for software
developers who target Linux systems. If you do not like Ubuntu you can use
Kubuntu or other *buntu variants; if you like a clean slate and want to configure
everything the way you like, I recommend Arch Linux, but be aware that it will
take a while until you have configured everything the way that suits you.
Reply
JB says
2016-01-10 at 22:01
Tim,
First of all, thank you for writing this! This post has been extremely helpful to me.
I'm thinking about getting a GTX 970 now and upgrading to Pascal when it comes
out. So, if I never use more than 3.5GB of VRAM at a time, then I won't see
performance hits, correct? I'm building my rig for deep reinforcement learning
(mostly Atari right now), so my minibatches are small (<2MB), and so are my
convnets (<2 million weights). Should I be fine until Pascal?
I'm trying to decide between these two budget builds: [Intel Xeon E5]
(http://pcpartpicker.com/p/dXbXjX) and [Intel i5]
(http://pcpartpicker.com/p/ktnHdC). I'm thinking about going with the Xeon, since
it has all 40 PCIe lanes if I wanted to use more than two GPUs in the future, and it's a
beefier processor. However, I start grad school in the fall, so I'd have university
hardware then, and I think I'd be more than fine with two GPUs for personal
experiments in the future. (Or could 4 lanes be enough bandwidth for a GPU?) If I
get the i5 I could upgrade the processor without having to upgrade the
motherboard if I wanted. The processor just needs to be good enough to run
(Atari) emulations and preprocess images right now. I can't really imagine anything
but the GPU being the bottleneck, right?
Thank you for the help. I'm trying to figure out something that will last me a while,
and I'm not very familiar with hardware yet.
Thanks again,
– JB
Reply
Hi JB,
the GTX 970 will perform normally if you stay below 3.5GB of memory. Since
your mini-batches are small and you seem to have rather few weights, this
should fit quite well into that memory. So in your case the GTX 970 should give
you optimal cost/performance.
Reply
Hey Tim,
Thanks for the great article; I have a more specific question though – I’m building
an entry-level Kaggle-worthy system using an i7-5820K processor. Since I want to
keep my GTX 960’s 4GB memory solely for deep learning, would you recommend I
buy an additional (cheaper) graphic card for display or not? I’m considering the GT
610 for this purpose since it’s cheap enough. Also, if I were to do this, where would I
specify such a setting (e.g. use GT 610 for display)?
Thanks again!
Rohit
Reply
For most datasets on Kaggle your GPU memory should be okay, and using
another small GPU for your monitors will not do much. However, if you are
doing one of the deep learning competitions and you find yourself short on
memory and you think you could improve your score by using a model that is
a bit larger, then this might be worth it. So I would only consider this option if
you really encounter problems where you are short on memory.
Fusiller says
2015-11-29 at 18:16
Just a quick note to say thank you and congrats for this great article.
Very nice of you to share your experience on the matter.
Regards.
Alex
Reply
Reply
Eystein says
2015-11-19 at 19:26
Hello! First off, I just want to say this website is a great initiative!
I'm going to use Kaldi for speech recognition next spring in my master's thesis.
Not knowing exactly what type of DNNs I'll be implementing, I'm planning for an
all-round solid budget GPU. Is the GTX 950 with 2 GB suitable (I haven't seen it
mentioned here)? It only requires a 350 W PSU, which is why I'm considering it. Also,
I have a Q6600 CPU and a motherboard that supports 4 GB RAM at most, so this is a
bit constraining for the overall performance of this setup. And apologies if this is
too general a question; I'm just now getting into the field.
Reply
The GTX 950 2GB variant might be a bit short on RAM for speech recognition if
you use more powerful models like LSTMs. The cheapest solution might be to
prototype on your CPU and use AWS GPU instances to run the model if
everything looks good. This way you need no new computer/PSU and will be
able to run large LSTMs and other models. If this does not suit you, a GTX 950
with 4GB of memory might be a good choice.
Reply
Eric says
2015-10-26 at 08:29
Tim,
Thank you for the many detailed posts. I am going with a one-GPU, water-cooled
Titan X solution based on the information here. Does it still hold true that adding a
second GPU will allow me to run a second algorithm, but that it will not increase
performance if only one algorithm is running? Best Regards – Eric
Reply
There are now many good libraries which provide good speedups for multiple
GPUs. Torch7 is probably the best of them. Look for the Torch7 Facebook
extensions and you should be set.
Reply
BK says
2015-10-21 at 16:50
Hi Tim,
Great post; In general all of the content on your blog has been fantastic.
I'm a little curious about your thoughts on other types of hardware for use in deep
learning. I've heard a number of people suggest FPGAs to be potentially useful for
deep learning (and parallel processing in general) due to their memory efficiency vs.
GPUs. This is often mentioned in the context of Xeon Phi. What are your thoughts
on this? If true, where does the usefulness lie, in the 'training' or 'scoring' part of
deep learning (my perhaps incorrect understanding was that the GPU's advantage was its
use for training as opposed to scoring)?
My apologies for what I'm certain are sophomoric questions; I'm trying to wrap my
head around these matters as someone new to the subject!
Regards,
BK
Reply
FPGAs could be quite useful for embedded devices, but I do not believe they
will replace GPUs. This is because (1) their individual performance is still worse
than an individual GPU and (2) combining them into sets of multiple FPGAs
yields poor performance while GPUs provide very efficient interfaces (especially
with NVLink which will be available at the end of 2016). GPUs will make a very
big jump in 2016 (3D memory) and I do not think FPGAs will ever catch up
from there.
Reply
BK says
2015-10-28 at 15:29
One last question: beyond the world of deep learning, what is the
perception of Xeon Phi? It seems hard to find people who talk with
certainty about what its strengths/applications will be. Is there any consensus
on this? What application do you think makes the most sense for Xeon
Phi?
Many thanks!
-BK
Reply
Greg says
2015-10-20 at 00:45
Hey Tim…
Do you have any suggestions for a tutorial for DL using Torch7 and Theano and/or
Keras?
Thanks
Greg
Reply
Hi Tim,
Thank you very much for all the writing. I am an Objective-C developer but a complete
newbie to deep learning and very interested in this area right now.
I have a Mac 3.1 and I would like to upgrade the graphics card so I have CUDA to
run Torch7, Lua and nn in order to learn this kind of programming. It does not matter
whether it is a Mac card or a Windows card.
Which one would you recommend? GTX 780 Ti? GTX 960 2GB? GTX 980? Tesla
M2090 (second hand)?
Look forward to your advice.
Reply
Of the cards you listed, the GTX 980 will be the best by far. Please also have a
look at my GPU guide for more info on how to choose your GPU.
Reply
Thank you very much. I got a generous sponsor to build a new Ubuntu
machine with 2 GTX 780 Tis. Should I use the GTX 980 in the new machine
to get better performance than the SLI GTX 780 Tis, or let it stay in my Mac?
Reply
If you already have the two GTX 780 Ti I would stick with that and only
change/add the GPU if you experience RAM shortage for one of your
models.
Reply
Hi Tim, The company that I buy my servers from (Thinkmate) recently sent me an
e-mail advertising that they’ve been working with Supermicro to sell servers with
support for Titan X. What do you think about this solution? I’ve had a lot of luck
with Supermicro servers, and they offer 3 year warranty on the Titans and will
match the price if found cheaper elsewhere. Here’s the
link: http://www.thinkmate.com/systems/servers/gpx/gtx-titan-x
Reply
Hi Brent, I think in terms of the price, you could definitely do better on the 1U
model with 4 GTX Titan X. A normal board with 1 CPU will not have any
disadvantage compared to the 1U model for deep learning.
However, the 4U model is different because it can use 8 GTX Titan X with a fast
CPU-to-CPU switch which makes parallelization of 8 GPUs easy and fast. There
are only few solutions available that are build like this and come with 8 GTX
Titan X — so while the price is high, this will be a rather unique and good
solution.
Reply
Greg says
2015-10-08 at 05:13
Lastly, I kept testing and found the culprit: when installing CUDA I can't install the
502 driver that it comes with, or the Ubuntu system locks with an unknown
password, no matter how many different ways I try to install the CUDA driver. I
scoured the internet for a solution and there wasn't one, and it looks like no one has
put 2 and 2 together about the CUDA driver. It could be a combination of things, both
hardware and software, but it definitely involves this driver, the X99 motherboard, a Titan X, and
Ubuntu 14.04 and 15.04.
Thanks.
Reply
Greg says
2015-10-06 at 21:08
Hi Tim..
Recently I have had a ton of trouble working with Ubuntu 14.04: installing CUDA,
Caffe, etc. Ubuntu has password-locked me out of my system twice, and getting all
the dependencies installed so that Caffe will install has been a real problem. It works
sometimes; other times it doesn't. Ubuntu 14.04 is clearly an unstable OS.
I would like your opinion, Tim, on moving from Linux to Windows for deep learning.
What are your thoughts?
-Greg
Reply
I can feel your pain — I have been there too! Ubuntu 14.04 is certainly not
intuitive when you are switching from Windows, and a simple, seemingly harmless
command can ruin your installation. However, I found that once you understand
how everything is connected in Linux, things get easier, make sense, and you will no
longer run into errors which break your installation or even your OS. After
this point, programming in Linux will be much more comfortable than in
Windows due to the ease of compiling and installing any library. So it may be painful
but it is well worth it. You will gain a lot if you go through the pain-mile —
keep it up!
Reply
Greg says
2015-10-07 at 18:43
After Ubuntu 14.04 locked me out 3 times on boot-up with a false
logon screen, I thought I'd try Ubuntu 15.04. I think the CUDA driver
slammed Unity, resetting the root password to something other than the
password I gave it. I searched the web and this is a common problem and
there seems to be no fix.
I'm running an X99 MB, i7-5930, 64 GB RAM, and one Titan X. I'll get a second
Titan X when I'm ready for it. I want to create my own NN and nodes but
for now I have a ton of learning to do and I need to follow what's been
done so far.
Do you use standard libraries and algorithms like Caffe, Torch7 and
Theano via Python? I feel I need to wade through everything to see how it
works before using it. NVIDIA DIGITS looks pretty simple working from the
GUI, but it also looks, from my limited experience, like it's pretty limited.
Reply
Is this because of your X99 board? I never had any problems like that.
As for the software, Torch7 and Theano (Keras and derivatives) work
just fine for me. I have tried Caffe once and it worked, but I have also heard
some nightmare stories about installing Caffe correctly. NVIDIA DIGITS
will be just as you described: simple and fast, but if you want to do
something more complex it will just be an expensive, fast PC with 4
GTX Titan X.
Reply
mxia.mit@gmail.com says
2015-10-07 at 20:36
Just to tag onto this, I have an X99-E board and had some problems on the
initial install when trying to boot into Ubuntu's live installer, though nothing with the
password. After installing, everything worked fine at the OS level. In
case this is relevant, reflashing to the latest BIOS helped a lot, but it probably
won't help your password problem.
Mike
Reply
Safi says
2015-10-04 at 22:54
Hi Tim,
First, thanks a lot for these interesting and useful topics. I am a PhD student; I work
on evolutionary ANNs.
I want to start using GPUs; my budget can reach $150 max.
I found in my town a new GTX 750 and a GTX 650 Ti. Which one is better, and are
they supported by cuDNN?
Thanks
Reply
A GTX 750 should be better, and both support cuDNN. However, I would also
suggest that you have a look at AWS GPU instances. The instance will be a bit
faster and may suit your budget well.
Reply
ML says
2015-09-28 at 17:02
Hello Tim, what about external graphics cards connected through Thunderbolt?
Have you looked at those? Could that be a cheap solution that avoids having to
build/buy a new system?
Reply
Reply
Tony says
2015-09-25 at 18:29
One concern that I have is that I also use triple monitors for my work setup.
However, doesn't the fact that you're using triple monitors affect the performance of
your GPU? Do you recommend buying a cheap $50 GPU for the triple-monitor
setup and then dedicating your Titan X or your more expensive card primarily to deep
learning? I run recurrent neural nets.
Thanks!
Reply
Three monitors will use up some additional memory (300-600MB) but should
not affect your performance greatly (< 2% performance loss). I recommend
getting a cheap GPU for your monitors only if you are short on memory.
Reply
Tony says
2015-09-28 at 19:00
Thanks — that makes a lot of sense. I just thought it would affect your
bandwidth (as that is usually the bottleneck). I'm currently running the 980
Ti — I know it has 336GB/s. Good to know that it uses some memory
though. Appreciate it.
Reply
Hello Tim,
Thank you for your article. The deep learning devbox (NVIDIA) has been touted as
cutting edge for researchers in this area. Given your dual experience in both the
hardware and algorithm sides, I would be grateful to hear your general thoughts on
the devbox. I know it came out a few months after you wrote your article.
Thank you!
Reply
I just want to thank you again, Tim, for the wonderful guide. I do have a couple of
hardware utilization questions though. I am trying to figure out how to properly
partition my disks in Ubuntu to handle my requirements. I dual boot Windows 10
(for work/school) and Ubuntu 14.04.3 (deep learning), with each having its own
SSD boot drive and HDD storage drive. For starters, here's my setup:
My Windows install is fine, but I want to be able to store currently unused data on
the HDD, stage batches on the SSD, and then send the batches from SSD to RAM to fully
leverage the IOPS gain of an SSD.
I currently have Ubuntu partitioned this way; however, I'm not entirely sure this will
fit my needs. I'm thinking I might want to allocate /home on the HDD due to how
Ubuntu handles the /home directory in the UI, but I'm unsure if that will be a
problem for deep learning:
SSD (boot):
– swap area – 16GB
– / – 20GB
– /home – 20GB
– /var – 10GB
– /boot – 512MB
– /tmp – 10GB
– /var/log – 10GB
HDD
– /store 1TB
Reply
vinay says
2015-09-08 at 15:25
Does anyone know what the requirements would be for prediction clusters? Most
articles focus on training aspects, but inference/prediction is also important and the
compute demand for it is little discussed. Can anyone comment on compute
demands for prediction? Also, what do you recommend for such tasks: CPU only,
CPU+GPU, CPU+FPGA, etc.?
Thanks,
Vinay
Reply
Prediction is much faster than training, but a forward pass over about 100
large images (or similarly large input data) still takes about 100 milliseconds on a
GPU. A CPU could do that in a second or two.
If you predict one data point at a time, a CPU will probably be faster than a
GPU (convolution implementations relying on matrix multiplication are slow if
the batch sizes are too small), so GPU processing is good if you need high
throughput in busy environments, and a CPU is good for single predictions (1 image
should take only about 100 milliseconds with a good CPU implementation). Multiple
CPU servers might also be an option, and usually they are easier to maintain
and cheaper (AWS spot instances for example, which are also useful for GPU work). Keep
in mind that all these numbers are reasonable estimates only and will
differ from the real results; results from a testing environment that simulates the
real environment will make it clear whether CPU servers or GPU servers will be
optimal.
I do not recommend FPGAs for such tasks since interfaces to FPGAs are not easy
to maintain over time and cloud solutions do not exist (as far as I know).
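If you want numbers for your own model instead of these rough estimates, a tiny timing sketch like the one below is usually enough (it assumes a Keras-style model object; the same idea carries over to Torch7 or Caffe):

import time
import numpy as np

def forward_time(model, batch_size, n_runs=50):
    # Average forward-pass time for a given batch size: compare batch_size=1
    # (latency, where a CPU can be competitive) with batch_size=100
    # (throughput, where the GPU usually wins by a wide margin).
    x = np.random.rand(batch_size, *model.input_shape[1:]).astype("float32")
    model.predict(x, batch_size=batch_size)  # warm-up run
    start = time.time()
    for _ in range(n_runs):
        model.predict(x, batch_size=batch_size)
    return (time.time() - start) / n_runs

# print(forward_time(model, 1), forward_time(model, 100))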
Reply
Sascha says
2015-09-05 at 16:44
Hi,
thanks a lot for all this information. After stumbling across a paper from Andrew Ng
et al. ("Deep learning with COTS HPC systems") my original plan was to also build a
cluster (to learn how it is done). I wanted to go for two machines with a bunch of
GTX Titans, but after reading your blog I settled on only one PC with two GTX 980s
for the time being. My first thought after reading your blog was to actually settle for
two 960s, but then I thought about the energy consumption you mentioned.
Looking at the specifications of the NVIDIA cards, I figured the 980 was the most
efficient choice currently (at least as long as you have to pay German energy
prices).
As I am still relatively fresh to machine learning, I guess this setup will keep me busy
enough for the next couple of months, probably until the Pascal architecture you
mentioned is available (I read somewhere 2nd half of 2016). If not, then I guess I will
buy another PC and move one of the 980s into it so that I can learn how to set up a
cluster (my current goal is learning as much as possible, as fast as possible).
CPU: Intel i7-5930k (I chose this one instead of the much cheaper 5920 as it has the
40 PCI lanes you mentioned, which gives the additional flexibility of handling 4
graphics cards)
Mainboard: ASRock Fatal1ty X99 Professional (supports up to 4 graphics cards and
has a M.2 slot)
RAM: 4×8 GB DDR4-3000
Graphics Card: 2x Zotac GTX 980 AMP! Edition
Hard Disk: Samsung SSD SM951 with 256 GB (thanks to M.2 it offers 2 GB/s of
sequential read performance)
Power Supply: be quiet! BN205 with 1200 Watts
I hope that installing Linux on the SSD works, as I read that the previous version of
this SSD caused some problems.
Thanks again
Sascha
Reply
Hi Sascha! Your reasoning is solid and it seems you have a good plan for the
future. Your build is good, but as you say, the PCIe SSD could be a bit
problematic to set up. Another fact to be aware of is that your GPUs will have a
slower connection with that SSD, because the SSD takes away bandwidth from
your GPUs (your GPUs will run at 16x/8x instead of 16x/16x). Overall the PCIe
SSD would be much faster for common applications, but slower when you use
parallelism on two GPUs, so it might be better to go for a SATA SSD (if you do not
use parallelism that much, a PCIe SSD is a solid choice). A SATA SSD will be
slower than the PCIe one, but it should still be fast enough for any deep
learning task. However, preprocessing will be slower on this SSD, and fast preprocessing is
probably the main advantage of the PCIe SSD.
Reply
Sascha says
2015-09-06 at 09:49
That is an interesting point you make regarding the M.2. I did not realise
that this is how the board will distribute the lanes. I figured that as the M.2
only uses 4 lanes, the two cards could each run with 16, and if I actually
decided to scale up to a quad setup each card would eventually only get 8
lanes.
My first idea after reading the comment was to just try the SSD in the
additional M.2 PCIe 2.0 slot, which is basically a SATA 6 connection, but that
will not work, as it will not fit: one has the Key B and the other the
Key M layout.
Do you have an idea about what this actually means for real-life
performance in deep learning tasks (like x% slower)?
Greetings
Sascha
Reply
When I think about it again, I might be wrong about what I just said.
How two GPUs and the PCIe SSD will work together depends highly on your
motherboard, how the PCIe slots are wired, and how the PCIe
switches are distributed. I think with a 40-PCIe-lane CPU and a
mainboard that supports a 16x/16x/8x layout, it should be possible to
configure that to use 16 lanes for each of your GPUs and 8 lanes for your SSD;
to use that setup you only need to make sure to plug everything into
the right slot (your mainboard manual should state how to do this). I
have not looked at your hardware in detail, but I think your hardware
supports this.
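Once everything is plugged in you can also verify which link width each GPU actually negotiated; a quick sketch (assuming the standard nvidia-smi tool) that prints the relevant section of `nvidia-smi -q`:

import subprocess

# Show the negotiated PCIe link width (x8 vs x16) for every GPU; `nvidia-smi -q`
# reports this under its "GPU Link Info" section as Max and Current values.
lines = subprocess.check_output(["nvidia-smi", "-q"]).decode().splitlines()
for i, line in enumerate(lines):
    if "Link Width" in line:
        print("\n".join(l.strip() for l in lines[i:i + 3]))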
What are your opinions on RAID setups in a deep learning rig? Software-based
RAID is pretty crappy in my experience and can cause a lot more problems than it
solves. However, RAID controllers take a PCIe slot, and these will,
fortunately/unfortunately, all be taken by 4 x Gigabyte GTX 980 Ti cards. Is it worth
running RAID with the software controller? Or is it better just to do full clone
backups?
Reply
I do not think it is worth it. Usually a common SATA SSD will be fast enough for
most kinds of data; in some cases there will be a decrease in performance
because the data takes too long to load, but compared to the effort and
money spent on a (hardware) RAID system it is just not worth it.
Reply
mxia.mit@gmail.com says
2015-09-01 at 21:18
Hey Tim,
Thank you so much for this great writeup, it’s been pivotal in helping me and my
co-founder understand the hardware. We’re a duo from MIT currently working on a
venture-backed startup bringing deep learning to education, hoping to help at least
improve, if not fix, the US education system.
Our first build is aiming to be cheap where it can be (since both of us are beginners
and we need to be frugal with our funding) but future-proof enough for us to do
harder things.
Could you look over these and offer any critique? My logic was to have a mobo
and CPU that could handle upgrading to better hardware later; things like the PSU,
RAM, and the 960 I'm willing to replace later on.
Thank you in advance! Also, is there a way we could exchange emails and chat
more?
Would love any advice we can get from you while we build out our product.
Best,
Mike Xia
Reply
Looks good. The build is a bit more expensive due to the X99 board, but as
you said, that way it will be upgradeable in the future which will be useful to
ensure good speed of preprocessing the ever-growing datasets. You are
welcome to send me an email. My email is firstname.lastname@gmail.com
Reply
I have been looking for an affordable CPU with 40 lanes without luck. Could you
give me a link?
I am also curious about the actual performance benefit of 16x vs 8x. If the
bottleneck is the DMA writes, will the performance be reduced by half?
Reply
Tori says
2015-08-18 at 00:38
How would GTX Titan Z compare to GTX Titan X for the purpose of training a large
CNN? Do you think it’s worth the money to buy a GTX Titan Z or is a GTX Titan X
good enough? Thanks!
Reply
A GTX Titan X will be much better in most cases. If you want more details, have
a look at my answer to this question on Quora.
Reply
Peter says
2015-08-13 at 16:22
Hi Tim,
Firstly, thanks for this article; it’s extremely informative (in fact your entire blog
makes fascinating reading for me, since I’m very new to neural networks in
general).
I want to get a more powerful GPU to replace my old GTX 560 Ti (a great little card,
but 1GB of memory is really limiting and I presume it's pretty slow these days too).
Sadly I cannot really afford the GTX Titan X (as much as I'd like to, 1300 CAD is too
damn high). The 980 Ti is also a bit on the high end, so I'm looking at the 980, since
it's about 200 CAD cheaper. My question is: how much performance am I gaining
going from my old 560 Ti to a 980/980 Ti/Titan X? Is the difference in speed even
that large? If it's worth saving for the bigger card then I'll just have to be patient.
I'm currently running Torch7 and an LSTM-RNN with batches of text, not images, but
if I want to do image learning I assume I'd want as much RAM as possible?
Cheers
Reply
The speedup should be about 4x when you go from a GTX 560 Ti to a GTX
980. The 4GB of RAM on the GTX 980 might be a bit restrictive for convolutional
networks on large image datasets like ImageNet. A GTX Titan X or GTX 980 Ti
will only be about 50% faster than a GTX 980. If you wait about 14-18 months you can
get a new Pascal card which should be at least 12x faster than your GTX 560 Ti.
I personally would value getting additional experience now as more important
than getting less experience now and faster training in the future — or in other
words, I would go for the GTX 980.
Reply
Peter says
2015-08-17 at 15:36
How exactly would I be restricted by the 4GB of RAM? Would I simply not
be able to create a network with as many parameters, or would there be
other negative effects (compared to the 6GB of the 980 Ti)?
You've mentioned in the past that bandwidth is the most important aspect
of the cards, and the 980 Ti has 50% higher bandwidth than the regular
980; would that mean it's 50% faster too, or are there other factors
involved?
Reply
Reply
howtobeahacker says
2015-08-13 at 07:41
Hi Tim,
I have a minor question related to the 6-pin and 8-pin power connectors. It is related to
your sentence "One important part to be aware of is if the PCIe connectors of your
PSU are able to support a 8pin+6pin connector with one cable".
My workstation has one 8-pin cable that splits into TWO 6-pin connectors. Is it possible
to plug into these two 6-pin connectors to power up a Titan X, which requires 6-pin
and 8-pin power connectors? I think I will try it, because I want to plug in 2 Titan X
GPUs and only this way can my workstation support two GPUs.
Thank you so much.
@An
Reply
I think this will depend somewhat on how the PSU is designed, but I think you
should be able to power two GTX Titan X with one double 6-pin cable, because
the design makes it seem that it was intended for just that. Why would they put
two 6-pin connectors on a cable if you cannot use them? I think you can find
better information if you look up your PSU and see if there is documentation, a
specification, or something like that.
Reply
howtobeahacker says
2015-08-12 at 05:09
Hi Tim,
Thanks for your responses. I read your posts and I remembered an image of a piece of
software in Ubuntu to visualize the state of the GPU. Something that is similar to Task
Manager for CPUs. If you have information, please let me know.
Reply
Hi Tim!
We’ve already asked you for some advice, and it was helpful… We put together a
dev-box in the meanwhile, with 4 Titans inside, and it works perfectly.
Now we are considering production servers for image tasks. One of them would be
classification. Considering the differences between training and runtime (runtime
handles a single image, forward prop only), we were wondering if it would be more
cost effective to run multiple weaker GPUs, as opposed to fewer stronger ones….
We are reasoning that a request queue consisting of single-image tasks could be
processed faster on two separate cards, by two separate processes, than on a
single card that is twice as fast. What are your thoughts on this?
We’ve run very crude experiments, comparing classification speed of a single image
on a Titan machine, vs. 960M equipped laptops. The results were more or less as
we expected: Titans are faster, but only about 2x, whereas Titans are 4x more
expensive than a GTX 960 (which has significantly more GFLOPS than the 960M). In
absolute terms, classification speed on a weaker card is acceptable; we’re
wondering about behavior under heavy load.
Reply
Hi Florijan!
I think in the end this is a numbers game. Try to overflow a GTX 960M and a
Titan with images and see how fast they go and compare that with how fast
you need to be. Additionally, it might make sense to run the runtime
application on CPUs (might be cheaper and more scalable to run them on AWS
or something) and only run the training on GPUs. I think a smart choice will
take this into account, and how scalable and usable the solution is. Some AWS
CPU spot instances might be a good solution until you see where your project
is headed to (that is if a CPU is fast enough for your application).
Reply
Tim,
Thanks for your reply. You’re right, it definitely is a numbers game, I guess
we will simply need to stress-test.
We already tried to run our classifier on the CPU, but classification time was
an order of magnitude slower than on the 960M, so that doesn’t seem like a
good option, especially considering the price of a GTX 960 card.
We’ll do a few more tests at some point. If we find out anything interesting,
I’ll post back here…
Reply
Roelof says
2015-08-09 at 20:42
Hi Tim,
Thanks a lot for your great hardware guide!
I’m planning to build a 3 x Titan X GPU setup, which will be running more or less on
a constant basis: would you say that water cooling will make a big impact on
performance (by keeping the temperatures always below 80 degrees)?
As the machine will be installed remotely, where I don’t have easy access to it, I’m a
bit nervous about installing a water cooling system in such a setup, with the risk of
coolant leakage, so the “risk” has to be worth the performance gain.
Do you have any experience with water cooled systems, and would you say that it
would be a useful addition?
Also, would you advise a tightly fitting chassis, or a bigger one which allows better
airflow?
Finally (so many questions :P), do you think 1500 watts with 92-94% efficiency at
100% load should suffice in case I might use 4 Titan X GPUs, or would it be
better to go for a 1600W PSU?
Reply
If you operate the computer remotely, another option is to flash the BIOS of
the GPU and crank up the fan to max speed. This will produce a lot of noise
and heat, but your GPUs should run slightly below 80 degrees, or at 80 degrees
with little performance loss.
Water cooling is of course much superior, but if you have little experience with it
it might be better to just go with an air cooled setup. I have heard that, if installed
correctly, water cooling is very reliable, so maybe this would be an option if
somebody else who is familiar with water cooling helps you set it up.
In my experience, the chassis does not make such a big difference. It is all
about the GPU fans, and getting the heat out quickly (which is mostly towards
the back and not through the case). I installed extra fans for better airflow
within the case, but this only makes a difference of 1-2 degrees. What might
help more are extra backplates and small attachable cooling pads for your
memory (both about 2-5 degrees).
I used a 1600W PSU with 4 GTX Titans which need just as much power as a GTX
Titan X and it worked fine. I guess 1500W would also work well and 92-94%
efficiency is really good. I would try with the 1500W one and if it does not work
just send it back.
Reply
Roelof says
2015-08-10 at 16:57
With a custom-built water cooling system for both the CPU and the 3 Titan
X’s, which I hope will let me crank up these babies while keeping the
temperature below 80 degrees at all times.
The machine is partly (at least the chassis is) inspired by Nvidia’s recently
released DevBox for Deep Learning (https://developer.nvidia.com/devbox),
but for almost 1/2 of the price. Will post some benchmarks with the newer
cuDNN v3 once it’s built and all set up.
Reply
Alex says
2015-11-12 at 01:15
How did your setup turn out? I am also looking to either build a box
or find something else ready made (if it is appropriate and fits the bill).
I was thinking of scaling down the nvidia devbox as well. I also saw
these http://exxactcorp.com/index.php/solution/solu_detail/233 which
are similar. Very expensive.
Thanks for any insight and thanks Tim for the great blog posts!
Reply
Axel says
2015-08-08 at 19:17
Hi Tim,
I’m a Caffe user, and since Caffe has recently added support for multiple GPUs, I
have been wondering if I should go with a Titan X or with 2 GTX 980s. Which of these
2 configurations would you choose? I’m more inclined towards the 2 GTX 980s, but
maybe there are some downsides with this configuration that I haven’t thought
about.
Thanks!
Reply
Reply
howtobeahacker says
2015-08-08 at 08:19
Reply
A very tiny space between GPUs is typical for non-tesla cards and your cards
should be safe. The only problem is, that your GPUs might run slower because
they reach their 80 degrees temperature limit earlier. If you run a unix system,
flashing a custom BIOS to your Titans will modify the fan regulation so that
your GPUs should be cool (< 80 degrees C) at all times. However, this may
increase the noise and heat inside the room where your system is located.
Flashing a BIOS for better fan regulation will first and foremost increase
the lifetime of your GPUs, but overall everything should be fine and safe
without any modifications even if you operate your cards at maximum
temperature for some days without pause (I personally used the standard
settings for a few years and all my GPUs are still running well).
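If you want to check whether your cards actually sit at the 80 degrees limit during training, a quick way is to poll nvidia-smi. A minimal Python sketch (an illustration only; it assumes the NVIDIA driver and its nvidia-smi tool are installed):

    import subprocess, time

    # Poll temperature, utilization and memory once per second; Ctrl+C to stop.
    while True:
        out = subprocess.check_output([
            "nvidia-smi",
            "--query-gpu=index,temperature.gpu,utilization.gpu,memory.used",
            "--format=csv,noheader",
        ])
        print(out.decode().strip())
        time.sleep(1)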
Reply
howtobeahacker says
2015-08-11 at 01:52
Hi Tim,
Thanks for your responses. I read your posts and I remembered an image of
a piece of software in Ubuntu to visualize the state of the GPU. Something that is
similar to Task Manager for CPUs. If you have information, please let me know.
Reply
howtobeahacker says
2015-08-31 at 09:01
Hi Tim,
I just found a way to increase the GPU fan speed in Ubuntu using the NVIDIA X Server
Settings tool. The details are in http://askubuntu.com/questions/42494/how-
can-i-change-the-nvidia-gpu-fan-speed
Reply
Indeed, this will work very well if you have only one GPU. I did not
know that there was an application which automatically prepares the
xorg config to include the cooling settings — this is very helpful, thank
you! I will include that in an update in the future.
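For reference, the command-line equivalent of that askubuntu solution is roughly the following sketch (an assumption-heavy example: it presumes Coolbits has already been enabled, e.g. with nvidia-xconfig --cool-bits=4, that an X server is running, and the exact attribute names can vary between driver versions):

    import subprocess

    # Enable manual fan control on GPU 0 and set its fan to 80%.
    # Attribute names depend on the driver version; check `nvidia-settings -q all`.
    subprocess.check_call([
        "nvidia-settings",
        "-a", "[gpu:0]/GPUFanControlState=1",
        "-a", "[fan:0]/GPUTargetFanSpeed=80",
    ])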
Reply
pedropgusmao says
2015-08-05 at 08:25
Hello Tim,
First of all, thanks for always answering my questions, and sorry for coming back
with more.
Do you think a 980 (4GB) is enough for training current neural nets (alexnet,
overfeat, vgg), or would it be wise to go for a 980ti?
Thanks again.
Reply
2015-08-05 at 08:39
4 GB of memory can indeed be quite short sometimes. If time is cheaper than
money, go for a GTX 980Ti, or even better a GTX Titan X.
Reply
gac says
2015-08-05 at 04:19
Hi Tim,
First of all, excellent blog! I’m putting together a gpu workstation for my research
activities and have learned a lot from the information you’ve provided so ….
thanks!!
I have a pretty basic question. So basic I almost feel stupid asking it but here goes
…
Given your deep learning setup which has 3x GeForce Titan X for computational
tasks, what are your monitors plugged in to?
I would like a very similar setup to yours (except I’ll have two 29″ monitors) and I
was wondering if it’s possible to plug these into the Titan cards and have them
render the display AND run calculations.
Or is it better to just have another, much cheaper, graphics card which is just for
display purposes?
Reply
I have my monitors plugged into a single GTX Titan X and I experience no side
effects from that other than a couple of hundred MB of memory that is needed
for the monitors; the performance for CUDA compute should be almost the
same (probably something like 99.5%). So no worries here, just plug them in
where it works for you (on windows, one monitor would also be an option I
think).
Reply
Vu Pham says
2015-08-04 at 16:14
I’m sorry — the X3 version of the Mellanox card does not support RDMA, but the X4 does.
Reply
Vu Pham says
2015-08-04 at 16:07
So, I did some research on deep learning hardware, and I assume the most appropriate parts
list is:
Motherboard: X10DRG-Q – This is a dual-socket board which allows you to double
the PCIe lanes of the CPU. It has 4x fully functional x16 PCIe 3.0 slots and an extra 4x PCIe
2.0 slot for a Mellanox card.
CPU: 2x E5-2623
Assuming the other parts are $1000, the total cost would be $7,585, half the price of the
Nvidia Dev Box. My god, NVIDIA.
Reply
This sounds like a very good system. I was not aware of the X10DRG-Q
motherboard; usually such mainboards are not available for private customers
— this is a great board!
I do not know the exact topology of the system compared to the Nvidia Dev
box, but if you have two CPUs this means you will have an additional switch
between the two PCIe networks and this will be a bottleneck where you have to
transfer GPU memory through CPU buffers. This makes algorithms
complicated and prone to human error, because you need to be careful how to
pass data around in your system, that is, you need to take into account the
whole PCIe topology (on which network and switch the infiniband card sits etc.,
on which network the GPU sits etc.). Cuda convnet2 has some 8 GPU code for
a similar topology, but I do not think it will work out of the box.
If you can live with more complicated algorithms, then this will be a fine system
for a GPU cluster.
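One way to see which GPUs share a PCIe switch and where transfers would have to cross the CPU interconnect is the topology matrix printed by nvidia-smi. A small sketch (assuming the NVIDIA driver is installed):

    import subprocess

    # Entries like PIX/PXB mean two GPUs sit behind the same PCIe switch,
    # while SYS means traffic has to cross the CPU/QPI interconnect.
    print(subprocess.check_output(["nvidia-smi", "topo", "-m"]).decode())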
Reply
Vu Pham says
2015-08-05 at 10:05
I got it, so stick to the old plan then, Thank you any way.
Reply
Vu Pham says
2015-08-08 at 15:18
Hi Tim
Fortunately, Supermicro provided me with the X10DRG-Q mobo diagram, and it
should also be a general diagram for other socket 2011 dual-socket mobos which
have 4 or more PCIe x16 slots. The 2 CPUs are connected by 2 QPI (Intel
QuickPath Interconnect) links. If CPU1 has 40 lanes, then 32 lanes go to 2 PCIe x16 slots,
4x to the 10 Gigabit LAN, and 4x to a x4 PCIe slot (x8 slot shape, which will be covered if
you install a 3rd graphics card). The 2nd CPU also provides 32 lanes for PCI
Express, of which 8x go to the x8 slot at the top (nearest the CPU socket). Pretty
complicated.
The point of building a perfect 4×16 PCIe 3.0 setup was that I thought the
performance was going to be halved if the bandwidth goes from 16x down to 8x. Do
you have any information on how big the performance difference is for, say, a single
Titan X on 16x 3.0 vs 16x 2.0?
Reply
If you want a less complicated system that is still faster, you can think
about getting a cheap InfiniBand FDR card on eBay. That way you
would buy 6 cheap GPUs and hook them all up via InfiniBand at 8x 3.0.
But probably this will be a bit slower than straight 4x GTX Titan X on 8x
3.0 on a single board.
Reply
Hi Tim, very nice sharing. I just would like to comment on the ‘silly’ parts (smile): the
monitors. Since I only have one monitor, I just use NoMachine and put the screen in
one of my virtual workspaces in Ubuntu to switch between the current machine and
our deep learning servers. Surprisingly this is more convenient and energy efficient,
both for the electricity and for our neck movement. Just hope this helps,
especially those who only have a single monitor. Cheers.
Reply
Thanks for sharing your working procedure with one monitor. Because I got a
second monitor early, I kind of never optimized the workflow on a single
monitor. I guess when you do it well, as you do, one monitor is not so bad
overall — and it is also much cheaper!
Reply
Xardoz says
2015-07-30 at 09:16
I have a newbie question: If the motherboard has integrated graphics facility, and if
the GPU is to be dedicated to just deep learning, should the display monitor be
connected directly to the motherboard rather than the GPU?
I have just bought a machine with a GeForce Titan X card and they just sent me an e-
mail saying:
“You have ordered a graphics card with your computer and your motherboard
comes supplied with integrated graphics. When connecting your monitor it is
important that you connect your monitor cable to the output on the graphics card
and NOT the output on the motherboard, because by doing so your monitor will
not display anything on the screen.”
Intuitively, it seems that off-loading the display duties to the motherboard will free
the GPU to do more important things. Is this correct? If so, do you think that this
can be done simply? I would ask the supplier, but they sounded lost when I started
talking about deep learning on Graphics cards.
Regards
Xardoz
Reply
Hi Xardoz! You will be fine when you connect your monitor to your GPU,
especially if you’re using a GTX Titan X. The only significant downside of this is
some additional memory consumption, which can be a couple of hundred MB. I
have 3 monitors connected to my GPU(s) and it never bothered me while doing
deep learning. Only if you train very large convolutional nets that are on the edge of
the 12GB limit would I think about using the integrated graphics.
Reply
Xardoz says
2015-08-05 at 08:24
Thanks Tim.
And yes, I do need more than 12GB for training a massive NN, so I decided
to buy a small graphics card just to run the display, as suggested in one of
your comments above. Seems to work fine.
Regards
Reply
Hi Tim,
1) Great post.
2) Do you know how motherboards with dedicated PCI-E lane controllers shuffle
data between GPUs with deep learning software? For example, the PLX PEX 8747
purports control of 48 PCI-E lanes beyond the 40 lanes a top-shelf CPU controls,
e.g. allowing five x16 connections, but it’s not clear to me if deep learning software
makes use of such dedicated PCI-E lane controllers.
I ask since going beyond three x16 connections with CPU control of PCI-E lanes
requires a dual-CPU setup, but such boards along with suitable CPUs can in sum be
thousands of dollars more expensive than a single CPU motherboard that has a PLX
PEX 8747 chip. If the latter has as good performance for deep learning software,
might as well save the money!
Thanks!
-Charles
Reply
That is very difficult to say. I think the PLX PEX 8747 chip will be handled by the
operating system after you installed some driver, so that deep learning software
would use it automatically in the background. However, it is unclear to me if
you really can operate three GPUs in 16/16/16 when you use this chip, or if it will
support peer-to-peer GPU memory transfers. I think you will need to get in
touch with the manufacturer for that.
Reply
I’ll need to dig more. I’ve seen various GPU-to-GPU benchmarks for server-
grade motherboards (e.g. in HPC systems), including a raw ~ 7 GB/s using
a PLX PEX chip (lower than host-to-GPU), but I’ve had difficulty finding
benchmarks for single-CPU boards, let alone for more than three x16 GPU
connections.
Best,
Charles
Reply
AMD’s Naples CPU is expected to provide 128 lanes: 64 lanes for 4 PCIe
expansion cards at x16 and the remaining for CPU-to-CPU interconnect
(called Infinity Fabric).
Source:
https://arstechnica.co.uk/information-technology/2017/03/amd-
naples-zen-server-chip-details/
Reply
In another article, it is implied that with 1xCPU systems, 128 lanes will
be available for I/O, presumably allowing for full x16 lanes on up to 8
GPUs, or for use with NVLink bridges.
Source:
http://www.anandtech.com/show/11183/amd-prepares-32-core-naples-
cpus-for-1p-and-2p-servers-coming-in-q2
Reply
Jon says
2015-07-16 at 17:09
Will ECC RAM make convolutional NNs or deep learning more efficient or better? In
other words, if the same money can buy me one PC with ECC RAM vs TWO PCs
without ECC RAM, which should I pick for deep learning?
Reply
I think ECC memory only applies to 64-bit operations and thus would not be
relevant to deep learning, but I might be wrong.
ECC corrects if a bit is flipped in the wrong way due to physical inconsistencies
at the hardware level of the system. Deep learning was shown to be quite
robust to inaccuracies, for example you can train a neural network with 8-bits (if
you do it carefully and in the right way); training a neural network with 16-bit
works flawlessly. Note that training on 8 bit for example, will decrease the
accuracy for all data while ECC is relevant only for some small parts of the data.
However, a flipped bit might be quite severe while a conversion from 32 to 8-
bits might still be quite close to the real value. But overall I think an error in a
single bit should not be so detrimental to performance, because the other
values might counterbalance this error or in the end softmax will buffer this (an
extremely large error value sent to half the connections might spread to the
whole network, but in the end for that sample the probability for the softmax
will be just 1/classes for each class).
Remember that there are always a lot of samples in a batch, and that the error
gradients in this batch are averaged. Thus even large errors will dissipate
quickly, not harming performance.
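To get a feeling for the difference between ordinary low-precision rounding error and a flipped bit, here is a small numpy sketch (the weight value and the bit position are just illustrative):

    import numpy as np

    w = np.array([0.4371], dtype=np.float32)   # an arbitrary example weight
    w16 = w.astype(np.float16)                 # 32-bit -> 16-bit rounding error is tiny
    print("16-bit rounding error:", abs(float(w[0]) - float(w16[0])))

    # Flip a high exponent bit of the 32-bit value (the kind of error ECC would catch):
    flipped = (w.view(np.int32) ^ np.int32(1 << 30)).view(np.float32)
    print("value after a single bit flip:", float(flipped[0]))  # wildly different magnitude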
Reply
Hi Tim,
Thanks for your support in the Deep Learning group.
I have a workstation DELL T7610 http://www.dell.com/sg/business/p/precision-
t7610-workstation/pd.
I want to plug in 2 Titan X cards from NVIDIA and ASUS. Everything seems okay, I just
wonder about the PSU, cooling, and dimensions of the GPUs.
I will check the cooling and dimensions later. My main concern is about power.
Some more details:
The power supply of the workstation would be:
Power Supply: 1300W (externally accessible, toolless, 80 Plus® Gold Certified, 90%
efficient)
Everything looks fine. I ran 3 GTX Titan with a 1400 watt PSU and 4 GTX Titan
with 1600 watt, so you should definitely be fine with 1300 watt and 2 GPUs. A
GTX Titan also uses more power than a GTX Titan X. Your calculation looks
good and there might even be space for a third GPU.
P.S. Comments are held for approval if someone new posts
on this website. This is to prevent spam.
Reply
Haider says
2015-07-07 at 01:38
Tim,
I am new to deep NNs. I discovered their tremendous progress after seeing the
excellent 2015 GTC NVIDIA talk. Deep NNs will be very useful for my PhD, which is
about electrical brain signal classification (Brain Computer Interface).
What a joy to find your blog! I just wish you wrote more.
All your posts are full of interesting ideas. I have checked the comments on the posts,
which are no less interesting than the posts themselves and full of important hints
too.
I read a lot, but did not find most of your interesting hints on hardware elsewhere.
Your posts were just brilliant. I believe your posts filled a gap in the web, especially
on the performance and the hardware side of deep NN.
I think on the hardware side, after reading your posts I have enough knowledge to
build a good system.
On the software side, I found a lot of resources. However, I am still a bit confused.
Perhaps because they weren’t your posts. Why do you only write about hardware?
You can write very well, and we would love to hear about your experience with software
too.
I’m very fond of Matlab and didn’t program much in other languages. And I don’t
know anything about Python, which seems very important to learn for machine
learning. I don’t mind learning Python if you advise me to do so. But if it is not
necessary, then maybe I can spend my time learning other deep NN stuff, which is
overwhelming already. My excitement has crippled me. I have opened ~600 tabs and
want to see them all.
If you were in my shoes, what platform would you begin learning with: Caffe, Torch or
Theano? Why?
And please, tell me about your personal preference too. I learned from your posts
that you are writing your own programs. But if you were picking one of these
for yourself, which would it be? And if you were like me, with no Python experience, which
would you pick in that case?
I am very interested to hear your opinion. I am not in a hurry. When you feel like
writing, please answer me with some details.
I thank you sincerely for all the posts and comment replies in your blog and eager
to see more posts from you Tim!
Thank you!
Reply
Thank you for all this praise — this is encouraging! I wrote about hardware
mainly because I myself focused on the acceleration of deep learning and
understanding the hardware was key in this area to achieve good results.
Because I could not find the knowledge that I acquired elsewhere on a single
website, I decided to write a few blog posts about this. I plan to write more
about other deep learning topics in the future.
In my next posts I will compare deep learning to the human brain: I think this
topic is very important because the relationship between deep learning and
the brain is in general poorly understood.
I also wanted to make a blog post about software, but I did not have the time
yet to do that — I will do so probably in this month or the next.
and Torch7 does not work well on Windows. Theano is the best option here I
guess, but Minerva also seems to be okay.
Caffe is a good library when you do not want to fiddle around too much within a
certain programming language and just want to train deep learning models;
the downside is that it is difficult to make changes to the code and the training
procedure/algorithm, and few models are supported.
In the case of brain signals per se, I think Python offers a lot of packages which
might be helpful for your research.
However, if you just want to get started quickly with the language you know,
Matlab, then you can also use the neural network bindings from the Oxford
research group, with which you can use your GPU to train neural networks
within Matlab.
Zizhao says
2015-06-25 at 13:15
Do you think that if you have too many monitors, they will occupy too many resources of
your GPU? If yes, how do you solve this issue?
Reply
I have three monitors with 1920×1080 resolution and the monitors draw about
400 MB. For me I never had any issues with this, but I also had 6GB cards and I
did not train models that maxed out my GPU RAM. If you have a GPU with less
memory (GTX 980 or GTX 970) then there might be some problems for
convolutional nets. The best way to circumvent this problem is to buy a really
cheap GPU for the monitors (a GT210 costs about $30 and can power two
(three?) monitors), so that your main deep learning GPU is not attached to any
monitor.
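If you want to see how much memory your monitors actually take before buying an extra card, a quick sketch (assuming nvidia-smi is available) is to look at memory.used on an otherwise idle GPU:

    import subprocess

    # On an idle GPU with monitors attached, memory.used is roughly the cost of the displays.
    print(subprocess.check_output([
        "nvidia-smi", "--query-gpu=name,memory.used,memory.total", "--format=csv",
    ]).decode())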
Reply
Tim, you have a wonderful blog and I am very impressed with the
knowledge as well as the effort that you are putting into it.
I run a Silicon Valley startup that works in the space of wearables bio-
sensing; we developed very unique non-invasive sensors that can
measure vitals and psychological and physiological effects. Most of our signals
are multivariate time series; we typically process (1×3000) per sensor per
reading, and we can typically use up to 5 sensors.
We are currently expanding our ML algorithms to add CNN capabilities, and I
wonder what you recommend in terms of GPU.
Also, I would highly appreciate it if you could email me to further discuss a
potentially mutually beneficial collaboration.
Regards,
Sameh
Reply
Reply
Sergii says
2015-06-19 at 12:11
In the chapter `Direct memory access (DMA)` you say “…on the third step the
reserved buffer is transferred to your GPU RAM without any help of the CPU…”, so
why does cudaMemcpyAsync block the host until the end of the copy process? What
is the reason for that?
Reply
The most low-level reason I can think of is, as I said above, that pageable
memory is inherently insecure and may be swapped/pushed around at will. If
you start a transfer and want to make sure that everything works, it is best to
wait until the data is fully received. I do not know the low-level details of
how the OS and its drivers and routines (like DMA) interact with the GPU. If you
want to know these details, I think it would be best to consult with people from
NVIDIA directly; I am sure they can give you a technically accurate answer; you
might also want to try the developer forums.
Reply
Sergii says
2015-06-18 at 13:36
Reply
It all has to do with having a valid pointer to the data. If your memory is not
pinned, then the OS can push the memory around freely to make some
optimizations, so you are not certain to have a pointer to CPU memory, and
thus such transfers are not allowed by the NVIDIA software because they easily
run into undefined behaviour. With pinned memory, the memory no longer is
able to move, and so a pointer to the memory will stay the same at all times, so
that a reliable transfer can be ensured.
This is different in GPUs, because GPU pointers are designed to be reliable at all
times as long as they stay on some GPU memory, so these problems do not
exist for GPU -> GPU transfers.
Reply
Sergii says
2015-06-18 at 14:11
Thanks for the wonderful explanation. But I still have a question. Your
previous reply can explain why data transfer with pageable memory can’t
be asynchronous with respect to a host thread, but I still do not understand
why a device can’t execute a kernel while copying data from a host. What is
the reason for that?
Reply
But you are right that you cannot execute a kernel and a data transfer
in the same stream. I assume there are issues with the hardware not
being able to resume a kernel once the end of a stream that is being
transferred at the very moment is reached (the kernel would need to
wait, then compute, then wait, then compute, then wait… — this will
not deliver good performance!). So it will be because of this that you
cannot run a kernel on partial data.
Reply
Sergii says
2015-06-18 at 12:08
Hi Tim!
Thanks for your helpful and detailed write-up.
You wrote “…one might be able to prevent that when one uses pinned memory, but
as you shall see later, it does not matter anyway…” and AFAIU you don’t use pinned
memory in the async batch allocation process (`clusternet` project).
Reply
What you write is all true, but you have to look at it in two different ways, (1)
CPU -> GPU, and (2) GPU -> GPU.
For CPU -> GPU you will need pinned memory to do asynchronous copies;
however, for GPU -> GPU the copy will be automatically asynchronous in most
use cases — no pinned GPU memory needed (cudaMemcpy and
cudaMemcpyAsync are almost always the same for GPU -> GPU transfers).
It turns out that I use pinned memory in my clusterNet project, but it is a bit
hidden in the source code: I use it only for batch buffers in my BatchAllocator
class, which has an embarrassingly poor design. There I transfer regular CPU
memory to a pinned buffer (while the GPU is busy) and then transfer it in
another step asynchronously to the GPU, so that the batch is ready when the
GPU needs it.
You can also allocate the whole data set as pinned memory, but this might
cause some problems, because once pinned, the OS cannot “optimize” the
locked-in memory anymore, which may lead to performance problems if one
allocates a chunk of memory that is too large.
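A minimal PyCUDA sketch of that pattern (a pinned staging buffer plus an asynchronous copy; the shapes and names are placeholders, and it assumes PyCUDA is installed):

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda

    stream = cuda.Stream()
    pinned = cuda.pagelocked_empty((128, 784), dtype=np.float32)  # pinned staging buffer
    d_batch = cuda.mem_alloc(pinned.nbytes)                       # device memory for one batch

    next_batch = np.random.rand(128, 784).astype(np.float32)      # stand-in for real data
    pinned[:] = next_batch                    # pageable -> pinned copy, done by the CPU
    cuda.memcpy_htod_async(d_batch, pinned, stream)  # DMA transfer, overlaps with compute
    # ... launch kernels that consume d_batch on `stream` ...
    stream.synchronize()                      # wait until the copy (and kernels) have finished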
Reply
Kai says
2015-06-15 at 11:21
Hey Tim! Thanks for these posts, they’re highly, highly appreciated! I’m just starting
to get my feet wet in deep learning – is there any way to hook up my Laptop to a
GPU (maybe even an external one?) without having to build a PC from scratch so I
could start GPGPU programming on small datasets with less of an investment?
Does the answer depend on my motherboard?
Reply
In that case it will be best to use AWS GPU spot instances which are cheap and
fast. External GPUs are available, but they are not an option because the data
transfer, CPU -> USB-like-interface -> GPU, is too slow for deep learning. Once
you have gained some experience with AWS, I would then buy a dedicated
deep learning PC.
Reply
Kai says
2015-06-15 at 14:32
Reply
What are your thoughts on the GTX 980 Ti vs. the Titan X? I guess with “980” in
your article you referred to the 4 GB models. The 980 Ti has the same Memory
Bandwidth as the Titan X, 2GB more memory than a 980 (which should make it
better for big convnets), only a few CUDA cores less. And the price difference is 549
USD for a 980 Ti vs 999 USD for the Titan X.
Reply
The GTX 980 Ti is a great card and might be the most cost effective card for
convolutional nets right now. The 6GB RAM on the card should be good
enough for most convolutional architectures. If you will be working on video
classification or want to use memory-expensive data sets I would still
recommend a Titan X over a 980 Ti.
Reply
Sinan says
2015-05-29 at 04:11
That sounds interesting. Would you mind sharing more details about your G3258-
based system?
Reply
I do not have a Haswell G3258 and I would not recommend one, as it only runs
16 PCIe 3.0 lanes instead of the typical 40. So if you are looking for a CPU I
would not pick Haswell — too new and thus too expensive, and many Haswells
do not have full 40 PCIe lanes.
Reply
Sinan says
2015-05-29 at 05:31
First of all, thank you for a series of very informative posts, they are all
much appreciated.
I was planning to go for a single GPU system (GTX 980 or the upcoming
980 Ti) to get started with deep learning, and I had the impression that at
$72, this is the most affordable CPU out there.
Reply
Mark says
2015-05-26 at 12:26
That second 10x speed-up claim with NVLink is a bit strange because it is not clear how it
is being made.
Reply
Richard says
2015-05-16 at 16:53
Hi Tim,
First can I say thanks very much for writing this article – it has been very
informative.
I’m a first year PhD student. My research is concerned with video classification and
I’m looking into using convolutional nets for this purpose.
My current system has a GT 620, which takes about 4 hours to run a LeNet-5 based
network built using Theano on MNIST. So I’m looking to upgrade and I have about
£1000 to do it with.
I’ve allocated about £500 for the GPU but I’m struggling to decide what to get. I’ve
discounted the GTX 970 due to the memory problems. I was thinking either a GTX 780
(6GB ASUS version), a GTX 980, or two GTX 960s. What is your opinion on this? I know I
can’t use multiple GPUs with Theano, but I could run two different nets at the same
time on the 960s; however, would it be quicker just to run each net consecutively
on the 980 since it’s faster? Also there’s the 780 which, although slower
than the 980, has more RAM, which would be beneficial for convolutional nets. I
looked into buying second hand as you suggested however I’m buying through my
university so that isn’t an option.
Thanks for your help and for the great article once again.
Cheers,
Richard
Reply
That is really a tricky issue, Richard. If you use convolutions on the spatial
dimensions of an image as well as the time dimension, you will have 5-
dimensional tensors (batch size, rows, columns, maps, time) and such tensors
will use a lot of memory. So you really want a card with a lot of memory. If you use
the Nervana Systems 16-bit kernels you would be able to reduce memory
consumption by half; these kernels are also nearly twice as fast (for dense
connections they are more than twice as fast). To use the Nervana Systems
kernels, you will need a Maxwell GPU (GTX Titan X, GTX 960, GTX 970, GTX
980). So if you use this library a GTX 980 will have “virtually” 8GB of memory,
while the GTX 780 has 6GB. The GTX 980 is also much faster than the GTX 780,
which further adds to the GTX 980 option. However, the Nervana Systems
kernels still lack some support for natural language processing, and overall you
will have a far more mature software stack if you use Torch and a GTX 780. If you
think about adding your own CUDA kernels, the Nervana Systems + GTX 980
option may be not so suitable, because you probably will need to handle the
custom compiler and program 16-bit floating point kernels (I have not looked
at this, but I believe there will be things which make it more complicated than
regular CUDA programming).
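To illustrate how quickly such 5-dimensional tensors grow, here is a small back-of-the-envelope sketch (the sizes are made-up examples, not taken from your setup):

    # Memory for a single activation tensor of shape (batch, rows, cols, maps, time).
    batch, rows, cols, maps, time_steps = 32, 112, 112, 64, 16   # example numbers only

    elements = batch * rows * cols * maps * time_steps
    for name, bytes_per_elem in [("32-bit", 4), ("16-bit", 2)]:
        print(name, round(elements * bytes_per_elem / 1024**3, 2), "GB")
    # Roughly 1.5 GB in 32-bit and half of that in 16-bit, and a real net holds many such tensors.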
I think both, GTX 780 and GTX 980 are good options. The final choice is up to
you!
Cheers,
Tim
Reply
Richard says
2015-05-20 at 11:50
Think I’ll go with the 780 for now due to the extra physical memory. Quick
follow-up question: if I have the money for an additional card in the future,
would I need to buy the same model? Could I, for example, have both a GTX
780 and a GTX 980 running in the same machine so that I can have two
Cheers,
Richard
Reply
GPUs can only communicate directly if they are based on the same
chip (but brands may differ). So for parallelism you would need to get
another GTX 780, otherwise a GTX 980 is fine for everything else. Also
remember that new Pascal GPUs will arrive around Q3 2016 and those will
be significantly faster than any Maxwell GPU (3D memory) — so
waiting might be an option as well.
Reply
Mark says
2015-05-26 at 12:21
FYI on Pascal chip from NVIDIA. Speed up over Titan is “up to 5x.” Of this, a
2x speed up will come from the option of switching to using 16 bit floating
point in Pascal.
The rest of the “up to 10x speed up” comes from the 2x speed up you get
from NVLink. Here the comparison is two Pascal versus two Titans. I don’t
know what the speed up would be if the Pascals used the same PCI
interlink as the Titans or if they could even use the PCI interlink. Hopefully
so then a new motherboard would not be necessary.
Reply
Thomas says
2015-05-12 at 23:46
Hi Tim,
Thank you for all your advice on how to build a machine for DL!
You don’t talk about the possibility of using a GPU embedded in the motherboard
(or a “small” second GPU) so as to dedicate the “big” GPU to computation. Could that
affect the performance in any way?
Also we want to build a computer to reproduce and improve -by making a more
complex model- the work of DeepMind about their generalist AI.
We were thinking about getting one Titan X with 32G of RAM.
Would you have any specific recommendation concerning the motherboard and
CPU?
Reply
There are some GPUs which are integrated (embedded) in regular CPUs and
you can run your monitors on these processors. The effect of this is some
saved memory (about a hundred MB for each monitor) but very little
computational resources (less than 1 % for 3 monitors). So if you are really
short on memory (say you have a GPU with 2 or 3GB and 3 monitors) then this
might make good sense. Otherwise, it is not very important and buying a CPU
with integrated graphics should not be a deciding factor when you buy a CPU.
As I said in the article, you have a wide variety of options for the CPU and
motherboard, especially if you will stick with one GPU. In this case you can
really go for very cheap components and it will not hurt your performance
much. So I would go for the cheapest CPU and motherboard with a reasonable
good rating on pcpartpicker.com if I were you.
Reply
Thomas says
2015-05-13 at 16:53
Reply
Tim,
Thanks for the excellent guide! It has helped us a lot. However, a few questions
remain…
We plan to build a deep-learning machine (in a server rack) based on 4 Titan cards.
We need to select other hardware. Ideally we would put all four cards on a single
board with 4x PCIe 3.0 x16. The questions are:
We plan to use these nets for both convolutional and dense learning. Our budget
(everything except the Titans) is around $3000, preferably less, or a bit more if
justified. Please advise!
Reply
I just read the above post as well and got some needed information, sorry for
spamming. From what I understand, SLI is not beneficial.
Should we then go for two weaker Xeons (2620), each with 40 PCIe lanes? Will
this be cost-optimal?
Thanks,
F
Reply
2 CPUs will typically yield no speedup because usually the PCIe networks of
each CPU (2 GPUs for each CPU) are disconnected which means that the GPU
pairs will communicate through CPU memory (max speed about 4 GB/s,
because a GPU pair will share the same connection to the CPU on a PCIe-
switch). While it is reasonable for 8 GPUs, I would not recommend 2 CPUs for a
4 GPU setup.
There are motherboards that work differently, but these are special solutions
which often only come in a package of a whole 8 GPU server rack ($35k-$40k).
If you use a single CPU then any motherboard with enough slots and which
supports 4 GPUs will do; choose the CPU so that it supports 40 PCIe lanes and
you will be ready to go. Socket 2011 has no advantage over other sockets which
fulfill these requirements.
Regarding SLI: SLI can be used for gaming, but not for CUDA (it would be too
slow anyways); so communication is really all done by PCI Express.
Reply
Florijan says
2015-05-12 at 11:20
In particular the Z10PED8 states it supports “4 x PCIe 3.0/2.0 x16 (dual x16
or quad x8)”, from which I understand it does NOT support quad x16.
Would the X99 be the best solution then?
Reply
2015-05-12 at 11:58
It is quite difficult to say which one is better, because I do not know the
PCIe switch layout of the dual CPU motherboard. The most common
PCIe switch layout is explained in this article and if the dual CPU
motherboard that you linked behaves in a similar way, then for deep
learning 2 CPUs will be definitely be slower than 1 CPU if you want to
use parallel algorithms across all 4 GPUs; in that case the 1 CPU board
will be better. However, this might be quite different for other
computing purposes than deep learning and a 2 CPU board might be
better for those tasks.
Reply
sacherus says
2015-05-05 at 22:44
Hi Tim,
thank you for your great article. I think it covers everything that you need to know
to start your journey with DL.
I’m also a grad student (but instead of image processing, I’m in speech processing)
and want to buy a machine (I’m also thinking about Kaggle; for a beginning I
could take a 20th-40th place). I want to buy (in Eastern Europe) a used workstation (without
graphics) plus used graphics cards. Probably I will end up with 2 cards in my computer…
Maybe 3…
Questions:
1) You wrote that you need a motherboard with 7 PCIe 3.0 slots for 3 GPUs. Isn’t it
possible to have a
16x | 1x | 16x | 1x (etc.) setup? Like in http://www.msi.com/product/mb/Z87-G45-
GAMING.html#hero-overview?
2) So there are no setups that support 16x/16x (or are they too expensive)?
3) I see that compute capability also matters. I can buy a GeForce 780 Ti at a
similar price to a GTX 970. The 780 Ti has better bandwidth + more GFLOPS (you never
mentioned FLOPS), but the 970 has a newer compute capability + more memory.
4) Maybe I should let go and buy a… 960 or 680 (just to start)… However, the 970 is
not much more expensive than those 2. Or should I just buy a whole used PC?
Reply
1. You are right, a 16x | 1x | 16x | 1x setup will work just as well; I had not thought
about it in this way, and I will update my blog with that soon — thanks!
2. I hope I understand you right: You have a total of 40 PCIe lanes supported
by your CPU (not the physical slots, but sort of the communication wires
that are laid from the PCIe slots to the CPU) and your GPUs will use up to 16x
(standard mainboards) of that; so 16x/16x is standard if you use 2 GPUs, for 3
GPUs this is 16x/8x/16x and for 4 GPUs 16x/8x/8x/8x. If you mean physical slots, then a
16x | Yx | 16x setup will do, where Y is any size; because most GPUs have a
width of two PCIe slots you most often cannot run 2 GPUs on adjacent 16x | 16x
mainboard slots, though sometimes this will work if you use watercooling
(it reduces the width to one slot).
3. GFLOPS do not matter in deep learning (it’s virtually the same for all
algorithms); your algorithms will always be limited by bandwidth. The 780 Ti has
higher bandwidth but an inferior architecture, and the GTX 970 would be faster.
However, the GTX 780 Ti has no glitches, and so I would go with the GTX 780 Ti.
4. The GTX 680 might be a bit more interesting than the GTX 780 Ti if you
really want to train a lot of convolutional nets; otherwise a GTX 780 Ti is best; if
you only use dense networks you might want to go with the GTX 970.
Reply
Yu Wang says
2015-04-29 at 22:38
Hi Tim,
Thanks for the insightful posts. I’m a grad student working in the image processing
area. I just started to explore some deep learning techniques with my own data. My
dataset contains 10 thousand 800*600 images with 50+ classes. I’m wondering
if a GTX 970 will be sufficient to try different networks and algorithms, including CNNs.
Reply
Although your data set is very small and you will only be able to train a small
convolutional net before you overfit, the size of the images is huge.
Unfortunately, the size of the images is the most significant memory factor in
convolutional nets. I think a GTX 970 will not be sufficient for this.
However, keep in mind that you can always shrink the images to keep them
manageable. For a GTX 970 you will need to shrink them to about 250*190 or
so.
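As a rough illustration of why shrinking helps (the batch size and channel count below are just assumptions for the example):

    # Approximate memory for one batch of input images in 32-bit floats.
    def batch_gb(width, height, channels=3, batch=128, bytes_per_elem=4):
        return width * height * channels * batch * bytes_per_elem / 1024**3

    print("800x600 batch:", round(batch_gb(800, 600), 2), "GB")   # about 0.69 GB just for the inputs
    print("250x190 batch:", round(batch_gb(250, 190), 2), "GB")   # about 0.07 GB
    # The activations of each convolutional layer scale with image area in the same way,
    # so the savings multiply through the whole network.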
Reply
Yu Wang says
2015-05-01 at 23:39
Thanks for the quick reply. Look forward to your new articles.
Reply
Dimiter says
2015-04-28 at 08:40
Tim,
Thanks for a great write-up. Not sure what I’d have done without it.
A bit of a n00b question here,
Do you think it matters in practice if one has PCIe 2.0 or 3.0?
Thanks
Reply
If it is possible that you will have a second GPU at any time in the future,
definitely get a PCIe 3.0 CPU and motherboard. If you use additional GPUs for
parallelism, then in the case of PCIe 2.0 you will suffer a performance loss of
about 15% for a second GPU, and much larger losses (+40%) for your third and
fourth GPU. If you are sure that you will stay with one GPU in the future, then
PCIe 2.0 will only give you a small or no performance decrease (0-5%) and you
should be fine.
Reply
Mark says
2015-04-28 at 16:09
This may not make much difference if you care about a new system now or
about having a more current system in the future. However, if you want to keep
it around for years and use it for other things besides ML then wait a few
months.
Intel’s Skylake CPU will be released in a few months along with its new chipset,
new socket, new motherboards, etc. All PCIe 3.0, DDR4, etc. It’s considered a big
change compared to prior CPUs. Skylake prices are supposed to be similar to
current offerings, but retailers say they expect the price of DDR4 to drop. I don’t
really understand why, but gamers are also waiting for the release … maybe just
because it is “new and improved”, since it doesn’t seem to translate into a big plus
for the gaming experience.
Reply
Thanks for a great guide! I’m wondering if you could give me a rough estimate of
the performance boost I would get by upgrading my system? Would be awesome
to have that before I spend my hard-earned money! I suppose it’s mainly based
on my current GPU, but here’s a bit of info about the rest of the system as well.
Current setup:
ATI Radeon™ HD 5770 1gb
One of the last CPU’s from the 775-socket series.
4gb ram
SSD
Upgraded setup:
GTX 960 4gb
Modern dual-thread CPU with 2+ GHz
8gb ram
SSD
2) Any idea of what the performance reduction would be by doing deep learning in
caffe using a Virtualbox environment of Ubuntu instead of doing a plain Ubuntu
installation?
Reply
1. I never had any problems with my motherboards, so I cannot give you any
advice here on that topic.
2. I also had this idea once, but it is usually impossible to do this: CUDA and
virtualized GPUs do not go together, you will need specialized GPUs (GRID
GPUs, which are used on AWS); even if they did go together there would be
a stark performance decrease.
Reply
Thanks for the quick response! I’ll try Ubuntu then (perhaps some dual-
booting). Would it make sense to add water-cooling to a single GTX 960 or
would that be overkill?
Reply
Shinji says
2015-04-10 at 07:31
I’m interested in the actual PCIe bandwidth in the deep learning process. Are PCIe
16 lanes needed for deep learning? Of course x16 PCIe gen3 is ideal for the best
performance, but I’m wondering if x8 or x4 PCIe gen3 also gives enough performance.
Which do you think is the better solution if the system has 64 PCIe lanes?
Reply
Each PCIe lane for PCIe 3.0 has a theoretical bandwidth of about 1 GB/s, so you
can run GPUs also with 8 lanes or 4 lanes (8 lanes is standard for at least one
GPU if you have more than 2 GPUs), but it will be slower. How much slower will
depend on the application or network architecture and which kind of
parallelism is used.
64 PCIe lanes are only supported by dual-CPU motherboards and these boards
often have a special PCIe switching architecture which connects the two
separate PCIe systems (one for each CPU) with each other; I think you can only
run up to 8 GPUs with such a system (the BIOS often cannot handle more GPUs
even if you have more PCIe slots). But if you take this as a theoretical example it
is best to just do some test calculations:
16 GPUs means 15 data transfers to synchronize information; 4 GB/s of PCIe lanes / 15
transfers = 0.2666 GB/s for a full synchronization. If you now have a weight
matrix with, say, 800×1200 32-bit floating point numbers, you have 800×1200×4 / 1024^3 ≈
0.0036 GB. This means you could synchronize 0.2666/0.0036 = 74 gradients
per second. A good implementation of MNIST with batch size 128 will run at
about 350 batches per second. So the result is that 16 GPUs with 4 PCIe lanes
will be 5 times slower for MNIST. These numbers are better for convolutional
nets, but not much better. Same for 4 GPUs/16 lanes:
16/3 = 5.33; 5.33/0.0036 = 647; so in this case there would be a speedup of
about 2 times; this is better for convolutional nets (you can expect a speedup
of 3.0-3.9 depending on the implementation). You can do similar calculations
for model parallelism in which the 16 GPU case would fare a bit better (but it is
probably still slower than 1 GPU).
So the bottom line is that 16 GPUs with 4 PCIe lanes are quite useless for any
sort of parallelism — PCIe transfer rates are very important for multiple GPUs.
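The same back-of-the-envelope calculation, written out in a few lines of Python with the numbers from above:

    # Gradient synchronization estimate for 16 GPUs sharing 4 PCIe 3.0 lanes.
    lane_gb_s = 1.0                       # roughly 1 GB/s per PCIe 3.0 lane
    grad_gb = 800 * 1200 * 4 / 1024**3    # one 800x1200 float32 weight matrix, ~0.0036 GB
    sync_bw = 4 * lane_gb_s / 15          # 15 transfers to synchronize 16 GPUs
    print("synchronizations per second:", round(sync_bw / grad_gb))   # about 74
    print("single-GPU batches per second:", 350)                      # so roughly 5x slower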
Reply
Shinji says
2015-04-10 at 10:07
If the GPU processing time is much longer than the data transfer time, the
data transfer time for synchronization is negligible. In that case, it is
more important to have many GPUs than high PCIe bandwidth.
Is my assumption unlikely in the usual case?
Reply
This is exactly the case for convolutional nets, where you have high
computation with small gradients (weight sharing). However, even for
convolutional nets there are limits to this; beyond eight GPUs it can
quickly become difficult to gain near-linear speedups, which is mostly
due to the slow interconnects between computers. An 8 GPU system will be
reasonably fast, with speedups of about 7-8 times for convolutional
nets, but for more than 8 GPUs you have to use normal interconnects
like InfiniBand. InfiniBand is similar to PCIe but its speed is fixed at
about 8-25 GB/s (8 GB/s is the affordable standard; 16 GB/s is
expensive; 25 GB/s is very, very expensive). So for 6 GPUs + an 8 GB/s
standard connection this yields a bandwidth of 1.6 GB/s for a full synchronization, which
is much worse than the 4 GPU / 16 lanes example; for 12 GPUs this is
0.72 GB/s; 24 GPUs 0.35 GB/s; 48 GPUs 0.17 GB/s. So pretty quickly it will
be pretty slow even for convolutional nets.
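The per-synchronization bandwidth figures above come from dividing the link speed by the number of transfers; as a quick sketch:

    # Effective bandwidth per full gradient synchronization over an 8 GB/s InfiniBand link.
    link_gb_s = 8.0
    for n_gpus in (6, 12, 24, 48):
        print(n_gpus, "GPUs:", round(link_gb_s / (n_gpus - 1), 2), "GB/s")
    # Prints roughly the 1.6 / 0.7 / 0.35 / 0.17 GB/s figures quoted above.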
Reply
Reply
Mark says
2015-04-08 at 20:41
https://youtu.be/rctaLgK5stA
It comes down to using, say, a 4th 980 or Titan; otherwise, if it’s three or fewer, then
there is no real performance difference. This means a saving of about
$200 on the CPU.
What are your thoughts, since you warned about the i7 5820 in your article?
Reply
Yes, the i7 5820K only has 28 PCIe lanes, and if you buy more than one GPU I
would definitely choose a different CPU. The penalty will be observable when
you use multiple GPUs, especially if you use 4x GTX 980 (personally, I would
choose a cheap CPU < $250 with 40 lanes and instead buy 4x GTX Titan X —
that will be sufficient). One note though: remember that in Q3/Q4 2016 there
will be Pascal GPUs, which are about 10 times better than a GTX Titan X (which
is 50% better than a GTX 980), so it might be reasonable to go with a cheaper
system and go all out once Pascal GPUs are released.
Reply
Mark says
2015-04-09 at 20:29
Well, if I buy now in terms of the CPU and motherboard, then I would like to
upgrade this system in a couple of years to Pascal. To keep this base system
current over a few years, would you still recommend an X99
motherboard? If so, then I am stuck with only two choices: 5930 or 5960.
AMD has CPUs and associated motherboards, but I am not familiar with
anything going in that direction. Do they have something in mind here that is
cheaper, about the same performance, and can handle up to 4
980/Titan/Pascal GPUs?
Reply
An X99 motherboard might be a bit overkill. You will not need most of
its features like DDR4 RAM. As you said, the Pascal GPUs will use their
own interconnect which is much faster than PCIe — this would be
another reason to spend less money on the current system. A system
based on either the LGA1150 or the LGA2011 would be a good choice
in terms of performance/cost.
I do not have experience with AMD either, but from the calculations in
my blog post I am quite certain that it would also be a reasonable
choice. I think in the end it just comes down how much money you
have to spare.
Reply
Mark says
2015-04-09 at 21:52
Great, thanks! Still, one thing remains unclear to a newbie builder like
me. Is an X99 chipset wedded only to motherboards which will not
work with Volta/Pascal? If not, then I can just swap out the
motherboard but keep the X99-compatible CPU, memory, etc.
Also, since you are writing about convolutional nets, these are
front-ends that feed neural nets. However, there is a new paper on
using an SVM approach that needs less memory, is faster and just
as accurate as any state-of-the-art convnet/neural-net combo. It
keeps the convolution and pooling layers but replaces the neural
net with a new Fastfood (LOL) version of SVM. They claim it works
“better”.
Peyman says
2015-03-31 at 06:14
You can use the same GPU for computation and for display; there will be no
problem. The only disadvantage is that you have a bit less memory. I use 3x 27
inch monitors at 1920×1080 and this config uses about 300-400 MB of memory
which I hardly notice (well, I have 6GB of GPU memory). If you are worried
about that memory you can get a cheap NVIDIA GT210 (which can hold 2
monitors) for $30 and run your display on that, so that your GTX 980 is
completely free for CUDA applications.
Reply
Harry says
2017-01-11 at 03:05
I realize this is an old post, but what motherboard did you pick? Most LGA2011
boards seem not to support dual 16x, which I thought was the attraction of the
40 PCIe lanes.
Reply
Hi Tim,
I'm interested in the GPU BIOS. Can you share which BIOS, with the new, more
reasonable fan schedule, you are using right now? I have 2 Titan Xs waiting to
be flashed.
Reply
2015-03-30 at 18:43
I do not know if a GTX 970 / GTX 980 BIOS is compatible with a GTX Titan X BIOS.
Doing a quick Google search, I cannot find information about a GTX Titan X
BIOS, which might be because the card is relatively new.
I think you will find the best information in folding@home and other crowd-
computing forums (also cryptocurrency mining forums) to get this working.
Reply
Thanks for the pointers. F@H is very interesting XD, though I haven't found a
Titan X BIOS yet. Guess I have to live with it for a while.
I saw you have plans to release a deep learning library in the future. Which
framework will you be working on? Torch7, Theano, Caffe?
Reply
Hi Tim,
I bought an MSI G80 laptop to learn and work on deep learning; it connects 2
GPUs using SLI. Could you please tell me whether I can run deep learning on this
laptop, even on just one GPU?
Regards,
Reply
Yes, you will be able to use a single GPU for deep learning; SLI has nothing to
do with CUDA. Even if there are dual GPUs (like the GTX 590) on a hardware
level, you can simply access both GPUs separately. This is also true for software
libraries like Theano and Torch.
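As a minimal illustration (my own sketch, not part of the original reply): since CUDA addresses GPUs by index and ignores SLI, you can expose just one of the laptop's GPUs to any framework by setting an environment variable before the framework initializes CUDA.

import os

# Expose only the first GPU to CUDA; works for Theano, Torch, TensorFlow, PyTorch.
# Must be set before the framework initializes its CUDA context.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# With a framework installed you could then confirm it, e.g. with PyTorch:
# import torch
# print(torch.cuda.device_count())  # -> 1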
Reply
salemameen says
2015-03-24 at 08:01
Thanks Tim,
Because I don't have a background in coding, I want to use existing libraries.
By the way, I bought this laptop not for gaming but for deep learning; I
thought it would be more powerful with 2 GPUs, but even if only one works fine,
that is ok for me. Regards,
Reply
You’re welcome! If you use Torch7 you would will be able to use both
GPUs quite easily. If you dread working with Lua (it is quite easy
actually, the most code will be in Torch7 not in Lua), I am also working
on my own deep learning library which will be optimized for multiple
GPUs, but it will take a few more weeks until it reaches a state which is
usable for the public.
Reply
Mark says
2015-03-24 at 17:54
Will start with 16GB or 32GB DDR4 (haven't decided yet, ~$500-$700
US).
GTX 980’s are ~$500 US and GTX Titans ~$1000 US. Besides loss of
PCI slots, extra liquid cooling, what speed difference does one
expect in a system with two GTX 980’s versus an identical system
with one GTX Titan?
I do not think the boards make a great difference; they are more
about the chipset (X99) than anything else.
One GTX Titan X will be 150% as fast as a single GTX 980, so two
GTX 980s are faster, but because one GPU is much better and
easier to use than two, I would go for the GTX Titan X if you can
afford it.
Mark says
2015-03-31 at 13:20
“One GTX Titan X will be 150% as fast as a single GTX 980, so two
GTX 980 are faster, but because one GPU is much better and
easier to use than two, I would go for the GTX Titan X if you can
afford it.”
Thanks for the advice. Could you elaborate a bit more on the ease of
use of one GPU versus two?
Also, I understand the Titan will be replaced this year with a faster
GTX 980 Ti. They will be the same price.
benoit says
2015-03-18 at 15:25
Motherboard: Get PCIe 3.0 and as many slots as you need for your (future) GPUs
(one GPU takes two slots; max 4 GPUs per system)
Reply
That’s right, modern GPUs will run faster on a PCIe 3.0 slot.
To install a card you only need a single PCIe 3.0 slot, but because you have a
width of two PCIe slots each card will render the PCIe slot next to it unusable.
For 3 GPUs you will need 5 PCIe slots, because the first two cover 4 slots and
you will need a single fifth slot for the last GPU.
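The slot arithmetic generalizes; here is a tiny back-of-the-envelope sketch of it (my own illustration, assuming the usual dual-width cards):

# Each dual-width card covers two slot positions, except the bottom card,
# which only needs its own slot.
def physical_slots_needed(num_gpus, card_width=2):
    if num_gpus == 0:
        return 0
    return card_width * (num_gpus - 1) + 1

for n in range(1, 5):
    print(n, "GPU(s) ->", physical_slots_needed(n), "slot positions")
# 1 -> 1, 2 -> 3, 3 -> 5, 4 -> 7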
Reply
benoit says
2015-03-18 at 15:39
When I was mining bitcoins (unfortunately with Radeon cards, hence why I'm so
interested in your article) I used PCIe risers like
(http://www.amazon.fr/gp/product/B001CC3BNS?
psc=1&redirect=true&ref_=oh_aui_search_detailpage)
Do you think those can act as a bottleneck between the PCIe 3.0 slot of the
MB and the GPU?
Using those could prove useful in finding a cheaper MB with fewer PCIe 3.0
slots.
Reply
I also read a bit about risers when I was building my GPU cluster, and I
often read that there was little to no degradation in performance.
However, I do not know what PCIe lane configuration (e.g. 16/8/8/8 or
16/16/8, which are standard for 4 and 3 GPUs, respectively) the motherboard
will run under such a setup, and this might be a problem (the
motherboard might not support it well). For cryptocurrency mining this
is usually not a problem, because you do not have to transfer as much
data over the PCIe interface compared to deep learning —
so probably no one has ever tested this under deep learning
conditions.
So I am not really sure how well it will work, but it might be worth a try to
test this on one of your old mining motherboards and then buy a
motherboard accordingly. If you decide to do so, please let me
know. I would be really interested in what happens in that case and
how well it works. Thanks!
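If you do test a riser, a quick host-to-device bandwidth measurement is enough to see whether it throttles PCIe. A minimal sketch (my own, assuming PyTorch with CUDA is installed, which postdates this thread):

import time
import torch

# 256 MB of pinned host memory, transferred to the GPU 20 times.
x = torch.empty(256 * 1024 * 1024 // 4, dtype=torch.float32).pin_memory()
torch.cuda.synchronize()
t0 = time.time()
for _ in range(20):
    y = x.to("cuda", non_blocking=True)
torch.cuda.synchronize()
elapsed = time.time() - t0
gbytes = 20 * x.numel() * 4 / 1e9
print(f"Host-to-device bandwidth: {gbytes / elapsed:.1f} GB/s")
# A healthy PCIe 3.0 x16 slot gives roughly 12-13 GB/s with pinned memory;
# a crippled riser or a narrow link will show far less.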
Reply
I have tried and built many systems with passive risers. Always use
like-for-like ones (x16 to x16), but this one seems a bit cheap.
I never could make it work with Molex risers, though. I would get
packet loss and it would make the training fail.
Stijn says
2015-03-15 at 07:44
What is the largest dataset you can analyze (you can choose whatever specs you want),
and how much time would it take?
Reply
The sky is the limit here. Google ran conv nets that took months to complete
and were run on thousands of computers. Among practical data sets,
ImageNet is one of the larger ones, and you can expect new data sets
to grow exponentially from there. These data sets will grow as your GPUs get
faster, so you can always expect that the state of the art on a large, popular
data set will take about 2 weeks to train.
Reply
Mark says
2015-03-14 at 18:26
Reply
I only have experience with the motherboards that I use, and one of them has a
minor hardware defect, so I do not think my experience is representative
of the overall mainboard product; the same goes for other hardware pieces.
I think with the directions I gave in this guide you can find your parts on your
own through lists that feature user ratings, like http://pcpartpicker.com/parts/
Often it is quite practical to sort by rating and buy the first highly rated
hardware piece that falls within your budget.
Reply
What’s your thought on using g2.xlarge instead of building the hardware? I believe
g2.xlarge is a lot slower than GTX 980. However it is possible to spawn many
instances on AWS at the same time which might be useful for tuning
hyperperameter.
Reply
Indeed, the g2.xlarge is much slower than the GTX 980, but also much cheaper.
It is a cheap option if you want to train multiple independent neural nets, but it can
be very messy. I only have experience with regular CPU instances, but with
those it can take considerable time to manage one's instances, especially if you
are using AWS for large data sets together with spot instances — you will
definitely be more productive with a local system. But in terms of affordability
GPU instances are just the best.
I just want to make you aware of other downsides of GPU instances, but the
overall conclusion stays the same (less productivity, but very cheap): You
cannot use multiple GPUs on AWS instances because the interconnect is just
too slow and will be a major bottleneck (4 GPUs will run slower than 2). Also,
the PCIe interconnect performance is crippled by the virtualization. This can be
partly improved by a hacky patch, but overall the performance will still be bad
(it might well be that 2 GPUs are worse than 1 GPU).
Also, like the GTX 580, the GPU instances do not support newer software, and
this can be quite bad if you want to run modern variants of convolutional nets.
Reply
What IDE are you using in that pic? It looks like Eclipse but I can’t quite tell. Great
article, a full breakdown is just what I needed!
Reply
Glad that you liked the article. I am using Eclipse (NVIDIA Nsight) for
C++/C/CUDA in that pic; I also use Eclipse for Python (PyDev) and Lua
(Koneki). While I am very satisfied with Eclipse for Python and CUDA, I am less
satisfied with Eclipse for Lua (that is, Torch7) and I will probably switch to Vim for
that.
Reply
sshidy2014 says
2015-03-12 at 23:03
About (possibly) multiple GPUs, would nvidia SLI be of any significant help?
Reply
Thanks for your comment. NVIDIA SLI is an interface which allows rendering
computer graphics frames on each GPU and exchanging them via SLI. The use of
SLI is limited to this application, so doing computations and parallelizing them
via SLI is not possible (one needs to use the PCIe interface for this). So CUDA
cannot use SLI.
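To make this concrete (a sketch of my own, assuming PyTorch with CUDA, which did not exist when this comment was written): multi-GPU work goes over PCIe, and you can check whether two cards can reach each other directly (peer-to-peer) like this.

import torch

n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            status = "possible" if ok else "not possible"
            print(f"GPU {i} -> GPU {j}: peer-to-peer access {status}")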
Reply
Thoughts on the Tesla K40? It’s one of the GPUs available through NVIDIA’s
academic hardware grant
program: https://developer.nvidia.com/academic_hw_seeding
Reply
If you can get one through the academic grant program, this might be the better
choice, as it is much faster and will have the same amount of memory.
Reply
dh says
2015-03-20 at 21:02
Why is the K40 so much more expensive when the GTX Titan X is cheaper but has more
cores and higher bandwidth?
Reply
Reply
zeecrux says
2015-07-03 at 10:00
ImageNet on K40:
Training is 19.2 secs / 20 iterations (5,120 images) – with cuDNN
and GTX770:
cuDNN Training: 24.3 secs / 20 iterations (5,120 images)
(source: http://caffe.berkeleyvision.org/performance_hardware.html)
So for 450000 iterations, it takes 120 hours (5 days) on K40, and 162.5
hours (6.77 days) on GTX 960.
Now K40 costs > 3K USD, and GTX 960 costs < 300 USD
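For reference, the conversion used above as a tiny sketch of my own (seconds per 20 iterations to total wall-clock time for 450,000 iterations):

def total_hours(secs_per_20_iters, total_iters=450_000):
    # seconds per 20 iterations -> total hours for the full run
    return secs_per_20_iters / 20 * total_iters / 3600

hours = total_hours(19.2)  # K40 figure from the Caffe benchmark page
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")  # 120 hours, ~5 days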
Reply
Reply
Hannes says
2015-03-11 at 03:45
I find the recommendation of the GTX 580 for *any* kind of deep learning or
budget a little dubious since it doesn’t support cuDNN. What good is a GPU that
doesn’t support what’s arguably the most important library for deep learning at the
moment?
Reply
This is a really good and important point. Let me explain my reasoning for why I
think a GTX 580 is still good.
The problem with no cuDNN support is really that you will need much more
time to set everything up, and often cutting-edge features that are
implemented in libraries like Torch7 will not be available. But it is not impossible
to do deep learning on a GTX 580, and good, usable deep learning software
exists. One will probably need to learn CUDA programming to add new
features through one's own CUDA kernels, but this just requires time and not
money. For some people time and effort are relatively cheap, while money is
rather expensive. If you think about students in developing countries this is very
much true; if you earn $5500 a year (average GDP per capita PPP of India; for
the US this is $53k – so think about your GPU choice if you had 10 times less
money) then you will be happy that there is a deep learning option that costs
less than $120. Of course I could recommend cards like the GTX 750, which are
also in that price range and which work with cuDNN, but I think a GTX 580
(much faster and more memory) is just better than a GTX 750 (cuDNN support)
or other alternatives.
EDIT: I think it might be good to add another option which offers support for
cuDNN but is rather cheap, like the GTX 960 4GB (only a bit slower than
the GTX 580), which will be available shortly for about $250-300. But as you can
see, an additional $130-180 can be very painful if you are a student in a
developing country.
Reply
DarkIdeals says
2016-09-08 at 08:33
A great 2016 update, if you happen to still frequent this blog (I don't see any
recent posts), is the new GTX 1060 Pascal graphics card, specifically the 3GB
model. Now, 3GB is definitely cutting it a tad close on memory, however it's a
VASTLY superior choice to both a 580 AND a 960 4GB. The 1060 6GB
model is equivalent to a GTX 980 in overall performance, and the 3GB 1060
model is only ever-so-slightly weaker, putting it at the level of a hugely
overclocked GTX 970 (I'm talking like ~1,650MHz 970 levels, which is
maybe ~5% below a 980).
And the 3GB 1060 can be had for a measly $199 BRAND NEW! It's definitely
something to consider at least. And if you still desperately need that extra
VRAM, then the 6GB version of the 1060 (which, as I mentioned, is literally
about tied with an average GTX 980!) can be had for as little as $249 right now!
Reply
I updated my GPU recommendation post with the GTX 1060, but I did
not mention the 3GB version, which did not exist at that time. Thanks for
letting me know!
Reply
Khalid says
2017-04-13 at 09:02
Hi,
I want to get a system with a GPU for speech processing and deep
learning applications using Python.
What the heck… Could you have skipped the blather and gotten to the point? There
are only a few specific combinations that support what you were trying to explain,
so maybe something like:
– GTX 580/980
– i5 / i7 CPU
– Lots of RAM (duh)
– Fast hard drive
Reply
Give a man a fish and you feed him for a day; teach a man to fish and you feed
him for a lifetime.
Reply
zeng says
2016-08-29 at 05:17
Reply
lU says
2015-03-09 at 22:59
It also confirms my choice of a Pentium G3258 for a single-GPU config. Insanely
cheap, and it even has ECC memory support, something that some folks might want
to have.
Reply
cicero19 says
2015-03-09 at 20:36
Hi Tim,
Thanks!
Reply
2015-03-10 at 08:41
There are many CPUs in all different price ranges which are all reasonable
choices, and most CPUs support 40 PCIe lanes. The best practice is probably to
look at a site like http://pcpartpicker.com/parts/cpu/ and select a CPU with a good
rating and a good price; then check that it supports the 40 lanes and you will be
good to go.
Reply
Good afternoon. Can you please help me? There is a used computer on offer
in my neighbourhood for about $800:
i7 4790K
MSI 1080
DDR3 4GB x2 + DDR3 8GB x2
WD 1TB SSD
Is it a good choice for getting started with deep learning?
Reply
Yes, it would be okay for a beginner. You can run most models and
explore deep learning problems. You will not be able to run some of
the largest deep learning models, but that should not be your goal
while you are learning and exploring deep learning anyway.
Reply
Excuse me for the off-topic question, but are you familiar with
TensorFlow?