AI Chips Basics
AI chips, as the term suggests, refer to a new generation of microprocessors specifically designed to process artificial intelligence tasks faster while using less power. The definition of "AI chips" includes graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and certain types of application-specific integrated circuits (ASICs) specialized for AI calculations. Our definition also includes a GPU, FPGA, or AI-specific ASIC implemented as a core on a system-on-a-chip (SoC). AI algorithms can run on other types of chips, including general-purpose chips like central processing units (CPUs), but we focus on GPUs, FPGAs, and AI-specific ASICs because they are necessary for training and running cutting-edge AI algorithms efficiently and quickly, as described later in the paper.
Like general-purpose CPUs, AI chips gain speed and efficiency by incorporating huge numbers of
smaller and smaller transistors, which run faster and consume less energy than larger transistors.
But unlike CPUs, AI chips also have other, AI-optimized design features. These features
dramatically accelerate the identical, predictable, independent calculations required by AI
algorithms.
They include executing a large number of calculations in parallel rather than sequentially, as in
CPUs; calculating numbers with low precision in a way that successfully implements AI algorithms
but reduces the number of transistors needed for the same calculation; speeding up memory
access by, for example, storing an entire AI algorithm in a single AI chip; and using programming
languages built specifically to efficiently translate AI computer code for execution on an AI chip.
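As a concrete illustration of the low-precision feature mentioned above, the NumPy sketch below quantizes 32-bit floating-point matrices to 8-bit integers and shows that their product is nearly unchanged; the symmetric scaling scheme is a simplifying assumption for illustration, not any particular chip's implementation.

```python
import numpy as np

# Simplified symmetric int8 quantization: scale each tensor by its largest magnitude.
def quantize(x: np.ndarray):
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)
b = rng.standard_normal((128, 32)).astype(np.float32)

qa, sa = quantize(a)
qb, sb = quantize(b)

# int8 multiply with int32 accumulation, then rescale back to floating point.
approx = qa.astype(np.int32) @ qb.astype(np.int32) * (sa * sb)
exact = a @ b

# The relative error stays small even though each operand uses only 8 bits.
rel_err = np.abs(approx - exact).max() / np.abs(exact).max()
print(f"max relative error: {rel_err:.3%}")
```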
Different types of AI chips are useful for different tasks. GPUs are most often used for initially
developing and refining AI algorithms; this process is known as “training.” FPGAs are mostly used
to apply trained AI algorithms to real world data inputs; this is often called “inference.” ASICs can
be designed for either training or inference.
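To make the training/inference distinction concrete, here is a minimal NumPy sketch: "training" iteratively adjusts a model's weights using labeled examples, while "inference" simply applies the frozen weights to new inputs. The toy linear model and data are assumptions made for illustration and are not tied to any particular chip.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: inputs x and targets y generated from a hidden linear rule.
x = rng.standard_normal((200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = x @ true_w + 0.01 * rng.standard_normal(200)

# "Training": refine the weights by gradient descent (the step GPUs typically accelerate).
w = np.zeros(3)
for _ in range(500):
    grad = 2 * x.T @ (x @ w - y) / len(y)
    w -= 0.1 * grad

# "Inference": apply the trained, frozen weights to fresh data (often run on FPGAs or ASICs).
new_input = rng.standard_normal((1, 3))
prediction = new_input @ w
print("learned weights:", np.round(w, 2), "prediction:", prediction)
```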
Older AI chips with their larger, slower, and more power-hungry transistors incur huge energy
consumption costs that quickly balloon to unaffordable levels. Because of this, using older AI
chips today means overall costs and slowdowns at least an order of magnitude greater than for
state-of-the-art AI chips. These cost and speed dynamics make it virtually impossible to develop
and deploy cutting-edge AI algorithms without state-of-the-art AI chips.
Even with state-of-the-art AI chips, training an AI algorithm can cost tens of millions of U.S. dollars
and take weeks to complete. In fact, at top AI labs, a large portion of total spending is on AI-
related computing. With general-purpose chips like CPUs or even older AI chips, this training
would take substantially longer to complete and cost orders of magnitude more, making staying
at the research and deployment frontier virtually impossible. Similarly, performing inference
using less advanced or less specialized chips could involve similar cost overruns and take orders
of magnitude longer.
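As a rough way to see why these dynamics matter, total training cost can be approximated as the number of chip-hours a run requires multiplied by the hourly cost of a chip. The Python sketch below uses purely hypothetical placeholder figures; it only illustrates why an order-of-magnitude efficiency gap dominates the bill.

```python
# Back-of-the-envelope training-cost model (all figures are hypothetical placeholders).
def training_cost(chip_hours: float, cost_per_chip_hour: float) -> float:
    """Total cost of a training run = chip-hours required * hourly cost per chip."""
    return chip_hours * cost_per_chip_hour

# Hypothetical: a state-of-the-art AI chip finishes the run in 100,000 chip-hours.
modern = training_cost(chip_hours=100_000, cost_per_chip_hour=2.0)

# A chip roughly 10x less efficient needs ~10x the chip-hours for the same run.
older = training_cost(chip_hours=1_000_000, cost_per_chip_hour=2.0)

print(f"state-of-the-art: ${modern:,.0f}  older chip: ${older:,.0f}")
```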
AI chips are becoming increasingly important, and their application scenarios continue to multiply.
1. There are two reasons for developing dedicated chips: first, the problem to be solved must be important enough to justify spending valuable hardware resources on it; second, the algorithm for the problem should have regular characteristics that a "circuit approach" can exploit efficiently.
2. To achieve AI, it is not enough to have an AI algorithm; you must also have an AI chip, which guarantees the computing power.
3. At present, the implementation schemes for artificial intelligence acceleration chips are GPUs, FPGAs, and ASICs. Different solutions are chosen for different application scenarios.
2. Why are dedicated chips on the rise?
So why have special-purpose chips become an important point of competition in recent years? Let me start with the development of AI. We can now use AI for many things, such as face-recognition unlocking on mobile phones, voice-controlled smart speakers, and even self-driving cars. These seem to be new things that have emerged in recent years, but artificial intelligence has been studied for decades. Why does it feel as though artificial intelligence has exploded only recently?
It is because three elements had never come together before: big data, algorithms, and computing power. Take face recognition as an example. A 1997 study based on convolutional neural networks used only 400 pictures of 40 people. But in 2014, the deep convolutional neural network used by Facebook was trained on 4 million pictures of 4,000 people and achieved 97.35% accuracy, which is close to the human level!
But do not assume that big data alone can make face recognition succeed. Ordinary general-purpose chips simply cannot handle such massive computation. For example, suppose you want to build a self-driving car that automatically avoids obstacles. This requires a chip that calculates an evasion route from real-time traffic conditions. If the car is on the road with an obstacle 20 meters ahead and you use an ordinary computer CPU to decide "should I avoid it or not," the insurance company's damage assessors may well arrive at the scene before the result has been computed.
Therefore, to achieve AI it is not enough to have an AI algorithm; you must also have an AI chip, which guarantees the computing power. The AI chip we are talking about is really an "artificial intelligence acceleration chip": a special-purpose chip whose function is to speed up AI algorithms. For example, face recognition succeeded because the computation was done on GPUs, which thereby became the earliest AI chips.
Why can GPU be used as an AI chip?
Because AI algorithms turn out to have a distinctive character: although the amount of computation is huge, it is also highly regular. For example, in the "convolutional neural network" algorithms commonly used in image recognition, one of the main operations is a large number of multiplications over large matrices. If we design hardware around this characteristic, we can naturally improve the speed of calculation.
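That regularity can be seen directly in code: a convolution layer can be rewritten as one large matrix multiplication (the "im2col" trick), which is exactly the uniform, parallel workload that GPUs handle well. The sketch below is a minimal single-channel NumPy version written under that assumption.

```python
import numpy as np

def conv2d_as_matmul(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 2D convolution expressed as one big matrix multiplication."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    # "im2col": unfold every kernel-sized patch of the image into a row.
    patches = np.array([image[i:i + kh, j:j + kw].ravel()
                        for i in range(oh) for j in range(ow)])
    # The whole convolution is now a single (oh*ow, kh*kw) x (kh*kw,) product.
    return (patches @ kernel.ravel()).reshape(oh, ow)

rng = np.random.default_rng(2)
img = rng.standard_normal((6, 6))
k = rng.standard_normal((3, 3))
print(conv2d_as_matmul(img, k))
```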
Coincidentally, GPUs are much better than CPUs at exactly this. Gradually, people found that GPUs could accelerate not only image problems: most parallel computing problems can be solved with GPUs, including Bitcoin mining. Although a GPU is a special-purpose chip compared with a CPU, today it has become a general-purpose parallel computing chip. In this way, GPU technology has gained a strong driving force for development; it is widely used in parallel computing and has also promoted the development of all kinds of AI chips.
3. AI chip frontier.
Now, in addition to the GPU, new AI chips have appeared in order to implement artificial intelligence even better. Let me introduce the two most important ones.
As I just said, although GPUs can accelerate AI algorithms far beyond CPUs, the GPU is now a general-purpose parallel computing chip. Its problem is that it is not optimized for every artificial intelligence problem, and its power consumption and price are relatively high, so we want "more dedicated chips" to improve efficiency. The FPGA emerged to solve this problem.
This is a very clever invention; you can think of it as a "universal chip." After the chip is manufactured, you can modify the connections between the devices inside it as needed, forming chips with a variety of different functions. This process is called "burning" the FPGA. For different artificial intelligence problems, we can turn the FPGA into a corresponding special-purpose chip that better fits the problem.
For example, you can think of an FPGA as a "flash delivery" courier. Although flash delivery is a general service that anyone can use, once you place an order, that courier serves only you. He establishes the best route between you and your destination and delivers things as quickly as possible. After the job is done, he takes the next order and sets up a new route for a new customer. For flash delivery, "placing an order" corresponds to "programming and burning the FPGA." Once burned, the FPGA becomes your dedicated chip, efficiently solving your specific problem. After completing the task, you can change the circuit structure for a new problem and turn the FPGA into another dedicated chip that completes the new task efficiently.
Therefore, the FPGA is a very important implementation form in artificial intelligence. The FPGA has advantages in flexibility, but it also has weaknesses: generally speaking, its price is relatively high, and in terms of performance, speed, power consumption, and chip area it leaves a lot of room for improvement. So people also arrived at an ultimate method, which is also the hottest and most cutting-edge field of chip technology: the custom ASIC.
In other words, a chip is designed specifically for the AI problem to be solved, and it does nothing else. Its advantages and disadvantages are both very clear. The advantage is that it is extremely efficient and its energy consumption is very low. The disadvantage is a complete loss of versatility: once you design and manufacture the chip, it cannot do anything else. If you later want it to solve a different problem, there is no way around it; you have to make another chip.
But the cost of designing and fabricating chips on advanced process nodes is now very high, so custom AI chips are not something every company can do; only large companies such as Google and Alibaba will do this, and only when they have clear application scenarios and algorithms. The most famous custom AI chip today is Google's TPU, the Tensor Processing Unit. According to Google's public data, the TPU improves performance by tens of times and reduces energy consumption by up to hundreds of times compared with the best GPUs.
Finally, I would like to add that there is another important development trend: "general-purpose AI chips." Here "general-purpose" does not mean solving every computing problem the way a CPU does; rather, the hope is that a single AI chip can meet the requirements of low cost and versatility while remaining highly efficient and low-power, and so solve all kinds of AI problems. This will be a very promising direction for the future!
2. Apple
At its latest event, "Time Flies," Apple introduced an all-new iPad Air that houses the powerful A14 Bionic, a 5 nm chipset. This makes the iPad Air the world's first device to run on a 5 nm chip. "We're excited to introduce Apple's most powerful chip ever made, the A14 Bionic," said Greg Joswiak, Apple's senior vice president of Worldwide Marketing. Traditionally, Apple would launch new chipsets with iPhones; instead, the A14 Bionic was announced alongside the new iPad Air. Across its last two events, Apple has been explicit about how serious it is about machine-learning-focused SoCs.
Apple has been developing its own chips for some years and could eventually stop using suppliers such as Intel, which would be a huge shift in emphasis. Having already largely disentangled itself from Qualcomm after a long legal wrangle, Apple looks determined to go its own way in the AI future. The company has used its A11 and A12 "Bionic" chips in its latest iPhones and iPads. These chips include Apple's Neural Engine, a part of the circuitry that is not accessible to third-party apps.
The A12 Bionic chip is said to be 15 percent faster than its predecessor while using 50 percent of the power. The A13 version is in production now, according to Inverse, and is likely to feature in more of the company's mobile devices this year. And considering that Apple has sold more than a billion mobile devices, that is a sizable ready-made market, even without its desktop computer line, which still accounts for only about 5 percent of the overall PC market worldwide.
3. Huawei
Huawei Technologies has officially unleashed its artificial intelligence (AI) chip, the Ascend 910, which it says has a maximum power consumption of just 310 W, lower than its originally planned spec of 350 W. The chip is touted as having "more computing power than any other AI processor," delivering 256 teraflops at half-precision floating point (FP16) and 512 teraops for integer-precision calculations.

Figure 3: Huawei artificial intelligence (AI) chip Ascend 910
The Chinese tech giant also announced the commercial availability of its MindSpore AI computing framework, which it said was designed to ease the development of AI applications and improve the efficiency of such tools. Huawei said the AI framework handles only gradient and model data that have already been processed, so user privacy can be maintained.
The platform also has "built-in protection technology" to keep AI models secure. MindSpore supports various platforms, including device, edge, and cloud, and is touted as using a design concept that lets developers train their models more easily and quickly. "In a typical neural network for natural language processing (NLP), MindSpore has 20% fewer lines of core code than leading frameworks on the market, and it helps developers raise their efficiency by at least 50%," Huawei said.
4. Nvidia
Nvidia unwrapped its Nvidia A100 artificial intelligence chip, and CEO Jensen Huang called it the
ultimate instrument for advancing AI. Huang said it can make supercomputing tasks — which are
vital in the fight against COVID-19 — much more cost-efficient and powerful than today’s more
expensive systems.
The chip has a monstrous 54 billion transistors (the on-off switches that are the building blocks
of all things electronic), and it can execute 5 petaflops of performance, or about 20 times more
than the previous-generation chip Volta. Huang made the announcement during his keynote at
the Nvidia GTC event, which was digital this year.
In the market for GPUs, which we mentioned can process AI tasks much faster than all-purpose
chips, Nvidia looks to have a lead. Similarly, the company appears to have gained an advantage
in the nascent market for AI chips.
The two technologies would seem to be closely related to each other, with Nvidia’s advances in
GPUs helping to accelerate its AI chip development. In fact, GPUs appear to underpin Nvidia’s AI
offerings, and its chipsets could be described as AI accelerators.
The specific AI chip technologies Nvidia supplies to the market include its Tesla chipset, Volta,
and Xavier, among others. These chipsets, all based on GPUs, are packaged into software-plus-
hardware solutions that are aimed at specific markets. Xavier, for example, is the basis for an
autonomous driving solution, while Volta is aimed at data centers.
AI Chip Types
AI chips include three classes: graphics processing units (GPUs), field programmable gate arrays
(FPGAs), and application-specific integrated circuits (ASICs).
You might use an FPGA when you need to optimize a chip for a particular workload, or when you
are likely to need to make changes at the chip level later on. Uses for FPGAs cover a wide range
of areas—from equipment for video and imaging, to circuitry for computer, auto, aerospace, and
military applications, in addition to electronics for specialized processing and more. FPGAs are
particularly useful for prototyping application-specific integrated circuits (ASICs) or processors.
An FPGA can be reprogrammed until the ASIC or processor design is final and bug-free and the
actual manufacturing of the final ASIC begins. Intel itself uses FPGAs to prototype new chips.
The New Frontier for FPGAs: Artificial Intelligence
Today, FPGAs are gaining prominence in another field: deep neural networks (DNNs) that are
used for artificial intelligence (AI). Running DNN inference models takes significant processing
power. Graphics processing units (GPUs) are often used to accelerate inference processing, but
in some cases, high-performance FPGAs might actually outperform GPUs in analyzing large
amounts of data for machine learning.
FPGA Architecture
The general FPGA architecture consists of three types of modules: I/O blocks or pads, the switch matrix/interconnection wires, and configurable logic blocks (CLBs). The basic FPGA architecture is a two-dimensional array of logic blocks with a means for the user to arrange the interconnections between them (see Figure 9: FPGA Architecture). The functions of the FPGA architecture modules are discussed below:
• CLB (Configurable Logic Block): contains digital logic, inputs, and outputs, and implements the user logic.
• Interconnects: provide the routing between logic blocks that is needed to implement the user logic.
• Switch matrix: provides switching between interconnects, depending on the logic.
• I/O pads: allow the outside world to communicate with the different applications.
The basic building block of the FPGA is the look-up-table (LUT) based function generator. The number of inputs to a LUT typically ranges from 3 or 4 up to 6, or even 8 in some devices. Modern FPGAs also offer adaptive LUTs that provide two outputs per LUT by implementing two function generators. The Xilinx Virtex-5 is a popular FPGA whose logic cell contains a look-up table (LUT) connected to a multiplexer and a flip-flop, as discussed above. A present-day FPGA contains hundreds or thousands of configurable logic blocks. To configure the FPGA, Xilinx ISE is used to generate the bitstream file, while tools such as ModelSim are used for development and simulation.
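To make the "LUT-based function generator" idea concrete, the Python sketch below models a k-input LUT as a small truth table indexed by the input bits; this is a conceptual model for illustration, not vendor-specific behavior.

```python
class LUT:
    """Conceptual model of a k-input FPGA look-up table.

    The "configuration" is simply the 2**k truth-table entries;
    reprogramming the FPGA amounts to loading a different table.
    """
    def __init__(self, truth_table):
        self.table = list(truth_table)

    def __call__(self, *inputs):
        # Pack the input bits into an index into the truth table.
        index = 0
        for bit in inputs:
            index = (index << 1) | int(bit)
        return self.table[index]

# Configure one 2-input LUT as XOR, then "re-burn" the same structure as AND.
xor_lut = LUT([0, 1, 1, 0])
and_lut = LUT([0, 0, 0, 1])
print(xor_lut(1, 0), and_lut(1, 0))  # -> 1 0
```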
3. Application-specific integrated circuits (ASICs)
ASIC engineers often make use of latches in their designs. As a general rule of thumb, if you are designing an FPGA and you are tempted to use a latch, don't!

Flip-flops with both "Set" and "Reset" Inputs
Many ASIC libraries offer a wide range of flip-flops, including a selection that offers both set and reset inputs (both synchronous and asynchronous versions are usually available). By comparison, FPGA flip-flops can usually be configured with either a set input or a reset input. Implementing both set and reset inputs requires the use of a LUT, so FPGA design engineers often try to work around this and come up with an alternative implementation.
Global Resets and Initial Conditions
Every register in an FPGA is programmed with a default initial condition (that is, to contain a logic 0 or a logic 1). Furthermore, the FPGA typically has a global reset signal that returns all of the registers (but not the embedded RAMs) to their initial conditions. ASIC designers typically don't implement anything equivalent to this capability.
Advantages of ASICs over FPGAs
ASICs have a number of advantages over FPGAs, depending on the system designer’s goals. ASICs,
for instance, permit fully custom capability for the system designer as the device is manufactured
to custom design specifications. Additionally, for very high-volume designs, an ASIC
implementation will have a significantly lower cost per unit. It is also likely that the ASIC will have
a smaller form factor since it is manufactured to custom design specifications. ASICs will also
benefit from higher potential clock speeds over their FPGA counterparts.
A corresponding FPGA implementation, on the other hand, will typically have a faster time to
market as there is no need for layout of masks and manufacturing steps. FPGAs will also benefit
from simpler design cycles over their ASIC counterparts, due to software development tools that
handle placement, routing, and timing restrictions. FPGAs also benefit from being
reprogrammable, in that a new bit stream can quickly be uploaded, during system development
as well as when deployed in the field. This is one large advantage over the ASIC counterparts.
The Value of State-of-the-Art AI Chips
Leading node AI chips are increasingly necessary for cost-effective, fast training and inference of
AI algorithms. This is because they exhibit efficiency and speed gains relative to state-of-the-art
CPUs and trailing node AI chips. And, as discussed above, efficiency translates into overall cost-effectiveness, since chip costs are the sum of chip production costs (i.e., design, fabrication, assembly, test, and packaging) and operating costs (chiefly energy consumption). Finally, cost and speed bottleneck training and
inference of many compute-intensive AI algorithms, necessitating the most advanced AI chips for
AI developers and users to remain competitive in AI R&D and deployment.
Table 1: Comparing state-of-the-art AI chips to state-of-the-art CPUs
Thank you.
Best regards,
Mirza Mansab Baig
MS scholar at Xiamen University, China