CPU:
The CPU (Central Processing Unit), or main processor, executes computing
instructions. It attaches to the motherboard via a CPU socket and listens
for input from a computer program or a peripheral such as a keyboard,
mouse, or touchpad.
GPU:
The GPU (Graphics Processing Unit) is a specialized graphics processor
designed to process thousands of operations simultaneously. Demanding 3D
applications require parallel texture, mesh, and lighting processing to
keep images moving smoothly across the screen, and the CPU architecture is
not optimized for those tasks.
CUDA:
CUDA (Compute Unified Device Architecture) is a parallel computing
platform and API (Application Programming Interface) model developed by
Nvidia for programming the GPU. It allows computations to be performed in
parallel, providing substantial speedups for suitable workloads, and ships
as a set of libraries and tools specially for graphics processing.
• These libraries can be used with C or C++.
• CUDA is specific to Nvidia; this means you cannot use CUDA with
graphics cards (GPUs) from other companies such as AMD or Intel.
CUDA components:
• Driver: low-level software that controls the graphics card
• Toolkit (currently on version 12.3):
  • nvcc, the CUDA compiler
  • Nsight plugin for Eclipse or Visual Studio
  • profiling and debugging tools
  • lots of libraries
CUDA programming:
• host code, which runs on the CPU and interfaces with the GPU
• kernel code, which runs on the GPU
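A minimal sketch of this host/kernel split as a vector addition (the kernel name and sizes are illustrative, not from the notes; compile with something like `nvcc vecadd.cu -o vecadd`):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel code: runs on the GPU, one thread per array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Host code: runs on the CPU and drives the GPU.
int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; cudaMalloc + cudaMemcpy
    // is the more traditional pattern.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);  // launch the kernel
    cudaDeviceSynchronize();                       // wait for the GPU

    printf("c[0] = %.1f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is what distinguishes a kernel call from an ordinary C++ function call.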
CUDA Library:
cuBLAS, cuBLASxt, cuFFT, cuTENSOR, cuSPARSE, cuRAND, cuSOLVER, CUB, NCCL, cuDNN, nvGraph,
NPP
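These libraries are called from ordinary host code. As a sketch, a cuBLAS SAXPY (y = alpha*x + y) might look like the following; the array values are illustrative, and the program must be linked with `-lcublas`:

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 4;
    float hx[n] = {1, 2, 3, 4};
    float hy[n] = {10, 20, 30, 40};
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 2.0f;
    // y = alpha * x + y, computed on the GPU by the library kernel
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);
    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("%.0f ", hy[i]);  // 12 24 36 48
    cublasDestroy(handle);
    cudaFree(dx); cudaFree(dy);
    return 0;
}
```

No `__global__` kernel is written by hand here: the library supplies the GPU code, and the host only moves data and calls it.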
• Local Memory
Per-thread memory used by each SP (Streaming Processor). Variables
declared in a kernel (a function to be executed on the GPU) that do
not fit in registers are spilled into local memory.
• Registers
A kernel may consist of several expressions. During execution of an
expression, intermediate values are held in the registers of the SP,
the fastest memory on the GPU.
• Global Memory
The main memory of the GPU. Whenever GPU memory is allocated for
variables using the cudaMalloc() function, it comes from global
memory by default.
• Shared Memory
One or more threads can run on an SP, and a collection of threads is
called a block. One or more blocks can run on an SM (Streaming
Multiprocessor). The advantage of shared memory is that it is shared
by all the threads in one block.
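The memory levels above can be seen in a block-level sum, a common use of shared memory (kernel name and sizes are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block of 256 threads sums its slice of the input. `tile` lives
// in shared memory, visible to every thread in the block; the index
// variable `i` is per-thread and lives in a register (or spills to
// local memory); `in` and `out` point into global memory.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];   // global -> shared
    __syncthreads();

    // Tree reduction within the block, halving active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];  // shared -> global
}

int main() {
    const int n = 512, threads = 256, blocks = n / threads;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, threads>>>(in, out);
    cudaDeviceSynchronize();
    printf("block 0 sum = %.0f\n", out[0]);  // 256
    cudaFree(in); cudaFree(out);
    return 0;
}
```

Without shared memory, each reduction step would have to round-trip through the much slower global memory.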
CUDA-X AI libraries
Math Libraries
Image and Video Libraries
Deep Learning
Parallel Algorithms
Communication Libraries
Partner Libraries
ASIC (Application-Specific Integrated Circuit):
Full Custom ASICs: Designed from scratch for a specific application without
using pre-designed components.
Semi-Custom ASICs: Based on pre-designed and pre-verified blocks known as
standard cells and gate arrays. They offer a balance between customization and
development time and cost.
Programmable ASICs: Include Field-Programmable Gate Arrays (FPGAs) that
can be programmed by the customer or designer after manufacturing.
Advantages of ASICs
Disadvantages of ASICs
Development Cost and Time: High initial development cost and time due to
the custom design process.
Inflexibility: Once manufactured, an ASIC's functionality cannot be changed or
updated easily.
Risk: Higher risk due to the upfront investment and the inability to modify the
circuit once produced.
RAPID AI:
Key Components:
Automated Machine Learning (Auto ML): Tools that automate the process of
applying machine learning to real-world problems.
Pre-trained Models: AI models developed on large datasets available for
specific tasks like image recognition, natural language processing, etc.
Cloud Computing: Cloud services provide the necessary computational power
and storage, facilitating the rapid deployment of AI models.
Model Repositories: Publicly or privately hosted libraries of AI models that
can be reused and adapted for new applications.
AI Platforms: Platforms like Azure AI, Google AI, and IBM Watson offer
integrated tools and services for developing AI solutions.
Benefits:
Applications:
ThunderSVM
Support Vector Machines (SVMs) are classic supervised learning models for
classification, regression, and distribution estimation. SVM training and
prediction are computationally very expensive for large and complex
problems; ThunderSVM accelerates them on the GPU.
Code:
from thundersvm import SVC   # scikit-learn-style interface, GPU-accelerated

clf = SVC()                  # default RBF kernel
clf.fit(x, y)                # x: feature matrix, y: labels