
Department of Computer Science and Engineering

COURSE CODE – TITLE: 1021CS129 – MODERN COMPUTER ARCHITECTURE
SUMMER SEMESTER 2023-24

Course Instructor
Dr. M. Rajeev Kumar
Professor (CSE)
UNIT V
CO Nos. | Course Outcome(s)                                             | Level of learning domain (based on revised Bloom's)
CO5     | Implement the CUDA programming model for parallel computing. | K3

Unit 5: High Performance Computing with CUDA (9 Hours)

CUDA programming model, basic principles of CUDA programming, concepts of threads and blocks, GPU and CPU data exchange.

GPU and CPU data exchange
Typical CUDA Program Flow

// Step 1: allocate host memory
int *data = (int *)malloc(n * sizeof(int));

// Step 2: allocate device memory and copy the input to the device
int *d_data;
cudaMalloc(&d_data, n * sizeof(int));
cudaMemcpy(d_data, data, n * sizeof(int), cudaMemcpyHostToDevice);

// Step 3: launch the kernel on the device
execute<<<… , …>>>(d_data);

// Step 4: copy the result back to the host
cudaMemcpy(data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
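
Putting the four steps together, a minimal end-to-end sketch (the kernel body, the problem size n = 1024, and the 256-thread launch configuration are illustrative assumptions, not part of the original slides):

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical kernel: each thread doubles one element of the array.
__global__ void execute(int *d_data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    d_data[i] = 2 * d_data[i];
}

int main(void) {
    const int n = 1024;                                   // assumed problem size
    int *data = (int *)malloc(n * sizeof(int));           // Step 1: host memory
    for (int i = 0; i < n; i++) data[i] = i;

    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));                 // Step 2: device memory
    cudaMemcpy(d_data, data, n * sizeof(int), cudaMemcpyHostToDevice);

    execute<<<n / 256, 256>>>(d_data);                    // Step 3: 4 blocks x 256 threads

    cudaMemcpy(data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost); // Step 4

    printf("data[1] = %d\n", data[1]);                    // expect 2
    cudaFree(d_data);
    free(data);
    return 0;
}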
Types of Data Transfer in CUDA

• Pageable and Pinned
• Explicit and Implicit (UVM)
• Peer to Peer (between GPUs of the same host)
• GPUDirect (between GPU and network interface)
• Synchronous and asynchronous
Pageable and Pinned memory transfer

[Figure: Wave13pt kernel benchmark – pageable vs. pinned memory]
Summary
• Pageable memory – user memory space; transfers require an extra staging copy through a pinned driver buffer
• Pinned memory – kernel memory space
• Pinned memory performs better (higher bandwidth)
• Do not over-allocate pinned memory – it reduces the amount of physical memory available to the OS
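
A minimal sketch contrasting the two allocation paths (the 1 MiB buffer size is an assumption):

#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const size_t bytes = 1 << 20;   // assumed 1 MiB buffer

    // Pageable host memory: the driver stages the transfer through an
    // internal pinned buffer, costing an extra copy.
    int *pageable = (int *)malloc(bytes);

    // Pinned (page-locked) host memory: the DMA engine can access it
    // directly, so cudaMemcpy achieves higher bandwidth.
    int *pinned;
    cudaMallocHost(&pinned, bytes);

    int *d_buf;
    cudaMalloc(&d_buf, bytes);
    cudaMemcpy(d_buf, pageable, bytes, cudaMemcpyHostToDevice); // staged copy
    cudaMemcpy(d_buf, pinned,   bytes, cudaMemcpyHostToDevice); // direct DMA

    cudaFree(d_buf);
    cudaFreeHost(pinned);  // pinned memory has its own free call
    free(pageable);
    return 0;
}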
Unified memory
Unified memory – Usage
Unified memory – Use Case

[Figures: Wave13pt kernel – UVM vs. the old memory-mapping system]

Simplified memory transfers: UVM
• How does UVM perform in the case of multi-threading?
  o UVM uses an internal critical section (CS) – threads are serialized, causing performance degradation
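
A minimal sketch of the managed-memory style (the kernel and the size n are illustrative assumptions): a single cudaMallocManaged pointer is valid on both host and device, so the explicit cudaMemcpy calls disappear.

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

int main(void) {
    const int n = 1024;                        // assumed problem size
    int *data;
    cudaMallocManaged(&data, n * sizeof(int)); // one pointer, visible to CPU and GPU
    for (int i = 0; i < n; i++) data[i] = i;   // host writes directly

    scale<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();                   // required before the host touches the data again

    printf("data[1] = %d\n", data[1]);         // expect 2
    cudaFree(data);
    return 0;
}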
UVM Summary

• Simplifies the programming model. But:
  o Performance issues on device-to-host (D → H) transfers
  o Critical-section serialization in multithreaded applications
• What could it still be good for?
Peer to Peer data transfer – Overview
Peer to Peer data transfer – Unified Virtual Addressing
P2P memory transfer – Usage
P2P memory transfer – Summary

• P2P and UVA can be used to both simplify and accelerate CUDA programs
• One address space for all CPU and GPU memory
  o The physical memory location is determined from the pointer value
  o Simplified binary interface – cudaMemcpy()
• Faster memory copies between GPUs with less host overhead
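
A minimal sketch of enabling P2P between two devices (the device IDs 0 and 1, and the buffer size, are assumptions; the host must have at least two GPUs):

#include <cuda_runtime.h>

int main(void) {
    const size_t bytes = 1 << 20;   // assumed 1 MiB buffer
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU 0 reach GPU 1 directly?

    float *d0, *d1;
    cudaSetDevice(0);
    cudaMalloc(&d0, bytes);
    if (canAccess) cudaDeviceEnablePeerAccess(1, 0);  // flags must be 0

    cudaSetDevice(1);
    cudaMalloc(&d1, bytes);

    // Direct GPU-to-GPU copy; with UVA, a plain cudaMemcpy with
    // cudaMemcpyDefault would also resolve the locations from the pointers.
    cudaMemcpyPeer(d1, 1, d0, 0, bytes);

    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}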
GPUDirect – Overview
Asynchronous data transfer
• cudaMemcpy() is blocking – it does not return until the memcopy is complete
• cudaMemcpyAsync() is non-blocking – it returns immediately, so the CPU can be utilized for useful computation
• Asynchronous memcopy has two additional requirements:
  o Pinned memory
  o A stream id
Asynchronous data transfer (Contd.,)
• What is it good for?
  o Overlapping CPU work with GPU communication
  o Overlapping computation and memory transfer (see the sketch below)
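
A minimal sketch of overlapping an asynchronous copy with CPU work (the buffer size and the cpu_work placeholder are assumptions); note both requirements from above, pinned memory and a stream id:

#include <cuda_runtime.h>

void cpu_work(void) { /* hypothetical useful CPU computation */ }

int main(void) {
    const size_t bytes = 1 << 20;    // assumed 1 MiB buffer

    int *h_buf;                      // requirement 1: pinned host memory
    cudaMallocHost(&h_buf, bytes);

    int *d_buf;
    cudaMalloc(&d_buf, bytes);

    cudaStream_t stream;             // requirement 2: a stream id
    cudaStreamCreate(&stream);

    // Returns immediately; the copy proceeds in the background.
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);

    cpu_work();                      // CPU does useful work while the DMA runs

    cudaStreamSynchronize(stream);   // wait for the copy before using d_buf

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}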
CUDA Streams
• What is it?
  o A sequence of CUDA operations that execute in issue-order on the GPU
• What is it good for?
  o Operations from different streams may run concurrently
  o Hiding memory latencies and memory size limitations
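
A minimal sketch of two streams working on independent chunks (the kernel and sizes are illustrative assumptions): within each stream the copy–kernel–copy chain runs in issue-order, but the two chains may overlap with each other.

#include <cuda_runtime.h>

__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main(void) {
    const int n = 1 << 20;                         // assumed elements per chunk
    float *h[2], *d[2];
    cudaStream_t s[2];

    for (int k = 0; k < 2; k++) {
        cudaMallocHost(&h[k], n * sizeof(float));  // pinned, required for async copies
        cudaMalloc(&d[k], n * sizeof(float));
        cudaStreamCreate(&s[k]);
    }

    for (int k = 0; k < 2; k++) {
        // Each stream's chain executes in issue-order; the two streams
        // may run concurrently, hiding transfer latency behind compute.
        cudaMemcpyAsync(d[k], h[k], n * sizeof(float), cudaMemcpyHostToDevice, s[k]);
        process<<<(n + 255) / 256, 256, 0, s[k]>>>(d[k], n);
        cudaMemcpyAsync(h[k], d[k], n * sizeof(float), cudaMemcpyDeviceToHost, s[k]);
    }

    cudaDeviceSynchronize();                       // wait for both streams

    for (int k = 0; k < 2; k++) {
        cudaStreamDestroy(s[k]);
        cudaFree(d[k]);
        cudaFreeHost(h[k]);
    }
    return 0;
}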
CUDA Streams Synchronization
Explicit:
• cudaDeviceSynchronize()
  o Blocks until all CUDA operations are finished
• cudaStreamSynchronize(stream)
  o Blocks until all CUDA operations in the given stream are finished
• cudaEventRecord(event, stream1), cudaStreamWaitEvent(stream2, event)
  o Fine-grained synchronization
Implicit:
• Page-locked memory allocation
  o cudaMallocHost, cudaHostAlloc
• Device memory allocation
  o cudaMalloc
• Blocking versions of memory operations
  o cudaMemcpy, cudaMemset
• These implicitly synchronize all CUDA operations
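
A minimal sketch of the event-based, fine-grained form (the producer/consumer kernels are illustrative assumptions): stream s2 waits only for the recorded event in s1, not for everything on the device.

#include <cuda_runtime.h>

__global__ void produce(float *d) { d[0] = 42.0f; }   // hypothetical producer
__global__ void consume(float *d) { d[0] += 1.0f; }   // hypothetical consumer

int main(void) {
    float *d_buf;
    cudaMalloc(&d_buf, sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    cudaEvent_t done;
    cudaEventCreate(&done);

    produce<<<1, 1, 0, s1>>>(d_buf);
    cudaEventRecord(done, s1);            // mark the point s2 must wait for

    cudaStreamWaitEvent(s2, done, 0);     // s2 stalls until 'done' fires in s1
    consume<<<1, 1, 0, s2>>>(d_buf);      // safe: the producer has finished

    cudaDeviceSynchronize();

    cudaEventDestroy(done);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_buf);
    return 0;
}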
