
Literature for the practical parts: Programming Massively Parallel Processors, 3rd edition - David Kirk and Wen-mei Hwu

About the subject: Very interesting; it can be a bit tough if one does not know C++,
but it is very manageable and the tutors are very helpful. Doing every exercise is
recommended.
About the exam itself: Very relaxed.
Grade: 1.0 without bonus. (If you reach 60% of the assignments, there is a 0.6
bonus).

- Describe the memory sections of a GPU.


+ Global, texture, and constant memory as segments writable by the host.
+ Shared memory, L1, L2 and registers explained.
+ Explained special properties of texture memory.
+ Constant memory data can be cached aggressively in L1/L2 (see the declaration sketch below).
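
A minimal sketch of how these memory spaces show up in CUDA code (my own example;
kernel name and sizes are made up, assuming a block size of 256; texture memory is
accessed through texture objects and is omitted here):

    #include <cuda_runtime.h>

    __constant__ float coeffs[256];          // constant memory: read-only in kernels,
                                             // written by the host, cached aggressively

    __global__ void memorySpacesDemo(const float* in, float* out, int n)  // in/out: global memory
    {
        __shared__ float tile[256];          // shared memory: on-chip, per thread block
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // i lives in a register

        if (i < n) tile[threadIdx.x] = in[i];            // global -> shared
        __syncthreads();                                 // whole block reaches the barrier
        if (i < n) out[i] = tile[threadIdx.x] * coeffs[threadIdx.x];  // shared + constant -> global
    }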

- Describe how to construct a histogram of 1,000,000 values in the range [0, 255].


+ Build smaller (partial) histograms per thread block.
+ Launch as many thread blocks as needed.
+ Apply a reduction over the partial histograms to obtain the final histogram (see the sketch below).
+ Question: Where and why should we use atomics?
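
A minimal sketch of this strategy (my own code, not from the exam), assuming 256 bins
for byte values in [0, 255]. Atomics are needed inside shared memory because several
threads of one block may hit the same bin, and again when merging into the global
histogram because several blocks update the same bin:

    #include <cuda_runtime.h>

    #define NUM_BINS 256

    // globalHist must be zero-initialized by the host (e.g. with cudaMemset).
    __global__ void histogramKernel(const unsigned char* data, int n,
                                    unsigned int* globalHist)
    {
        __shared__ unsigned int localHist[NUM_BINS];

        // Zero the per-block histogram cooperatively.
        for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
            localHist[b] = 0;
        __syncthreads();

        // Grid-stride loop: shared-memory atomics resolve collisions
        // between threads of the same block.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            atomicAdd(&localHist[data[i]], 1u);
        __syncthreads();

        // Reduction step: merge the partial histograms; global atomics are
        // needed because several blocks update the same bin.
        for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
            atomicAdd(&globalHist[b], localHist[b]);
    }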

- Show a piece of code, explain where __syncthreads(); should be placed.


+ Explained WAR and RAW hazards.
+ Explained that only sections that write into shared memory need synchronization;
pure read accesses do not need it (see the example below).
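
A minimal illustration (my own example, not the code shown in the exam): reversing
each 256-element tile through shared memory. The barrier has to sit between the
shared-memory writes and the reads of values written by other threads (a RAW hazard);
assumes n is a multiple of the block size 256:

    __global__ void reverseTile(const float* in, float* out, int n)
    {
        __shared__ float tile[256];                   // assumes blockDim.x == 256
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        if (i < n) tile[threadIdx.x] = in[i];         // each thread writes its own slot
        __syncthreads();                              // required: make the other threads'
                                                      // writes visible before reading them
        if (i < n)
            out[i] = tile[blockDim.x - 1 - threadIdx.x];  // read another thread's slot
    }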

- Explain how bitonic sort works and work through an example.


+ Very important to know that bitonic merge requires its input sequence to be
bitonic, but bitonic sort itself does not require the input to be bitonic.
+ Given any random input array, we obtain a bitonic sequence by recursively sorting
the two halves in opposite directions with bitonic sort; applying bitonic merge then
yields the fully sorted array.
+ Structurally analogous to merge sort.
+ Explain the running time complexity of bitonic sort: O(log^2 n) parallel steps and
O(n log^2 n) comparisons (see the kernel sketch below).
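
A minimal single-block kernel sketch (my own code; assumes n is a power of two and the
data fits into one block's shared memory, launched as
bitonicSortBlock<<<1, n, n * sizeof(float)>>>(d_data, n)). The outer loop grows bitonic
sequences of length k, the inner loop is the bitonic merge with stride j; the two
nested loops give the O(log^2 n) step complexity mentioned above:

    __global__ void bitonicSortBlock(float* data, int n)
    {
        extern __shared__ float s[];
        int tid = threadIdx.x;
        if (tid < n) s[tid] = data[tid];
        __syncthreads();

        for (int k = 2; k <= n; k <<= 1) {            // length of bitonic sequences
            for (int j = k >> 1; j > 0; j >>= 1) {    // bitonic merge stride
                int partner = tid ^ j;
                if (tid < n && partner > tid) {
                    bool ascending  = ((tid & k) == 0);
                    bool outOfOrder = ascending ? (s[tid] > s[partner])
                                                : (s[tid] < s[partner]);
                    if (outOfOrder) {
                        float tmp = s[tid]; s[tid] = s[partner]; s[partner] = tmp;
                    }
                }
                __syncthreads();                      // finish one compare-exchange stage
            }
        }
        if (tid < n) data[tid] = s[tid];
    }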

- Explain how prefix-sum works on an example.
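
A worked example plus a minimal Hillis-Steele scan kernel (my own, not from the exam;
single block with blockDim.x == n, launched with n * sizeof(int) bytes of shared
memory). Hillis-Steele needs O(log n) steps but O(n log n) additions, so it is
step-efficient rather than work-efficient:

    // Inclusive prefix sum on paper:
    //   input:  3  1  7  0  4
    //   output: 3  4 11 11 15   (each element = sum of all elements up to it)
    __global__ void inclusiveScanBlock(const int* in, int* out, int n)
    {
        extern __shared__ int s[];
        int tid = threadIdx.x;
        if (tid < n) s[tid] = in[tid];
        __syncthreads();

        for (int stride = 1; stride < n; stride <<= 1) {
            int val = 0;
            if (tid >= stride && tid < n) val = s[tid - stride];
            __syncthreads();                 // everyone has read its partner value
            if (tid >= stride && tid < n) s[tid] += val;
            __syncthreads();                 // everyone has written its update
        }
        if (tid < n) out[tid] = s[tid];
    }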

- Explain what can be done with PDEs.


+ Explain where massively parallel computing can be used to solve PDEs. (Solving
PDEs numerically means solving linear equation systems. Restriction and prolongation
steps of the multigrid transfer are embarrassingly parallel problems. In Gauss-Seidel,
the elements of the solution vector can be updated in parallel; see the red-black
sweep sketch below.)
+ He asked about some other numerics problems which I admitted I did not know;
Prof. Lensch had no problem with that.
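
A minimal sketch of the parallel Gauss-Seidel point above (my own code, assuming a 2D
Poisson problem -Δu = f on a regular grid with spacing h and red-black ordering): all
grid points of one color depend only on points of the other color, so each half-sweep
updates its points fully in parallel (call the kernel with color = 0, then color = 1):

    __global__ void gaussSeidelColorSweep(float* u, const float* f,
                                          int nx, int ny, float h, int color)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // x index
        int j = blockIdx.y * blockDim.y + threadIdx.y;   // y index
        if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;  // skip boundary
        if (((i + j) & 1) != color) return;              // update only one color

        int idx = j * nx + i;
        u[idx] = 0.25f * (u[idx - 1] + u[idx + 1] +      // 5-point stencil update
                          u[idx - nx] + u[idx + nx] +
                          h * h * f[idx]);
    }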

- Explain the mixed-precision method.


+ Where can mixed precision be used in Gauss-Seidel?
+ Computational cost (see the iterative-refinement sketch below).
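
A minimal host-side sketch of mixed-precision iterative refinement (my own example on
a tiny dense system, not from the exam): the expensive inner solve, here a few
Gauss-Seidel sweeps, runs in cheap single precision, while the residual r = b - A*x
and the accumulated solution stay in double precision, so the cheap inner iterations
still converge to full double accuracy:

    #include <cstdio>

    int main()
    {
        const int n = 3;
        double A[n][n] = {{4, 1, 0}, {1, 4, 1}, {0, 1, 4}};   // small SPD test matrix
        double b[n]    = {1, 2, 3};
        double x[n]    = {0, 0, 0};                           // double-precision solution

        for (int outer = 0; outer < 10; ++outer) {
            // Residual in double precision.
            double r[n];
            for (int i = 0; i < n; ++i) {
                r[i] = b[i];
                for (int j = 0; j < n; ++j) r[i] -= A[i][j] * x[j];
            }
            // Inner solve A*d = r in single precision (a few Gauss-Seidel sweeps).
            float d[n] = {0, 0, 0};
            for (int sweep = 0; sweep < 5; ++sweep)
                for (int i = 0; i < n; ++i) {
                    float s = (float)r[i];
                    for (int j = 0; j < n; ++j)
                        if (j != i) s -= (float)A[i][j] * d[j];
                    d[i] = s / (float)A[i][i];
                }
            // Correct the double-precision solution with the single-precision update.
            for (int i = 0; i < n; ++i) x[i] += d[i];
        }
        printf("x = %f %f %f\n", x[0], x[1], x[2]);
        return 0;
    }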

- Explain some theory of massively parallel computing.


+ Parallel work, time, cost.
+ Work efficiency; work and time optimality.
+ Gustafson's law, Amdahl's law (formulas below).
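
For reference, the two scaling laws (p = parallelizable fraction of the work, N =
number of processors), written in LaTeX notation:

    S_{\mathrm{Amdahl}}(N) = \frac{1}{(1 - p) + p/N}
    \qquad
    S_{\mathrm{Gustafson}}(N) = (1 - p) + p \cdot N

Parallel cost is C(N) = N \cdot T(N); an algorithm is cost-optimal if C(N) matches the
running time of the best sequential algorithm up to a constant factor.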

- Explain sparse matrix representation.


+ Prof. Lensch showed me a sparse matrix and asked which storage format I would use.
I chose ELL because each row has at most 5 non-zero elements (see the SpMV sketch below).
+ He then showed a dense matrix; I said no compression is needed there.
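
A minimal SpMV sketch in ELL format (my own code; names are made up). ELL pads every
row to the same maximum number of non-zeros (here at most 5 per row) and stores values
and column indices column-major, so consecutive threads read consecutive addresses
(coalesced access); padding entries carry the column index -1:

    __global__ void spmvEll(int numRows, int maxNnzPerRow,
                            const float* vals, const int* colIdx,
                            const float* x, float* y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= numRows) return;

        float sum = 0.0f;
        for (int k = 0; k < maxNnzPerRow; ++k) {
            int   col = colIdx[k * numRows + row];    // column-major ELL layout
            float v   = vals[k * numRows + row];
            if (col >= 0) sum += v * x[col];          // skip padding entries
        }
        y[row] = sum;
    }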
