# Homework 1

CDA 5155: Spring 2012

Due Date: 02/08/2012 11:55 PM (EDGE Students: 02/11/2012 11:55 PM)

Total: 20 points (5% of overall score)
You are not allowed to give or receive help in completing this assignment. Submit the PDF version of your submission on the e-Learning (Sakai) website before the deadline. Please include the following sentence in bold at the top of your submission (PDF): **“I have neither given nor received any unauthorized aid on this assignment”**.

1. Assume that you are the product manager for the XXX processor. The chip has an area of 263 mm², with a defect rate of 0.025 defects per cm² and N = 11.5. The die of each chip is occupied by four identical cores (70% of the total area) and a shared L3 cache (30% of the total area). For simplicity, assume that each chip has only four cores and an L3 cache (no other components).

a. [1 Point] What is the yield of the die?

b. [1 Point] Some researchers have proposed that the number of defects in a die can be modeled by a geometric distribution. If we use the yield as the probability that there is no defect on a die, what is the value of the parameter p in the geometric distribution here? Note: a geometric distribution means that the probability of exactly k defects (k being a non-negative integer, k = 0, 1, 2, ...) is equal to

$$P(X = k) = (1 - p)^k \, p$$

where k is the number of occurrences of defects and p is a positive real number.

c. [3 Points] In a defective chip, assume defects are independent and uniformly distributed within the die area. What is the probability that all defects in a DEFECTIVE die occur in the same core? (In other words, there is no defect on the other three cores or the L3 cache.) Please note that there can be more than one defect on a die.

d. [1 Point] If there is only one defective core in a chip with a defect-free L3, we can still sell it by shutting down the defective core. Suppose you can sell a perfectly working (defect-free) chip for \$259.99. Also assume that it costs \$179 to manufacture and test each chip. What is the minimum sale price for your chips with 3 working cores (the defective core is shut down) to break even (no profit, no loss)?
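As a starting point for parts a through c, the quantities in problem 1 can be set up numerically. The sketch below is not an official solution and rests on two assumptions that the handout does not state: the yield model is taken to be `wafer_yield / (1 + defects_per_area * die_area) ** N` with a wafer yield of 100%, and the defect count follows the geometric distribution on k = 0, 1, 2, ... with P(X = k) = (1 - p)^k p. All function and variable names are illustrative.

```python
# Sketch for problem 1 under the assumptions named in the lead-in.

CORE_FRACTION = 0.70 / 4  # each of the four cores covers 17.5% of the die


def die_yield(die_area_cm2, defects_per_cm2, n, wafer_yield=1.0):
    """Assumed model: wafer_yield / (1 + defect density * die area) ** n."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


def geometric_pmf(k, p):
    """P(X = k) = (1 - p) ** k * p for k = 0, 1, 2, ..."""
    return (1.0 - p) ** k * p


def same_core_given_defective(p, core_fraction, cores=4, k_max=200):
    """P(all defects fall in one core | at least one defect).

    With k independent, uniformly placed defects, the chance that all of
    them land in one particular core is core_fraction ** k. The series is
    truncated at k_max, which is ample because the terms shrink quickly.
    """
    total = 0.0
    for k in range(1, k_max + 1):
        total += geometric_pmf(k, p) * cores * core_fraction ** k
    return total / (1.0 - p)  # condition on the die being defective


# Values from the problem statement; 263 mm^2 equals 2.63 cm^2.
y = die_yield(die_area_cm2=263 / 100.0, defects_per_cm2=0.025, n=11.5)

# Treating the yield as P(X = 0) gives p directly, because P(X = 0) = p.
p = y

print(f"die yield            = {y:.4f}")
print(f"geometric parameter  = {p:.4f}")
print(f"P(same core | bad)   = {same_core_given_defective(p, CORE_FRACTION):.4f}")
```

The truncated sum in `same_core_given_defective` can be checked against the closed form of the geometric series, `4 * p * a / (1 - (1 - p) * a)` with `a = CORE_FRACTION`.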

2. One day you got tired of your processor company, so you accepted an offer from a software division in a GPU company. Your job is to develop a numerical simulation program. The software will be used on a workstation with a single CPU core and 512 GPU cores. The CPU can achieve 1 GFlops, while each GPU core can deliver 3.9 GFlops (peak). The CPU and all the GPU cores can perform calculations simultaneously, unless there are any specific restrictions. For simplicity, we only consider floating-point operations in this problem.

[Figure: system diagram with the labels CPU Core, CPU Cache, Memory, GPU Cache, and GPU Core 1 ... GPU Core 512.]

a. [2 Points] If all of the dynamic instructions in your main application are parallelizable, what is the maximum performance (Flops) you can get from your hardware in the optimal situation? What is the speedup compared with CPU-only execution? State your assumptions, if any.

b. [3 Points] In reality, the computation cannot be performed without data. Before each round of computation, you have to load the required input data into the cache within the CPU or GPU cores. Similarly, output data, i.e. the computation results, will be stored in the corresponding cache immediately after the computation. The results must be transferred back to memory before the next round of computation. The CPU and GPU caches are 5 MB and 1 MB, respectively. In other words, at most 5 MB/1 MB of data can be moved to the CPU/GPU cache before any computation. The bandwidth between the CPU cache and memory is 6 GB/s, while the bandwidth between the GPU cache and memory is 36 GB/s. Assume that the memory has infinite capacity. Suppose all input data is stored within the memory prior to execution. The output data must be written back to the memory before the next round of computation. No data can be transferred during the computation. There is no overlap between data transfer and computation. There is no dependency among different parts of the data. Each byte of input data requires 2 Flops to produce 0.4 bytes of output data on average. What is the maximum performance (Flops) you can get from your hardware in the optimal situation if we take the data transfer time into consideration? What is the speedup in this case compared to CPU-only execution? What happens to Flops and speedup if the GPU cache is 20 MB?
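One way to reason about problem 2 is to model a single round on each device as three strictly serialized phases: load input, compute, write output. The sketch below is an illustration under assumptions the handout leaves open, not an official solution. Each round is assumed to load exactly one cache-full of input, the CPU and GPU are assumed to run their rounds independently without competing for memory bandwidth, and 1 MB and 1 GB are taken as 10^6 and 10^9 bytes. The `Device` class and all names are hypothetical.

```python
from dataclasses import dataclass

# Ratios from the problem statement.
FLOPS_PER_INPUT_BYTE = 2.0
OUTPUT_BYTES_PER_INPUT_BYTE = 0.4


@dataclass
class Device:
    name: str
    peak_flops: float   # floating-point operations per second
    bandwidth: float    # bytes per second between cache and memory
    cache_bytes: float  # assumed input size per round


def sustained_flops(dev: Device) -> float:
    """Flops per second over one load -> compute -> write-back round."""
    chunk = dev.cache_bytes
    work = chunk * FLOPS_PER_INPUT_BYTE
    t_in = chunk / dev.bandwidth
    t_compute = work / dev.peak_flops
    t_out = chunk * OUTPUT_BYTES_PER_INPUT_BYTE / dev.bandwidth
    # No overlap between transfer and computation, so the times add up.
    return work / (t_in + t_compute + t_out)


cpu = Device("CPU", peak_flops=1e9, bandwidth=6e9, cache_bytes=5e6)
gpu = Device("GPU", peak_flops=512 * 3.9e9, bandwidth=36e9, cache_bytes=1e6)
gpu_big = Device("GPU, 20 MB cache", 512 * 3.9e9, 36e9, 20e6)

# Part a: every instruction is parallelizable and data movement is ignored.
peak_total = cpu.peak_flops + gpu.peak_flops
print(f"peak: {peak_total / 1e9:.1f} GFlops, "
      f"speedup {peak_total / cpu.peak_flops:.1f}x over the CPU alone")

# Part b: include transfer time for each device, then add the two rates.
for g in (gpu, gpu_big):
    total = sustained_flops(cpu) + sustained_flops(g)
    print(f"{g.name}: {total / 1e9:.2f} GFlops, "
          f"speedup {total / sustained_flops(cpu):.1f}x over the CPU alone")
```

In this simplified model the chunk size appears in both the work and every time term, so it cancels, and the sustained rate depends only on the bandwidth and the peak rate. Whether that holds for the intended answer depends on how the restrictions in part b are interpreted.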