
Spring 2013

Class project suggestions
• Many kinds of projects
– Reflects broad scope of field and of students, from many departments
• Need to do one or more of design / program / measure some parallel application / kernel / software tool / hardware
• Can work alone or in teams
– HW0 posted to help identify possible teammates based on interest
• What you need to do
– Project proposal due early in spring break
– Feedback from instructor over spring break (ongoing conversations)
– Poster presentation (+ recording short video presentation) on Thursday May 9 (class time, during RRR week)
– Final report writeups due Monday May 13 at midnight
3/21/13 CS267 Class Projects

How to Organize A Project Proposal (1/2)
• Parallelizing/comparing implementations of an Application
• Parallelizing/comparing implementations of a Kernel
• Building/evaluating a parallel software tool
• Evaluating parallel hardware

How to Organize A Project Proposal (2/2)
• What is the list of tasks you will try?
– Sorted from low-hanging fruit to harder
• What existing tools you will use, compare to?
– Don’t reinvent wheels, or at least compare to existing wheels to see if yours is better
– For applications, consider using frameworks like Chombo or PETSc
• Lecture later, in meantime see Lecture 21 from CS267 Spr12
– For applications, identify computational and structural patterns you plan to use
• What are your success metrics
– Get application X up on Hopper, solve problem Y
– Get motif Z to run W times faster on GPU
– Collect data V to evaluate/compare approaches

A few sample CS267 Class Projects
all posters and video presentations at www.cs.berkeley.edu/~demmel/cs267_Spr09/posters.html
• Content based image recognition
– “Find me other pictures of the person in this picture”
• Faster molecular dynamics, applied to Alzheimer’s Disease
• Better speech recognition through a faster “inference engine”
• Faster algorithms to tolerate errors in new genome sequencers
• Faster simulation of marine zooplankton population
• Sharing cell-phone bandwidth for faster transfers

More Prior Projects
1. High-Throughput, Accurate Image Contour Detection
2. CUDA-based rendering of 3D Minkowski Sums
3. Parallel Particle Filters
4. Scaling Content Based Image Retrieval Systems
5. Towards a parallel implementation of the Growing String Method
6. Optimization of the Poisson Operator in CHOMBO
7. Sparse-Matrix-Vector-Multiplication on GPUs
8. Parallel RI-MP2

More Prior Projects
1. Parallel FFTs in 3D: Testing different implementation schemes
2. Replica Exchange Molecular Dynamics (REMD) for Amber's Particle-Mesh Ewald MD (PMEMD)
3. Creating a Scalable HMM based Inference Engine for Large Vocabulary Continuous Speech Recognition
4. Using exponential integrators to solve large stiff problem
5. Clustering overlapping reads without using a reference genome
6. An AggreGATE Network Abstraction for Mobile Devices
7. Parallel implementation of multipole-based Poisson-Boltzmann solver
8. Finite Element Simulation of Nonlinear Elastic Dynamics using CUDA

Still more prior projects
1. Parallel Groebner Basis Computation using GASNet
2. Accelerating Mesoscale Molecular Simulation using CUDA and MPI
3. Modeling and simulation of red blood cell light scattering
4. NURBS Evaluation and Rendering
5. Performance Variability in Hadoop's Map Reduce
6. Utilizing Multiple Virtual Machines in Legacy Desktop Applications
7. How Useful are Performance Counters, Really? Profiling Chombo Finite Methods Solver and Parsec Fluids Codes on Nehalem and SiCortex
8. Energy Efficiency of MapReduce
9. Symmetric Eigenvalue Problem: Reduction to Tridiagonal
10. Parallel POPCycle Implementation

PREVIOUS PROJECT SUGGESTIONS
3/22/12 CS267 Class Projects

Class Project Suggestions (1/7)
• Pick one (of many) functions from one of the 13 motifs
• Pick a target parallel platform
• Pick a “parallel programming framework,” eg for dense linear algebra
– LAPACK – all parallelism in BLAS
– ScaLAPACK – distributed memory using MPI
– PLASMA – DAG scheduling on multicore
• Parallel Linear Algebra for Scalable Multi-core Architectures
• http://icl.cs.utk.edu/plasma/
– MAGMA – DAG scheduling for heterogeneous platforms
• Matrix Algebra on GPU and Multicore Architectures
• http://icl.cs.utk.edu/magma/
– Cloud
– FLAME - http://z.cs.utexas.edu/wiki/flame.wiki/FrontPage
• Design, implement, measure, model and/or compare performance
– Can be missing entirely on target platform
– May exist, but with a different programming framework

Class Project Suggestions (2/7)
°Many new algorithmic ideas for sparse linear algebra
°Come to BEBOP meetings (T 12:30 – 2, 380 Soda)
°Experiment with SpMV on different architectures
• Which optimizations are most effective?
°Try to speed up particular matrices of interest
• Data mining, “bottom solver” from AMR
°Explore tuning space of [x, Ax, …, A^k x] kernel
• Different matrix representations (last slide)
• New Krylov subspace methods, preconditioning
°Experiment with new frameworks (SPF)
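
As a point of reference for the SpMV and [x, Ax, …, A^k x] kernels mentioned above, here is a minimal, untuned sketch in plain Python. It assumes the compressed sparse row (CSR) layout, which is only one of the possible matrix representations, and the helper names (`csr_from_dense`, `spmv`, `krylov_basis`) are illustrative, not taken from any BEBOP code. It computes the basis with k successive SpMVs, so it is the baseline that a tuned or communication-avoiding kernel would be compared against.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, col_idx, row_ptr) from a dense list of rows."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(csr, x):
    """Compute y = A x for a matrix A stored in CSR form."""
    values, col_idx, row_ptr = csr
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def krylov_basis(csr, x, k):
    """Return [x, Ax, ..., A^k x] using k successive SpMVs (naive baseline)."""
    vectors = [list(x)]
    for _ in range(k):
        vectors.append(spmv(csr, vectors[-1]))
    return vectors


A = [[2, 0, 1],
     [0, 3, 0],
     [4, 0, 5]]
basis = krylov_basis(csr_from_dense(A), [1, 1, 1], 2)
print(basis)
```

Each call to `spmv` streams the whole matrix once, so the naive basis computation reads A k times; that repeated data movement is what a tuned kernel would try to reduce.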

Class Project Suggestions (3/7)
• Proposed by Sherry Li, LBL Staff Scientist
• “Feasibility of Communication-Avoiding Panel Factorization in Sparse LU”
• Based on SuperLU – widely used parallel sparse LU factorization routine
– Bottleneck: factorization of “small” panel at each step
– Project (1): instrument code to evaluate potential communication bottleneck, potential for speedup by using “Tall-Skinny LU” (TSLU)
– Project (2): implement, insert TSLU

Class Project Suggestions (4/7)
• Proposed by Oded Schwartz, ParLab postdoc
• “Automatic parallelization of BFS/DFS algorithms using SEJITS”
• Motivated by common pattern in several optimal D&C algorithms for matrix multiplication
– Traverse D&C tree by BFS until out of memory for replicating data, then switch to DFS
• Use SEJITS to parallelize Python code exploiting this pattern, apply to various algorithms
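
The BFS-then-DFS traversal pattern can be illustrated with a toy divide-and-conquer reduction. This is only a sequential sketch under stated assumptions: the problem is a list sum whose combine step is addition, the memory limit is modeled as a hypothetical cap `max_live` on the number of subproblems held at once, and nothing here uses SEJITS. The function names are made up for illustration.

```python
def solve_dfs(data, lo, hi):
    """Depth-first D&C: only one branch of the tree is expanded at a time."""
    if hi - lo <= 1:
        return data[lo] if hi > lo else 0
    mid = (lo + hi) // 2
    return solve_dfs(data, lo, mid) + solve_dfs(data, mid, hi)


def solve_bfs_then_dfs(data, max_live):
    """Expand the D&C tree level by level (BFS) while the next level still
    fits in max_live subproblems, then finish each subproblem with DFS."""
    frontier = [(0, len(data))]
    bfs_levels = 0
    while (2 * len(frontier) <= max_live
           and any(hi - lo > 1 for lo, hi in frontier)):
        next_frontier = []
        for lo, hi in frontier:
            if hi - lo > 1:
                mid = (lo + hi) // 2
                next_frontier += [(lo, mid), (mid, hi)]
            else:
                next_frontier.append((lo, hi))  # leaf: nothing left to split
        frontier = next_frontier
        bfs_levels += 1
    # In a parallel setting each frontier entry could go to a different worker.
    partials = [solve_dfs(data, lo, hi) for lo, hi in frontier]
    return sum(partials), bfs_levels


total, levels = solve_bfs_then_dfs(list(range(16)), max_live=4)
print(total, levels)
```

The BFS phase exposes more independent subproblems at the cost of holding them all at once; the DFS phase keeps the footprint small. A real implementation would also need a tree-shaped combine step, which this toy sum does not require.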

Class Project Suggestions (5/7)
• Proposed by Oded Schwartz, ParLab postdoc
• Variety of fast linear algebra algorithms to be parallelized
– Yuster-Zwick algorithm for sparse matrix multiplication
– Variety of fast algorithms beyond Strassen: which ones are fast? Parallelizable as well as Strassen?
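
For readers who want the Strassen baseline in front of them, here is a minimal sequential sketch in plain Python. It assumes square matrices whose dimension is a power of two and recurses all the way down to 1×1 blocks, which is not how a tuned code would do it; the helper names are illustrative. The seven recursive products M1..M7 are independent of one another, which is the source of the parallelism the slide asks about.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A):
    """Split a square matrix into four equal quadrants."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def strassen(A, B):
    """Multiply two n x n matrices (n a power of two) with 7 recursive products."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # The seven products do not depend on each other.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


def naive(A, B):
    """Classical O(n^3) product, used here only as a correctness check."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
Y = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 5, 1], [3, 2, 1, 0]]
print(strassen(X, Y) == naive(X, Y))
```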

Class Project Suggestions (6/7)
• Proposed by Derrick Coetzee, ParLab grad student
• “Communication and arithmetic optimal long integer arithmetic”
• Derrick found an implementation of the Schönhage-Strassen integer multiply algorithm that minimizes communication, and would like to collaborate on implementing, measuring it

Class Project Suggestions (7/7)
• “Minimizing the energy of a computation”
• Energy required to perform computation (on a handheld device or supercomputer) is becoming the bottleneck
• Communication (moving data) takes much more energy than arithmetic
• How well do our communication-avoiding algorithms minimize energy?
