Abstract

PISTON is a portable framework that supports the development of visualization and analysis operators using a platform-independent, data-parallel programming model. Operators such as isosurface, cut-surface and threshold have been implemented in this framework, with exactly the same operator code achieving good parallel performance on different architectures.

An important analysis operator in cosmology is the halo finder. A halo is a cluster of particles and is a common feature of interest in cosmology data. As the number of cosmological simulations has grown in recent years, so have the volume of resulting data and the number of analysis tasks required, creating a need for scalable and efficient analysis tools. We are therefore implementing a halo finder operator using PISTON.

Researchers have developed a wide variety of techniques to identify halos in raw particle data. The most basic algorithm is the friend-of-friends (FOF) halo finder, which clusters particles based on two parameters: linking length and halo size. In a FOF halo finder, particles that lie within the linking length of one another, directly or through a chain of such neighbors, are considered one halo, and the halos are then filtered based on the halo size parameter. A naive implementation of a FOF halo finder compares every particle pair, requiring O(n^2) operations. Our data-parallel halo finder operator uses a balanced k-d tree to reduce the number of operations in the average case, and implements the algorithm using only data-parallel primitives in order to achieve portability and performance.

Data-Parallel Halo Finder Operator in PISTON
Wathsala Widanagamaachchi (CCS-7), University of Utah
Mentor: Christopher Sewell

Outline
● PISTON & motivation behind it
● Data-Parallel programming
● Halos & Halo finder
● Naive approach & Data-parallel approach
● Results

What is PISTON?
● Portable framework
● Development of visualization & analysis operators
● Uses a platform-independent, data-parallel programming model
Motivation
● Lack of visualization software that takes full advantage of acceleration hardware and multi-core architectures

Data-Parallel programming & Thrust
● What is data parallelism?
  ● Same operation is performed by different processors on different pieces of data
● What is Thrust?
  ● Thrust is an NVIDIA C++ template library which provides CUDA and OpenMP backends
  ● Most STL algorithms in Thrust are data-parallel
    – sorting: thrust::sort and thrust::sort_by_key
      4 5 6 8 7 2 1 3 : sort: 1 2 3 4 5 6 7 8
    – scans: thrust::inclusive_scan, thrust::exclusive_scan etc.
      4 5 6 7 8 2 1 3 : sum scan: 4 9 15 22 30 32 33 36
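The two examples on this slide can be reproduced with a minimal sketch. It is written in plain Python rather than Thrust, so the calls below are only sequential stand-ins for thrust::sort, thrust::inclusive_scan and thrust::exclusive_scan; the variable names are illustrative and do not come from PISTON.

```python
from itertools import accumulate

# Sequential stand-in for thrust::sort on the slide's first input.
keys = [4, 5, 6, 8, 7, 2, 1, 3]
sorted_keys = sorted(keys)

# Sequential stand-in for thrust::inclusive_scan (running sum) on the
# slide's second input.
values = [4, 5, 6, 7, 8, 2, 1, 3]
inclusive = list(accumulate(values))

# Stand-in for thrust::exclusive_scan with an initial value of 0:
# each output is the sum of all inputs strictly before it.
exclusive = [0] + inclusive[:-1]

print(sorted_keys)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(inclusive)    # [4, 9, 15, 22, 30, 32, 33, 36]
print(exclusive)    # [0, 4, 9, 15, 22, 30, 32, 33]
```

In a data-parallel backend the same results are produced by many threads at once; the sketch only illustrates what each primitive computes.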

Operators in PISTON
● Isosurface, Cut-surface & Threshold

Halos & Halo Finder
● What is a halo?
  ● Feature of interest found in cosmology data
  ● Cluster of particles
● Halo Finder
  ● Important analysis operator
  ● Friend-Of-Friends (FOF) halo finder
    – linking length & halo size
● Motivation behind a data-parallel solution
  ● Increased amount of simulation data available & analysis needed

Naive Approach
● Compares each & every particle pair
● Requires O(n^2) comparisons
[Figure: particles A, B, C, D, E, F and G]
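As a rough illustration of the naive approach, the sketch below compares every pair of particles and groups linked particles with a small union-find structure. It is a Python sketch, not PISTON code. Two details are assumptions made for the example: a pair counts as linked when its distance is less than or equal to the linking length, and the halo size is treated as a minimum particle count. The function name and sample points are hypothetical.

```python
from itertools import combinations
from math import dist

def naive_fof(points, linking_length, halo_size):
    """Return halos as sorted lists of particle indices (naive O(n^2) pairing)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every one of the n*(n-1)/2 pairs is compared.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= linking_length:  # assumed: inclusive
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    # Assumed: halo_size is the minimum number of particles in a halo.
    return sorted(g for g in groups.values() if len(g) >= halo_size)

# Hypothetical particles: two chains and one isolated particle.
sample = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (10.5, 0, 0), (50, 0, 0)]
print(naive_fof(sample, linking_length=1.5, halo_size=2))  # [[0, 1, 2], [3, 4]]
```

Particles 0 and 2 are 2 apart, more than the linking length, yet they end up in one halo because both are friends of particle 1.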

Data-Parallel FOF Halo Finder Operator
● Balanced k-d tree from the particles
  ● K-d tree is a space partitioning data structure for organizing points in k-dimensional space
● Use k-d tree to reduce the number of comparisons
● Implement using only the data-parallel primitives
  ● thrust::for_each, thrust::transform, thrust::sort, thrust::scatter, thrust::gather & thrust::copy
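To make the gather and scatter primitives concrete, here is a small Python sketch of how per-axis ranks could be derived from a sort followed by a scatter. The helper functions are sequential stand-ins written for this example, not Thrust calls. The x coordinates are hypothetical values for particles A to G, chosen only so that the result matches the X rank row shown on the k-d tree creation slides.

```python
def gather(indices, source):
    # Stand-in for thrust::gather: out[i] = source[indices[i]].
    return [source[i] for i in indices]

def scatter(values, indices, size):
    # Stand-in for thrust::scatter: out[indices[i]] = values[i].
    out = [None] * size
    for value, index in zip(values, indices):
        out[index] = value
    return out

def ranks(coords):
    """Rank of each particle along one axis (0 = smallest coordinate)."""
    n = len(coords)
    # Sort particle ids by coordinate (what a sort_by_key would produce).
    order = sorted(range(n), key=lambda i: coords[i])
    # The particle at sorted position r receives rank r.
    return scatter(list(range(n)), order, n)

# Hypothetical x coordinates for particles A, B, C, D, E, F, G.
x = [1, 0, 2, 8, 4, 5, 3]
order = sorted(range(len(x)), key=lambda i: x[i])
x_rank = ranks(x)

print(x_rank)            # [1, 0, 2, 6, 4, 5, 3]
print(gather(order, x))  # [0, 1, 2, 3, 4, 5, 8]
```

Whether the operator computes its ranks in exactly this way is not stated on the slides; the sketch only shows that a sort plus a scatter is sufficient.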

Balanced k-d tree Creation
[Figure: particles A to G with their coordinates]
K-d tree: node 0 holds A, B, C, D, E, F, G

        A  B  C  D  E  F  G
X rank  1  0  2  6  4  5  3
Y rank  0  3  6  2  1  5  4
Z rank  0  1  3  2  4  5  6

Balanced k-d tree Creation
[Figure: particles A to G with their coordinates]
● Segment in X axis
● Split value 2.5 in X axis
K-d tree: node 0 holds A, B, C, D, E, F, G

        A  B  C  D  E  F  G
X rank  1  0  2  6  4  5  3
Y rank  0  3  6  2  1  5  4
Z rank  0  1  3  2  4  5  6

Balanced k-d tree Creation
[Figure: particles A to G with their coordinates]
● Segment in X axis
● Split value 2.5 in X axis
K-d tree: node 0 has children 1 (A, B, C) and 2 (D, E, F, G)

        A  B  C  D  E  F  G
X rank  1  0  2  6  4  5  3
Y rank  0  3  6  2  1  5  4
Z rank  0  1  3  2  4  5  6

Balanced k-d tree Creation
[Figure: particles A to G with their coordinates]
● Segment in X axis
● Split value 2.5 in X axis
K-d tree: node 0 has children 1 (A, B, C) and 2 (D, E, F, G)
Ranks recomputed within each segment:

        A  B  C  D  E  F  G
X rank  1  0  2  3  1  2  0
Y rank  0  1  2  1  0  3  2
Z rank  0  1  2  0  1  2  3

Balanced k-d tree Creation
[Figure: particles A to G with their coordinates]
● Segment in Y axis
● One split value in Y axis for each of nodes 1 and 2
K-d tree: node 0 has children 1 (A, B, C) and 2 (D, E, F, G)

        A  B  C  D  E  F  G
X rank  1  0  2  3  1  2  0
Y rank  0  1  2  1  0  3  2
Z rank  0  1  2  0  1  2  3

Balanced k-d tree Creation
[Figure: particles A to G with their coordinates]
● Segment in Y axis
● One split value in Y axis for each of nodes 1 and 2
K-d tree: node 0 has children 1 and 2; node 1 has children 3 (A) and 4 (B, C); node 2 has children 5 (D, E) and 6 (F, G)

        A  B  C  D  E  F  G
X rank  1  0  2  3  1  2  0
Y rank  0  1  2  1  0  3  2
Z rank  0  1  2  0  1  2  3

Balanced k-d tree Creation
[Figure: particles A to G with their coordinates]
● Segment in Y axis
● One split value in Y axis for each of nodes 1 and 2
K-d tree: node 0 has children 1 and 2; node 1 has children 3 (A) and 4 (B, C); node 2 has children 5 (D, E) and 6 (F, G)
Ranks recomputed within each segment:

        A  B  C  D  E  F  G
X rank  0  0  1  1  0  1  0
Y rank  0  0  1  1  0  1  0
Z rank  0  0  1  0  1  0  1

Balanced k-d tree Creation
[Figure: particles A to G with their coordinates]
K-d tree: nodes 0 to 12; node 3 holds A, and leaves 7 to 12 hold B, C, D, E, F, G

        A  B  C  D  E  F  G
X rank  0  0  0  0  0  0  0
Y rank  0  0  0  0  0  0  0
Z rank  0  0  0  0  0  0  0

● At each k-d tree node store parent, child details, segment details & split value
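The level-by-level construction shown on these slides can be sketched as follows. This is a sequential Python illustration, not the Thrust-based operator. Several choices are assumptions made for the example: points are two-dimensional, the split axis cycles X, Y, X, ..., the left child takes the lower half of the segment (rounded down), and the split value is the midpoint between the two particles on either side of the cut. The coordinates are hypothetical, picked so that the first split reproduces the 2.5 shown for node 0.

```python
def build_kd_tree(points, dims=2):
    """Build a balanced k-d tree level by level; return a list of node dicts."""
    def new_node(parent, segment):
        return {"parent": parent, "children": None, "segment": segment,
                "axis": None, "split": None}

    nodes = [new_node(None, list(range(len(points))))]
    level, axis = [0], 0
    while level:
        next_level = []
        for node_id in level:
            node = nodes[node_id]
            if len(node["segment"]) < 2:
                continue  # a single particle: this node is a leaf
            # Order the segment along the current axis and cut it in the middle.
            seg = sorted(node["segment"], key=lambda i: points[i][axis])
            mid = len(seg) // 2
            node["axis"] = axis
            # Assumed: split value is the midpoint across the cut.
            node["split"] = (points[seg[mid - 1]][axis] + points[seg[mid]][axis]) / 2
            children = []
            for part in (seg[:mid], seg[mid:]):
                nodes.append(new_node(node_id, part))
                children.append(len(nodes) - 1)
            node["children"] = tuple(children)
            next_level.extend(children)
        level, axis = next_level, (axis + 1) % dims
    return nodes

# Hypothetical 2D coordinates for particles A..G (indices 0..6).
pts = [(1, 1), (0, 4), (2, 7), (8, 3), (4, 2), (5, 6), (3, 5)]
tree = build_kd_tree(pts)
print(len(tree), tree[0]["split"], tree[0]["children"])  # 13 2.5 (1, 2)
```

With seven particles the sketch yields thirteen nodes numbered 0 to 12 in breadth-first order, with node 3 as the single-particle leaf for A, which matches the node numbering in the final slide of the sequence. Each node stores its parent, children, segment and split value, as the slide's last bullet describes.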

Finding Halos
● Bottom-up approach
● At each level, consider all nodes in the level
[Figure: particles A to G with their coordinates, and the k-d tree with nodes 0 to 12]

Finding Halos
● Bottom-up approach
● At each level, consider all nodes in the level
  ● Look at the split value & segment particles
[Figure: particles A to G with their coordinates, and the k-d tree with nodes 0 to 12; split value at node 0 is 2.5]

Finding Halos
● Bottom-up approach
● At each level, consider all nodes in the level
  ● Look at the split value & segment particles
  ● Determine the particles within the linking length in the split axis
[Figure: particles A to G with their coordinates, and the k-d tree with nodes 0 to 12; split value at node 0 is 2.5, linking length 2]

Finding Halos
● Bottom-up approach
● At each level, consider all nodes in the level
  ● Look at the split value & segment particles
  ● Determine the particles within the linking length in the split axis
  ● Do m*n comparisons & determine halos
● Filter halos
[Figure: particles A to G with their coordinates, and the k-d tree with nodes 0 to 12; split value at node 0 is 2.5, linking length 2]
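The per-node step listed above can be sketched in Python as a single merge across one split. The class and function names are hypothetical, the particle coordinates are illustrative values rather than the slide's data, and a pair is assumed to be linked when its distance is less than or equal to the linking length. The sketch also assumes every left-segment particle lies at or below the split value and every right-segment particle at or above it.

```python
from math import dist

class DisjointSet:
    """Minimal union-find used to record which particles share a halo."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def merge_across_split(points, left, right, axis, split, linking_length, halos):
    """Link particles on opposite sides of a split; return comparisons made."""
    # Only particles within the linking length of the split plane, measured
    # along the split axis, can have a friend on the other side.
    near_left = [i for i in left if split - points[i][axis] <= linking_length]
    near_right = [j for j in right if points[j][axis] - split <= linking_length]
    for i in near_left:          # m * n comparisons
        for j in near_right:
            if dist(points[i], points[j]) <= linking_length:
                halos.union(i, j)
    return len(near_left) * len(near_right)

# Hypothetical 2D coordinates for particles A..G (indices 0..6).
pts = [(1, 1), (0, 4), (2, 7), (8, 3), (4, 2), (5, 6), (3, 5)]
left, right = [1, 0, 2], [6, 4, 5, 3]   # segments on either side of x = 2.5

halos = DisjointSet(len(pts))
compared = merge_across_split(pts, left, right, 0, 2.5, 2, halos)
print(compared)  # 4 comparisons instead of the 12 a full left x right check needs
```

A complete finder would apply this step at every node, level by level from the leaves to the root, and then drop groups smaller than the halo size.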

Optimization
Use of Bounding Boxes
● Each node has a bounding box calculated by looking at its segment particles
● Use the BB to reduce the comparisons
[Figure: k-d tree with nodes 0 to 12 over particles A to G]
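One plausible way to use the bounding boxes, sketched in Python under stated assumptions: compute an axis-aligned box per segment, then skip the pairwise comparisons between two segments whenever the smallest possible distance between their boxes exceeds the linking length. The slides do not say exactly how the operator applies its boxes, so the function names and this skip test are illustrative only, as are the coordinates.

```python
def bounding_box(points, segment):
    """Axis-aligned bounding box (lower corner, upper corner) of a segment."""
    dims = range(len(points[segment[0]]))
    lower = tuple(min(points[i][d] for i in segment) for d in dims)
    upper = tuple(max(points[i][d] for i in segment) for d in dims)
    return lower, upper

def box_gap(box_a, box_b):
    """Smallest possible distance between a point in box_a and one in box_b."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    # Per-axis separation; zero when the boxes overlap on that axis.
    gaps = [max(lb - ha, la - hb, 0.0)
            for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b)]
    return sum(g * g for g in gaps) ** 0.5

def can_skip(box_a, box_b, linking_length):
    # No pair of particles can be linked if even the boxes are too far apart.
    return box_gap(box_a, box_b) > linking_length

# Hypothetical 2D coordinates for particles A..G (indices 0..6).
pts = [(1, 1), (0, 4), (2, 7), (8, 3), (4, 2), (5, 6), (3, 5)]
left_box = bounding_box(pts, [0, 1, 2])
right_box = bounding_box(pts, [3, 4, 5, 6])

print(left_box, right_box)   # ((0, 1), (2, 7)) ((3, 2), (8, 6))
print(box_gap(left_box, right_box))
```

The test is conservative: it never skips a pair that could be linked, but boxes that are close may still contain no linked pair, so the pairwise check is still needed when the test fails.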

Results
24474 particles

Results
24474 particles
Linking length 0.2
Halo size 100
Halos found: 10

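Results
24474 particles
Linking length 1.[?]
Halo size 100
Halos found: [?]
[The values 1 and 5 also appear on this slide; which one completes the linking length and which is the halo count is not recoverable.]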

Results
Some preliminary results on halo finding using OpenMP
● Number of particles: 21441, 42882
● Number of threads: 1, 2, 4 (for each particle count)
● Timings: k-d tree creation, bounding box computation, finding halos
● Halos found: 23, 14
[Table: the timing values survive but their row and column positions do not. In order of appearance: 0.00029s, 0.142s, 0.054s, 0.256s, 0.090s, 0.066s, 0.044s, 0.0011s, 0.085s, 0.026s, 0.00049s, 0.141s, 0.052s, 0.041s, 0.092s, 0.0005s, 0.00021s, 0.0007s]
Next steps
● Get this running on CUDA
● Compare this with the VTK halo finder implementation

Thank You.