Bronis R. de Supinski
Pedro Valero-Lara
Xavier Martorell
Sergi Mateo Bellido
Jesus Labarta (Eds.)
LNCS 11128
Evolving OpenMP
for Evolving Architectures
14th International Workshop on OpenMP, IWOMP 2018
Barcelona, Spain, September 26–28, 2018
Proceedings
Lecture Notes in Computer Science 11128
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7408
Editors
Bronis R. de Supinski, Lawrence Livermore National Laboratory, Livermore, CA, USA
Pedro Valero-Lara, Barcelona Supercomputing Center, Barcelona, Spain
Xavier Martorell, Universitat Politècnica de Catalunya, Barcelona, Spain
Sergi Mateo Bellido, Barcelona Supercomputing Center, Barcelona, Spain
Jesus Labarta, Universitat Politècnica de Catalunya, Barcelona, Spain
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Organization
General Chair
Jesus Labarta Universitat Politècnica de Catalunya, Spain
Publication Chair
Pedro Valero-Lara Barcelona Supercomputing Center (BSC), Spain
Publicity Chairs
Xavier Teruel Barcelona Supercomputing Center (BSC), Spain
Matthijs van Waveren OpenMP Architecture Review Board (ARB)
Local Chair
Xavier Martorell Universitat Politècnica de Catalunya, Spain
Program Committee
Martin Kong Brookhaven National Laboratory, USA
Christian Terboven RWTH Aachen University, Germany
Terry Wilmarth Intel, USA
Nasser Giacaman University of Auckland, New Zealand
Alejandro Duran Intel, Spain
Mark Bull EPCC, University of Edinburgh, UK
Chunhua Liao Lawrence Livermore National Laboratory, USA
Stephen Olivier Sandia National Laboratories, USA
James Beyer Nvidia, USA
Thomas R. W. Scogland Lawrence Livermore National Laboratory, USA
Hal Finkel Argonne National Laboratory, USA
Oliver Sinnen University of Auckland, New Zealand
Rosa M. Badia Barcelona Supercomputing Center (BSC), Spain
Mitsuhisa Sato RIKEN AICS, Japan
Eduard Ayguade Universitat Politècnica de Catalunya, Spain
Larry Meadows Intel, USA
Deepak Eachempati Cray Inc., USA
Steering Committee
Dieter an Mey RWTH Aachen University, Germany
Eduard Ayguadé BSC and Universitat Politècnica de Catalunya, Spain
Mark Bull EPCC, University of Edinburgh, UK
Barbara Chapman Stony Brook University, USA
Bronis R. de Supinski Lawrence Livermore National Laboratory, USA
Rudolf Eigenmann Purdue University, USA
William Gropp University of Illinois, USA
Michael Klemm Intel, Germany
Kalyan Kumaran Argonne National Laboratory, USA
Federico Massaioli CASPUR, Italy
Lawrence Meadows Intel, USA
Stephen L. Olivier Sandia National Laboratories, USA
Ruud van der Pas Oracle, USA
Alistair Rendell Australian National University, Australia
Mitsuhisa Sato University of Tsukuba, Japan
Sanjiv Shah Intel, USA
Josemar Rodrigues de Souza SENAI Unidade CIMATEC, Brazil
Christian Terboven RWTH Aachen University, Germany
Matthijs van Waveren KAUST, Saudi Arabia
Contents
Best Paper
Tasking Evaluations
The Impact of Taskyield on the Design of Communicating Tasks
J. Schuchart et al.
1 Motivation
untied tasks, the execution of the latter may be resumed by a different thread
as well) [6]. This feature has the potential to hide latencies that may occur
during the execution of a task, e.g., waiting for an external event or operation
to complete. Among the sources for such latencies are I/O operations and (more
specifically) inter-process communication such as MPI operations.
Traditionally, the combination of MPI + OpenMP using work-sharing constructs requires a fork-join model in which thread-parallel processing alternates with sequential parts that perform data exchange with other MPI ranks. Using OpenMP tasks, the user may structure the application such that communication can be performed concurrently with other tasks, e.g., computation that is independent of the communication operations [1,4]. In that case, communication may be split into multiple tasks to (i) parallelize the communication and (ii) reduce synchronization slack by achieving a fine-grained synchronization scheme across process boundaries, e.g., starting the computation of a subset of boundary cells as soon as all required halo cells from a specific neighbor have been received. The MPI standard supports thread-parallel communication for applications requesting MPI_THREAD_MULTIPLE support during initialization [5], which all major implementations provide by now.
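To make the pattern concrete, the following is a minimal sketch of placing communication in its own task and overlapping it with independent computation. It is illustrative only and not taken from the application code used in this work; the functions compute_interior and compute_boundary and the buffer layout are assumptions.

/* Sketch: overlap MPI communication with computation using OpenMP tasks.
 * compute_interior/compute_boundary and the buffers are placeholders.
 * MPI_THREAD_MULTIPLE is required because MPI calls may be issued from
 * arbitrary threads executing tasks. */
#include <mpi.h>
#include <stdlib.h>

static void compute_interior(double *a, int n)                   { for (int i = 0; i < n; i++) a[i] *= 2.0; }
static void compute_boundary(double *a, const double *h, int n)  { for (int i = 0; i < n; i++) a[i] += h[i]; }

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) MPI_Abort(MPI_COMM_WORLD, 1);

    int rank, size, n = 1024;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int neighbor = (rank + 1) % size;

    double *interior = calloc(n, sizeof(double));  /* local data       */
    double *sendbuf  = calloc(n, sizeof(double));  /* boundary to send */
    double *halo     = calloc(n, sizeof(double));  /* halo to receive  */

    #pragma omp parallel
    #pragma omp single
    {
        /* Communication task: blocks until the halo exchange completes, but
         * other threads can execute the computation tasks in the meantime. */
        #pragma omp task depend(out: halo[0])
        {
            MPI_Request reqs[2];
            MPI_Irecv(halo,    n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(sendbuf, n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        }

        /* Computation that does not depend on the halo proceeds concurrently. */
        #pragma omp task depend(inout: interior[0])
        compute_interior(interior, n);

        /* The consumer of the halo is ordered after the communication task. */
        #pragma omp task depend(in: halo[0]) depend(inout: interior[0])
        compute_boundary(interior, halo, n);
    } /* implicit barrier: all tasks have completed here */

    free(interior); free(sendbuf); free(halo);
    MPI_Finalize();
    return 0;
}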
Fine-grained synchronization is especially useful for communication-heavy hybrid applications such as the distributed Blocked Cholesky Factorization, which decomposes a real-valued, positive-definite matrix A into its lower triangular matrix L and its transpose, i.e., A = LLᵀ. The computation of a block can naturally be modeled as a task that may start executing as soon as the blocks required as input are available, either coming from previous local tasks or received from another MPI rank. Figure 1 depicts two different approaches to decomposing the computation and communication of blocks with a cyclic distribution across two processes (Fig. 1a) into tasks: Fig. 1b shows the use of coarse communication tasks, i.e., all send and receive operations are funneled through a single task. In Fig. 1c the communication is further divided into individual tasks for send and receive operations, which has the potential of exposing a higher degree of concurrency, as tasks requiring remote blocks, e.g., some of the tasks at the bottom of the depicted graph, may start executing as soon as the respective receive operation has completed. In particular, the synchronization stemming from the communication of blocks between tasks on different processes only involves a single producing task and a set of consuming tasks instead of including all producing and consuming tasks. While this may seem trivial for the depicted dual-process scenario, the difference can be expected to be more significant with increasing numbers of processes.
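As a sketch of the local task structure (with the communication tasks of Fig. 1b/1c omitted), the blocked factorization can be expressed with task dependencies roughly as follows. The kernel functions potrf, trsm, syrk, and gemm stand in for the corresponding LAPACK/BLAS block operations and are illustrative assumptions, not the paper's actual code.

/* Illustrative sketch of a task-based blocked Cholesky factorization for the
 * blocks owned by one rank; send/recv tasks for remote blocks would be added
 * with depend clauses on the same block handles. The kernel functions below
 * are placeholders for the LAPACK/BLAS block operations. */
void potrf(double *Akk);
void trsm(const double *Akk, double *Aik);
void syrk(const double *Aik, double *Aii);
void gemm(const double *Aik, const double *Ajk, double *Aij);

void cholesky_blocked(int NB, double *A[NB][NB]) {
    #pragma omp parallel
    #pragma omp single
    for (int k = 0; k < NB; k++) {
        #pragma omp task depend(inout: A[k][k])
        potrf(A[k][k]);                               /* factor diagonal block   */

        for (int i = k + 1; i < NB; i++) {
            #pragma omp task depend(in: A[k][k]) depend(inout: A[i][k])
            trsm(A[k][k], A[i][k]);                   /* triangular solve        */
        }
        for (int i = k + 1; i < NB; i++) {
            #pragma omp task depend(in: A[i][k]) depend(inout: A[i][i])
            syrk(A[i][k], A[i][i]);                   /* diagonal block update   */
            for (int j = k + 1; j < i; j++) {
                #pragma omp task depend(in: A[i][k], A[j][k]) depend(inout: A[i][j])
                gemm(A[i][k], A[j][k], A[i][j]);      /* off-diagonal update     */
            }
        }
    } /* implicit taskwait at the end of the single region */
}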
With MPI two-sided communication, a receive operation can only complete
once a matching send operation has been posted on the sending side, and vice
versa. At the same time, a portable OpenMP application should neither rely on
the execution order of runnable tasks nor on the number of threads available
to execute these tasks [6, Sect. 2.9.5]. Thus, it should not rely on a sufficient
number of threads to be available to execute all available communication tasks,
e.g., to make sure all send and receive operations are posted. In order to perform
Fig. 1. Task graph of the first iteration of the Blocked Cholesky Factorization on a matrix with cyclic distribution of 5 × 5 blocks across 2 processes with coarse- and fine-grain communication tasks. Local dependencies are depicted as strong dashed lines and communication as fine-dashed lines.
1 The full code is available at https://github.com/devreal/omp-taskyield.
the earlier tasks due to implicit transitive dependencies through MPI between them. As an example, consider the recv(1,1) task depicted on the right in Fig. 1c being scheduled on top of recv(0,0), which has the transitive implicit dependency recv(0,0) → trsm(0,1) → send(0,1) → recv(0,1) → syrk(1,1) → potrf(1,1) → send(1,1) → recv(1,1). As a consequence, the user has to identify and expose these implicit dependencies stemming from two-sided communication, of which OpenMP is otherwise not aware.
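One way to expose a blocking receive to the task scheduler is to test the request in a loop and yield in between, so that the executing thread can pick up other ready tasks, e.g., the task posting the matching send. The following is an illustrative sketch under that assumption; whether it actually avoids the deadlock described above depends on the taskyield implementation.

/* Illustrative sketch: a communication task that yields while the receive is
 * pending so the executing thread can run other tasks (e.g., the task posting
 * the matching send). With a no-op taskyield this loop degenerates into
 * busy-waiting and the deadlock described in the text can still occur. */
#include <mpi.h>

void recv_block(double *block, int count, int src, int tag)
{
    #pragma omp task depend(out: block[0:count]) untied
    {
        MPI_Request req;
        int done = 0;
        MPI_Irecv(block, count, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &req);
        while (!done) {
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
            if (!done) {
                #pragma omp taskyield   /* let other ready tasks run */
            }
        }
    }
}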
We ran our tests on two systems: Oakforest-PACS, a KNL 7250-based system installed at the University of Tsukuba in Japan with 68-core nodes running at 1.4 GHz, and Hazel Hen, a Cray XC40 system installed at HLRS in Germany, which is equipped with dual-socket Intel Xeon E5-2680 v3 nodes running at 2.5 GHz. On Oakforest-PACS, we employed the Intel 18.0.1 compiler and Intel MPI 2018.1.163. On Hazel Hen, we used the Intel 18.0.1, GNU 7.2.0, and Cray CCE 8.6.5 compilers as well as Cray MPICH 7.7.0. On both systems, we used the OmpSs compiler and runtime libraries in version 17.06 in ompss mode.
On Oakforest-PACS, we relied on the runtime system implementation to ensure proper thread pinning, i.e., using KMP_AFFINITY=granularity=fine,balanced for Intel. The OmpSs runtime performs thread pinning by default, which we had to explicitly disable on the Cray system as it would otherwise interfere with the thread pinning performed by aprun.
4.3 Results
Using the information from the black-box test presented in Sect. 3, we executed
the different benchmark implementations discussed above with any suitable
OpenMP compiler. We first ran the Cholesky factorization on matrices of 64k² double-precision floating-point elements with a block size of 512² elements.
is marginal, which can be attributed to the fact that yielding a single thread does
not constitute a significant increase in resource availability given that a single
thread occupies less than 5 % of the node’s overall performance.
On Oakforest-PACS, except for 64 nodes, the different benchmark variants perform slightly better when using the Intel OpenMP implementation than when running under OmpSs, as depicted in Fig. 3c and d. This may be attributed to the generally higher overhead of task dependency handling observed in OmpSs [8]. However, relative to noyield, the versions using fine-grained communication tasks exhibit up to 34% speedup with OmpSs and 25% with Intel, with the main improvements seen in mid-range node numbers.
An interesting observation can be made on both systems regarding fine-grained communication tasks on the Intel runtime: for two nodes, fine-grained communication tasks can have a negative impact on the performance, which diminishes and turns into performance improvements with increasing node numbers. This effect shrinks again for the highest node numbers.
We also note that perrank communication tasks do not seem to provide significant benefits over funneled communication.
Scaling the Problem Size. Figure 5 depicts the strong scaling results of the Blocked Cholesky Factorization on a matrix with 128k² elements. Due to the computational complexity of O(n³), the total number of computation tasks increases by a factor of eight compared to a matrix size of 64k², leading to a total of 2.8 million computation tasks and putting significantly higher pressure on the task scheduler. On the Cray XC40 (Fig. 5a), OmpSs using fine-grained dependencies again outperforms all other implementations as it benefits from the larger number of available computation tasks, followed by the Intel compiler with fine-grained dependencies. All funneled runs show only limited
scaling as they cannot exploit the full degree of concurrency present due to the
coarse-grained synchronization. We should note that we were unable to gather
reliable data for fine-grained communication tasks with the Cray compiler as we
saw frequent deadlocks in the MPI communication, presumably due to the lower
limit on the task-yield stack depth.
On Oakforest-PACS (Fig. 5b), the scaling of OmpSs is rather limited compared to the Intel OpenMP implementation. We attribute these limitations to a relatively higher overhead involved in the OmpSs task management, which becomes more significant with the larger number of tasks and the low serial performance of a single KNL core. With both OmpSs and the Intel compiler, however, fine-grained communication tasks again outperform the version using funneled communication on the same compiler. The perrank version appears to perform slightly better, albeit with a far smaller benefit than the fine-grained communication tasks.
4.4 Discussion
The results presented above demonstrate that the available implementation of taskyield in OpenMP can have a significant impact on hybrid applications attempting to hide communication latencies, both in terms of task design, including correctness, and in terms of performance. While users can rely on deadlock-free execution of communication tasks with cyclic task-yield, more care has to be taken with the synchronization of tasks when using a stack-based or N-cyclic yield with fine-grained communication tasks. With both no-op yield and, to a lesser extent, stack-based yield, the user has to ensure that a sufficient number of communication operations can be in flight, e.g., by ensuring that a sufficient number of OpenMP threads is available.
The variations of task-yield across OpenMP implementations make the transition from a correct, i.e., deadlock-free, sequential MPI application to a correct task-parallel MPI program tedious. In many cases, it might not be sufficient to simply encapsulate data exchange between individual computation tasks into communication tasks to achieve fine-grained synchronization and rely on OpenMP taskyield to ensure scheduling of all necessary communication tasks. Instead, users will have to work around the peculiarities of the different implementations and will thus likely fall back to funneling MPI communication through a single task to guarantee correct execution on all OpenMP implementations, potentially losing performance due to higher thread idle times.
However, our results strongly indicate that fine-grained communication tasks
outperform more restricted synchronization schemes such as funneled and
perrank as the former has the potential to significantly increase the concurrency
exposed to the runtime. Unfortunately, an application has no way to query the
current OpenMP implementation for information on the properties of taskyield
to adapt the communication task pattern dynamically. Introducing a way to
query these properties in OpenMP would allow users to adapt the behavior of
their application to best exploit the hardware potential under any given OpenMP
implementation. Similarly, providing a way to control the limits of task-yield
in all implementations would help (i) raise awareness of the potential pitfalls,
and (ii) provide the user with a way to adapt the runtime to the application’s
needs. Such configuration options could include the task creation throttling limit
(as already offered by OmpSs) and any further limitation affecting the effectiveness of task-yield in the context of distributed-memory applications relying on two-sided communication.
References
1. Akhmetova, D., Iakymchuk, R., Ekeberg, O., Laure, E.: Performance study of multithreaded MPI and OpenMP tasking in a large scientific code. In: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2017. https://doi.org/10.1109/IPDPSW.2017.128
2. BSC Programming Models: OmpSs User Guide, March 2018. https://pm.bsc.es/ompss-docs/user-guide/OmpSsUserGuide.pdf
3. Duran, A., et al.: OmpSs: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. (2011). https://doi.org/10.1142/S0129626411000151
4. Meadows, L., Ishikawa, K.: OpenMP tasking and MPI in a lattice QCD benchmark. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 77–91. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_6
5. Message Passing Interface Forum: MPI: A Message-Passing Interface Standard (Version 3.1) (2015). http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
6. OpenMP Architecture Review Board: OpenMP Application Programming Interface, Version 4.5 (2015). http://www.openmp.org/mp-documents/openmp-4.5.pdf
7. OpenMP Architecture Review Board: OpenMP Technical Report 6: Version 5.0 Preview 2 (2017). http://www.openmp.org/wp-content/uploads/openmp-TR6.pdf
8. Schuchart, J., Nachtmann, M., Gracia, J.: Patterns for OpenMP task data dependency overhead measurements. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 156–168. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_11
9. Tsugane, K., Lee, J., Murai, H., Sato, M.: Multi-tasking execution in PGAS language XcalableMP and communication optimization on many-core clusters. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2018. ACM (2018). https://doi.org/10.1145/3149457.3154482
10. YarKhan, A., Kurzak, J., Luszczek, P., Dongarra, J.: Porting the PLASMA numerical library to the OpenMP standard. Int. J. Parallel Program. (2017). https://doi.org/10.1007/s10766-016-0441-6
Loops and OpenMP
OpenMP Loop Scheduling Revisited: Making a Case for More Schedules
1 Introduction