

Background: Parallelism exploitation on hierarchical architectures

We probably need to explain, or refer to elsewhere in the book, the concepts of process and thread, the difference between multiprogramming and parallel programming, and the basic measures of parallel performance (speedup and scalability).
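For later reference, the standard definitions can be stated in a minimal form, writing $T_p$ for the running time of the program on $p$ processors (this notation is chosen here only for convenience):

\[ S(p) \;=\; \frac{T_1}{T_p}, \qquad E(p) \;=\; \frac{S(p)}{p} \]

where $S(p)$ is the speedup and $E(p)$ the efficiency; a program is said to scale well when $E(p)$ stays close to $1$ as $p$ grows.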

Modern processors used in parallel machines all rely on a deep memory hierarchy. I assume that the details and consequences of hardware CPU caches, the policies of operating system disk caches, and the design of external memory algorithms and their basic models (e.g. the PDM) have all been explained in other parts of the book.

Parallel computation avoids sequential-architecture bottlenecks at several different levels. Examples of these limits are processing speed (nowadays bounded essentially by memory bandwidth), the maximum feasible size of main memory, and peak I/O bandwidth.

From the point of view of parallelism exploitation, there are two basic paradigms that lead to different communication costs and to different ways of designing algorithms: shared-memory programming and message passing. For a quick comparison, we can point out some advantages and shortcomings of each. A good reference here is the first chapter of [1].

Shared-memory programming relies on hardware support that gives every CPU in the system access to the same memory space.

$+$
It is simpler at first, and a large number of PRAM algorithms can be reused. Communication among threads or processes can be as fast as a memory access. Threads may be used to reduce the system overhead of parallelism exploitation.
$-$
Programmers must take care to avoid race conditions on shared variables (a minimal threaded sketch follows this list). The PRAM assumption of constant-cost memory access is not realistic. The hardware support is not simple and relies on per-CPU local caches, which makes the memory hierarchy more complex. False sharing and other performance problems may impose complex constraints on data access patterns and on data placement in memory.
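As an illustration of the shared-memory style and of the race-condition issue, the following sketch assumes POSIX threads; the array size, thread count and names such as partial_sum are invented for the example, not taken from any system discussed in the book. All threads read the same array and accumulate into a single shared variable, which must be protected by a mutex.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N        1000000   /* assumed divisible by NTHREADS */

static double data[N];     /* shared: visible to every thread */
static double sum = 0.0;   /* shared accumulator              */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread sums its own slice of the array, then adds the partial
   result to the shared total under a mutex to avoid a race condition. */
static void *partial_sum(void *arg)
{
    long id = (long) arg;
    long chunk = N / NTHREADS;
    double local = 0.0;
    for (long i = id * chunk; i < (id + 1) * chunk; i++)
        local += data[i];
    pthread_mutex_lock(&lock);   /* an unprotected "sum += local" would race */
    sum += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < N; i++) data[i] = 1.0;
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, partial_sum, (void *) i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("sum = %f\n", sum);   /* expected: N * 1.0 */
    return 0;
}

Note that no data is moved explicitly: every thread simply dereferences the same addresses, and the hardware caches take care of the rest.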

Message-passing parallel programming assumes that each processor works on its own local memory. Programs contain explicit communication operations to exchange data and to synchronize the work of different processing nodes; a minimal message-passing sketch follows the list below.

$+$
Message passing more closely mimics the hardware of large, distributed-memory parallel systems. It leads to a simpler implementation of the runtime support and to more easily understood performance behavior. It is also easier for the programmer to overlap computation and communication.
$-$
In the general case it is a complex problem to partition large data structures so as to distribute the computation in parallel while minimizing the amount of communication.
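For comparison with the threaded sketch above, here is the same global sum written in message-passing style, assuming MPI; the constant N_PER_PROC and the variable names are again invented for the example. Each process owns only its local slice of the data, and the partial results are combined through an explicit collective communication.

#include <mpi.h>
#include <stdio.h>

#define N_PER_PROC 250000

/* Each process holds only its local slice of the data; the global sum
   is obtained with an explicit collective communication (MPI_Reduce). */
int main(int argc, char **argv)
{
    int rank, size;
    static double local_data[N_PER_PROC];
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < N_PER_PROC; i++) local_data[i] = 1.0;
    for (int i = 0; i < N_PER_PROC; i++) local += local_data[i];

    /* combine the partial sums on process 0 */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f over %d processes\n", total, size);

    MPI_Finalize();
    return 0;
}

Here no memory is shared at all: data distribution is decided by the programmer, and every exchange of values appears as an explicit communication call, which is what makes the communication cost visible and easier to reason about.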

