We probably need to explain, or refer to other parts of the book for, the concepts of process and thread, the difference between multiprogramming and parallel programming, and the basic measures of parallel performance (speedup, scalability).
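To make these measures concrete, here is a minimal sketch of how speedup and the related measure of efficiency are computed from running times. The timing figures are invented for illustration, not measured on any machine: T_1 is the time on one processor, T_p the time on p processors, speedup is S(p) = T_1 / T_p, and efficiency is S(p) / p.

```python
# Hypothetical running times (seconds) for the same problem on p processors.
t_serial = 64.0
timings = {1: 64.0, 2: 33.0, 4: 17.5, 8: 9.6}

for p, t_p in sorted(timings.items()):
    speedup = t_serial / t_p      # S(p) = T_1 / T_p
    efficiency = speedup / p      # E(p) = S(p) / p, ideally close to 1
    print(f"p={p}: speedup={speedup:.2f}, efficiency={efficiency:.2f}")
```

A program scales well when efficiency stays close to 1 as p grows; in the invented figures above it degrades slowly, as is typical when communication overhead grows with the number of processors.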
Modern processors used in parallel machines all rely on a deep memory hierarchy. I assume that the details and consequences of hardware CPU caches, the policies of operating-system disk caches, and the design of external-memory algorithms and their basic models (e.g., the PDM) have all been explained in other parts of the book.
Parallel computation avoids the bottlenecks of sequential architectures at several different levels. Examples of these limits are processing speed (nowadays bounded essentially by memory bandwidth), the maximum feasible size of main memory, and peak I/O bandwidth.
From the point of view of exploiting parallelism, there are two basic paradigms, which lead to different communication costs and to different ways of designing algorithms: shared-memory programming and message passing. A quick comparison reveals advantages and shortcomings on both sides. A good reference here is the first chapter of [1].
Shared-memory programming relies on hardware support that gives every CPU in the system access to the same memory space; the processors communicate implicitly by reading and writing shared locations, so accesses to shared data must be synchronized.
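The shared-memory style can be sketched with threads, which by definition share one address space. The following is an illustrative Python fragment (the worker function and the thread count are ours, not from any particular library): several threads increment one shared counter, and a lock serializes the read-modify-write so that no updates are lost.

```python
import threading

counter = 0                      # shared state, visible to all threads
lock = threading.Lock()          # protects the read-modify-write below

def worker(n):
    global counter
    for _ in range(n):
        # Without the lock, concurrent increments could interleave
        # and lose updates (a classic data race).
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 4 threads x 100_000 increments each
```

Note that communication here is implicit: no thread ever "sends" the counter to another; they simply see the same memory. The cost moves into synchronization, since every increment contends for the lock.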
Message-passing parallel programming assumes that each processor works on its own local memory. Programs contain explicit communication operations to exchange data and to synchronize among different processing nodes.
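By contrast, the message-passing style can be sketched with separate processes that share nothing and exchange data only through explicit send and receive operations. The sketch below uses Python's multiprocessing Pipe as a stand-in for the communication channel of a real message-passing system such as MPI; the worker function and the payload are illustrative.

```python
import multiprocessing as mp

def worker(conn):
    # The worker has its own address space: the only way to get input
    # and return a result is through explicit messages on the pipe.
    data = conn.recv()                       # explicit receive
    conn.send(sum(x * x for x in data))      # explicit send
    conn.close()

parent_end, child_end = mp.Pipe()
p = mp.Process(target=worker, args=(child_end,))
p.start()
parent_end.send([1, 2, 3, 4])    # explicit send to the worker
result = parent_end.recv()       # explicit receive of the result
p.join()
print(result)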